Observability Without Datadog: A $50/Month Stack That Works
Datadog at Series A is fine. Datadog at seed is malpractice. Here's a stack that gets you 80% of the value for 1% of the cost.
You're a small team. Your Datadog bill is $4k/month. The CFO asks why. You don't have a good answer.
Datadog is excellent. It's also priced for companies that have already won. If you're pre-product-market-fit and burning runway on observability, you've made a mistake.
There's a stack that runs for $50/month, scales to mid-six-figure ARR, and gives you logs, metrics, traces, and alerts. It's open source plus one cheap managed service.
What you actually need
Not what Datadog sells you. What an engineer at 2am actually uses to fix a production incident:
- Logs — searchable, time-filtered, with structured fields
- Metrics — CPU, memory, request count, error rate, p50/p95/p99 latency
- Traces — when a request is slow, where in the call graph
- Alerts — page when error rate or latency crosses a threshold
That's it. Custom dashboards, anomaly detection, and APM are nice-to-haves. They are not what saves you at 2am.
The stack
Logs: structured JSON to stdout, shipped to Grafana Loki (self-hosted) or Better Stack (managed, $25/month for 30GB).
Metrics: Prometheus + Grafana. Self-host on a $10/month VM, or use the Grafana Cloud free tier (10k active series, 50GB logs/month, 50GB traces/month).
Traces: OpenTelemetry SDK in your app, exported to Grafana Tempo or Jaeger.
Alerts: Grafana alerting → PagerDuty (free for up to 5 users) or just email/Slack for early stage.
Total: roughly $25-50/month managed, or one $20/month VM if you self-host. You can scale this to ~50M requests/day before hitting limits, provided you keep log volume and metric cardinality in check.
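Which limit you reach first tends to depend on log volume more than on raw request count. A quick way to check a plan against a tier is to estimate monthly ingest from your own numbers. The sketch below is plain arithmetic: lines per request, bytes per line and the sample rate are assumptions you supply, and vendors may measure ingested bytes differently, so confirm against their billing docs.

```typescript
// Rough monthly log-ingest estimate. Every input is an assumption you supply;
// none of these numbers are vendor limits.
interface LogVolumeInput {
  requestsPerDay: number;
  linesPerRequest: number; // average log lines emitted per request
  bytesPerLine: number;    // average size of one JSON log line
  sampleRate?: number;     // fraction of lines kept (1 = keep everything)
}

const DAYS_PER_MONTH = 30;

// Estimated ingest per month in decimal gigabytes (1 GB = 1e9 bytes).
function monthlyLogGB(input: LogVolumeInput): number {
  const { requestsPerDay, linesPerRequest, bytesPerLine, sampleRate = 1 } = input;
  const bytesPerDay = requestsPerDay * linesPerRequest * bytesPerLine * sampleRate;
  return (bytesPerDay * DAYS_PER_MONTH) / 1e9;
}

// Largest daily request volume that fits a monthly budget, under the same assumptions.
function maxRequestsPerDay(budgetGB: number, input: Omit<LogVolumeInput, 'requestsPerDay'>): number {
  const { linesPerRequest, bytesPerLine, sampleRate = 1 } = input;
  const bytesPerRequestPerMonth = linesPerRequest * bytesPerLine * sampleRate * DAYS_PER_MONTH;
  return Math.floor((budgetGB * 1e9) / bytesPerRequestPerMonth);
}

// Example: 1M requests/day, 2 log lines each, ~400 bytes per line
console.log(monthlyLogGB({ requestsPerDay: 1_000_000, linesPerRequest: 2, bytesPerLine: 400 })); // → 24
```

Turning the question around with maxRequestsPerDay tells you how much traffic a given tier absorbs before you need sampling or fewer debug lines.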
The 30-minute setup
Use OpenTelemetry. It's the unifying SDK that emits all three signals. Your app doesn't care where the data goes:
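```typescript
// tracing.ts: load this before the rest of the app (e.g. node --require ./tracing.js app.js)
// so auto-instrumentation can patch http, database drivers, etc. as they are imported
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
const sdk = new NodeSDK({
  serviceName: process.env.OTEL_SERVICE_NAME ?? 'api',
  traceExporter: new OTLPTraceExporter({
    // Full OTLP/HTTP traces URL, e.g. http://localhost:4318/v1/traces.
    // If unset, the exporter falls back to the standard OTEL_EXPORTER_OTLP_* env vars.
    url: process.env.OTEL_ENDPOINT,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```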
Auto-instrumentation captures HTTP, database, Redis, and many other popular libraries without code changes, provided the SDK starts before those modules are imported.
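What stitches spans from different services into one trace is a propagated context header. By default the Node SDK uses the W3C Trace Context format: the HTTP instrumentation injects a traceparent header on outgoing requests and reads it on incoming ones. You never parse it yourself; this stdlib-only sketch (parseTraceParent is a hypothetical helper) just shows what travels between services.

```typescript
// Parses a W3C Trace Context `traceparent` header in the version-00 layout:
//   version-traceId-parentSpanId-flags, e.g.
//   00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
interface TraceParent {
  traceId: string;      // 16 bytes as 32 lowercase hex chars, shared by every span in the trace
  parentSpanId: string; // 8 bytes as 16 lowercase hex chars, the caller's span
  sampled: boolean;     // lowest bit of the flags byte
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | null {
  const m = TRACEPARENT_RE.exec(header.trim());
  if (!m) return null;
  const [, version, traceId, parentSpanId, flags] = m;
  // Version "ff" is reserved as invalid, and all-zero IDs are invalid per the spec.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
  return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 0x01) === 1 };
}

console.log(parseTraceParent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'));
```

Because every service logs and exports the same traceId, Tempo or Jaeger can reassemble the full call graph from spans that arrived separately.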
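For metrics, expose a /metrics endpoint, either with OpenTelemetry's Prometheus exporter (@opentelemetry/exporter-prometheus, which serves /metrics on port 9464 by default) or with prom-client. Then add that endpoint as a scrape target in Prometheus.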
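To see what Prometheus actually scrapes, here is a hand-rolled sketch of the text exposition format for one latency histogram. It is illustrative only, since prom-client or the OpenTelemetry Prometheus exporter generate this for you. The metric name matches the PromQL query later in this post; the bucket bounds, the service="api" label, and the LatencyHistogram and startMetricsServer names are assumptions.

```typescript
import { createServer, type Server } from 'node:http';

// Upper bounds in seconds (an illustrative choice, not a library default).
const BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5];

class LatencyHistogram {
  private readonly name: string;
  private readonly labels: string;
  private bucketCounts: number[] = BUCKETS.map(() => 0); // per bucket, not cumulative
  private sum = 0;
  private count = 0;

  constructor(name: string, labels: string) {
    this.name = name;
    this.labels = labels;
  }

  observe(seconds: number): void {
    this.sum += seconds;
    this.count += 1;
    const i = BUCKETS.findIndex((le) => seconds <= le);
    if (i >= 0) this.bucketCounts[i] += 1; // values above 5s only show up in +Inf
  }

  // Prometheus buckets are cumulative: le="x" counts every observation <= x.
  render(): string {
    const out = [`# TYPE ${this.name} histogram`];
    let cumulative = 0;
    BUCKETS.forEach((le, i) => {
      cumulative += this.bucketCounts[i];
      out.push(`${this.name}_bucket{${this.labels},le="${le}"} ${cumulative}`);
    });
    out.push(`${this.name}_bucket{${this.labels},le="+Inf"} ${this.count}`);
    out.push(`${this.name}_sum{${this.labels}} ${this.sum}`);
    out.push(`${this.name}_count{${this.labels}} ${this.count}`);
    return out.join('\n') + '\n';
  }
}

const httpDuration = new LatencyHistogram('http_request_duration_seconds', 'service="api"');

// Serves GET /metrics for Prometheus to scrape; the port is your choice.
function startMetricsServer(port: number): Server {
  return createServer((req, res) => {
    if (req.url !== '/metrics') {
      res.writeHead(404).end();
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(httpDuration.render());
  }).listen(port);
}
```

In a request handler you would call httpDuration.observe(elapsedSeconds) once per request. Because each _bucket line is cumulative, PromQL can derive quantiles from rate() over the buckets alone.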
For logs, write JSON to stdout:
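```typescript
import pino from 'pino';
// String levels + "message" key so the LogQL query below matches (pino defaults: numeric level, "msg")
const log = pino({ messageKey: 'message', formatters: { level: (label) => ({ level: label }) } });
// userId and requestId come from the current request
log.info({ userId, requestId, action: 'order.create' }, 'order created');
```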
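A log shipper such as Vector, Promtail, or Grafana Alloy (Grafana's successor to Promtail) tails stdout, usually via the log files your container runtime writes, and ships the lines to Loki.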
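Passing requestId into every log call by hand gets tedious. One way to attach it automatically is Node's built-in AsyncLocalStorage (pino users would more often reach for child loggers). The sketch below is stdlib-only; makeLogger and withRequest are hypothetical helpers, and the string level plus message field are chosen to line up with the LogQL query later in this post.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

type Fields = Record<string, unknown>;

// Holds per-request context across awaits and callbacks.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Hypothetical minimal JSON logger; `write` defaults to one line per entry on stdout.
function makeLogger(write: (line: string) => void = (line) => process.stdout.write(line + '\n')) {
  const emit = (level: 'info' | 'error', fields: Fields, message: string): void => {
    const ctx = requestContext.getStore(); // undefined outside withRequest()
    write(JSON.stringify({ time: new Date().toISOString(), level, ...ctx, ...fields, message }));
  };
  return {
    info: (fields: Fields, message: string) => emit('info', fields, message),
    error: (fields: Fields, message: string) => emit('error', fields, message),
  };
}

// Runs fn with a requestId visible to every log call made inside it, sync or async.
function withRequest<T>(fn: () => T, requestId: string = randomUUID()): T {
  return requestContext.run({ requestId }, fn);
}

// Usage, e.g. wrapping the body of an HTTP handler:
const log = makeLogger();
withRequest(() => {
  log.info({ userId: 'u_123', action: 'order.create' }, 'order created');
});
```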
The Grafana Cloud free tier shortcut
If you don't want to run anything: Grafana Cloud free tier covers most early-stage apps. Sign up, get an OTLP endpoint, point your SDK at it. Done.
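Pointing the SDK at Grafana Cloud also needs credentials: the OTLP gateway uses HTTP Basic auth, with your stack's instance ID as the username and an access-policy token as the password. A minimal sketch, assuming you copy the endpoint, instance ID and token from the Grafana Cloud portal; GRAFANA_INSTANCE_ID, GRAFANA_CLOUD_TOKEN and the region placeholder in the URL are hypothetical names.

```typescript
import { Buffer } from 'node:buffer';

// HTTP Basic auth value: "Basic " + base64("<instanceId>:<token>").
function basicAuth(instanceId: string, token: string): string {
  return 'Basic ' + Buffer.from(`${instanceId}:${token}`, 'utf8').toString('base64');
}

// The OTLP/HTTP trace exporter uses an explicit `url` as-is,
// so append the traces path to the base endpoint shown in the portal.
function tracesUrl(baseEndpoint: string): string {
  return baseEndpoint.replace(/\/+$/, '') + '/v1/traces';
}

// Placeholder values: substitute your own endpoint, instance ID and token.
const exporterConfig = {
  url: tracesUrl('https://otlp-gateway-<your-region>.grafana.net/otlp'),
  headers: {
    Authorization: basicAuth(process.env.GRAFANA_INSTANCE_ID ?? '', process.env.GRAFANA_CLOUD_TOKEN ?? ''),
  },
};
```

The resulting object can be passed to new OTLPTraceExporter(exporterConfig) in the setup above, since that exporter accepts url and headers options.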
You get:
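- 10k active metric series (on paper that's 100 services with 100 metrics each, but every distinct combination of label values counts as its own series, so labels such as route or status code use it up faster)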
- 50GB logs/month
- 50GB traces/month
- 14-day retention
That's plenty for a pre-Series A startup.
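Series add up faster than metric names do, because each distinct set of label values is a separate series, and a Prometheus histogram expands into one series per bucket (plus +Inf), a _sum and a _count. A rough estimator, with made-up metric shapes for illustration:

```typescript
// Rough active-series estimate. The metric shapes below are illustrative assumptions.
type MetricShape =
  | { kind: 'counter' | 'gauge'; labelCombinations: number }
  | { kind: 'histogram'; buckets: number; labelCombinations: number }; // buckets = finite le bounds

function seriesFor(metric: MetricShape): number {
  if (metric.kind === 'histogram') {
    // one series per finite bucket, plus le="+Inf", _sum and _count
    return (metric.buckets + 3) * metric.labelCombinations;
  }
  return metric.labelCombinations;
}

// Each scraped instance (pod, process) repeats every series under its own instance label.
function totalSeries(metrics: MetricShape[], instances: number): number {
  return instances * metrics.reduce((acc, m) => acc + seriesFor(m), 0);
}

// Example: one service on 3 instances; a request counter split by 20 routes x 5 status
// codes, and a 10-bucket latency histogram split by route.
const api: MetricShape[] = [
  { kind: 'counter', labelCombinations: 20 * 5 },
  { kind: 'histogram', buckets: 10, labelCombinations: 20 },
];
console.log(totalSeries(api, 3)); // → 1080
```

Running your own metric list through something like this before launch shows whether the 10k allowance fits, and which label to drop if it doesn't.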
The two queries you'll actually run
After all this, here's what you'll use day-to-day:
LogQL:
```logql
{service="api"} | json | level="error" | line_format "{{.requestId}} {{.message}}"
```
PromQL:
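```promql
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service="api"}[5m])))
```
The sum by (le) merges all instances and routes into one service-wide p95; without it you get a separate p95 line per series.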
That's 80% of debugging. The fancy dashboards mostly gather dust.
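histogram_quantile never sees individual requests. It only has cumulative bucket counts, so it finds the bucket containing the target rank and interpolates linearly inside it. The sketch below is a simplified re-implementation for intuition (it skips edge cases Prometheus handles, such as negative bucket bounds), using made-up bucket counts.

```typescript
// Simplified version of how PromQL's histogram_quantile estimates a quantile.
interface Bucket {
  le: number;    // upper bound; the last bucket must be Infinity (+Inf)
  count: number; // cumulative count of observations <= le
}

function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (q <= 0 || q > 1) throw new RangeError('q must be in (0, 1]');
  if (buckets.length < 2 || buckets[buckets.length - 1].le !== Infinity) return NaN;
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank lands in +Inf: the best available answer is the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;

  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const countInBucket = buckets[i].count - countBelow;
  // Linear interpolation: assumes observations are spread evenly inside the bucket.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / countInBucket);
}

const latency: Bucket[] = [
  { le: 0.1, count: 50 },
  { le: 0.25, count: 90 },
  { le: 0.5, count: 98 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.95, latency)); // → 0.40625
```

Two consequences follow: the estimate can never exceed the largest finite bucket bound, and its precision is limited by bucket width, so it pays to put bucket boundaries near the latencies you alert on.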
What this won't do
- Real User Monitoring — Datadog/New Relic do RUM well. The OSS equivalents are worse. If you need this, accept the cost.
- Automatic anomaly detection — you have to write threshold-based alerts. That's fine for early stage.
- Slick mobile app — Grafana mobile is okay, not great.
- Dependency graphs — Datadog auto-discovers service maps. With OTel you get traces but not the slick visualization.
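Without anomaly detection, alerts are plain threshold rules. Grafana (pending period) and Prometheus (the for clause) both let a rule wait until the condition has held for a while before firing, which keeps one noisy evaluation from paging anyone. A stdlib-only sketch of that state machine; the class name, the 5% threshold and the three-evaluation window are illustrative assumptions, not recommended values.

```typescript
type AlertState = 'ok' | 'pending' | 'firing';

// Error rate over one evaluation window; 0 when there was no traffic.
function errorRate(errors: number, requests: number): number {
  return requests === 0 ? 0 : errors / requests;
}

// Fires only after `requiredBreaches` consecutive evaluations exceed the threshold.
// Real rule engines measure elapsed time; at a fixed evaluation interval,
// counting evaluations approximates the same behaviour.
class ThresholdAlert {
  private consecutiveBreaches = 0;
  private readonly threshold: number;
  private readonly requiredBreaches: number;

  constructor(threshold: number, requiredBreaches: number) {
    this.threshold = threshold;
    this.requiredBreaches = requiredBreaches;
  }

  evaluate(value: number): AlertState {
    this.consecutiveBreaches = value > this.threshold ? this.consecutiveBreaches + 1 : 0;
    if (this.consecutiveBreaches === 0) return 'ok';
    return this.consecutiveBreaches >= this.requiredBreaches ? 'firing' : 'pending';
  }
}

// Example: page when more than 5% of requests fail for 3 evaluations in a row.
const errorRateAlert = new ThresholdAlert(0.05, 3);
const samples = [[6, 100], [7, 100], [2, 100], [6, 100], [8, 100], [9, 100]] as const;
for (const [errors, requests] of samples) {
  console.log(errorRateAlert.evaluate(errorRate(errors, requests)));
}
// → pending, pending, ok, pending, pending, firing
```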
When to graduate
You should move to Datadog (or similar) when:
- You have an SRE team that exists to use it
- Your incident volume justifies the better UX
- You're spending more than 5% of an engineer's time maintaining the OSS stack
For most companies, that's Series B+ or 50+ engineers. Not before.
The takeaway
Datadog is a great product priced for companies that have already won. If you're still figuring out PMF, $50/month of OSS observability gets you what you need. Save the $50k/year for hiring.
Work with me
I consult with engineering teams on AI adoption, cloud architecture, and engineering effectiveness. If this post surfaced a challenge you're facing, let's talk.