Observability Without Datadog: A $50/Month Stack That Works

You're a small team. Your Datadog bill is $4k/month. The CFO asks why. You don't have a good answer.

Datadog is excellent. It's also priced for companies that have already won. If you're pre-product-market-fit and burning runway on observability, you've made a mistake.

There's a stack that runs for $50/month, scales to mid-six-figure ARR, and gives you logs, metrics, traces, and alerts. It's open source plus one cheap managed service.

What you actually need

Not what Datadog sells you. What an engineer at 2am actually uses to fix a production incident:

Logs — searchable, time-filtered, with structured fields
Metrics — CPU, memory, request count, error rate, p50/p95/p99 latency
Traces — when a request is slow, where in the call graph
Alerts — page when error rate or latency crosses a threshold

That's it. Custom dashboards, anomaly detection, and full APM suites are nice-to-haves. They are not what saves you at 2am.

The stack

Logs: structured JSON to stdout, shipped to Grafana Loki (self-hosted) or Better Stack (managed, $25/month for 30GB).

Metrics: Prometheus + Grafana. Self-hosted on a $10/month VM, or use Grafana Cloud free tier (10k series, 50GB logs, 50GB traces).

Traces: OpenTelemetry SDK in your app, exported to Grafana Tempo or Jaeger.

Alerts: Grafana alerting → PagerDuty (free for up to 5 users) or just email/Slack for early stage.

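Total: roughly $25-50/month managed, or a single $10-20/month VM if you self-host. As a rough guide, this setup holds up to around 50M requests/day (about 580 requests/second on average) before you hit limits, though the real ceiling depends on how much you log and trace per request.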

The 30-minute setup

Use OpenTelemetry. It's the unifying SDK that emits all three signals. Your app doesn't care where the data goes:

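// app.ts
// Load this before any other import so auto-instrumentation can patch libraries.
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // Full OTLP/HTTP traces URL, e.g. http://localhost:4318/v1/traces
    url: process.env.OTEL_ENDPOINT,
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush buffered spans before the process exits
process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});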

Auto-instrumentation captures HTTP, database, Redis, and many other popular library calls without code changes.

For metrics, expose /metrics from your app via a Prometheus exporter (for example, @opentelemetry/exporter-prometheus or prom-client), then add the app as a scrape target in Prometheus.
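
To make the /metrics step concrete, here is a minimal sketch of what a scrape target serves, using only Node's built-in http module. In a real app you would more likely rely on an exporter library; the `Counter` class, the metric name, and port 9464 below are illustrative assumptions, and label-value escaping is left out for brevity.

```typescript
import { createServer } from 'node:http';

// Hypothetical in-memory counter keyed by label set, for illustration only.
class Counter {
  private values = new Map<string, number>();
  name: string;
  help: string;

  constructor(name: string, help: string) {
    this.name = name;
    this.help = help;
  }

  inc(labels: Record<string, string>, by = 1): void {
    // Sort label names so the same label set always maps to the same series
    const key = Object.entries(labels)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([k, v]) => `${k}="${v}"`)
      .join(',');
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  // Render in the Prometheus text exposition format
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    for (const [labels, value] of this.values) {
      lines.push(`${this.name}{${labels}} ${value}`);
    }
    return lines.join('\n') + '\n';
  }
}

const httpRequests = new Counter('http_requests_total', 'Total HTTP requests handled');

// Serve the current values at /metrics and nothing else
const metricsServer = createServer((req, res) => {
  if (req.url === '/metrics') {
    res.writeHead(200, { 'Content-Type': 'text/plain; version=0.0.4' });
    res.end(httpRequests.render());
  } else {
    res.writeHead(404);
    res.end();
  }
});

// At startup, call metricsServer.listen(9464) and add host:9464 as a Prometheus scrape target.
```

Each request handler would call something like `httpRequests.inc({ method: 'GET', status: '200' })`; Prometheus then turns the ever-growing counter into a rate at query time.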

For logs, write JSON to stdout:

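import pino from 'pino';
// Emit string levels ("error") and a "message" key; pino defaults to numeric levels and "msg"
const log = pino({
  messageKey: 'message',
  formatters: { level: (label) => ({ level: label }) },
});

log.info({ userId, requestId, action: 'order.create' }, 'order created');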

A log shipper such as Vector, Promtail, or Grafana Alloy tails stdout and ships each line to Loki.
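
For intuition about what the shipper does on your behalf, here is a sketch of the JSON body that Loki's HTTP push endpoint (/loki/api/v1/push) accepts: streams identified by labels, each carrying pairs of nanosecond timestamp and raw line. The `toLokiPush` helper and the label values are hypothetical; in practice the shipper builds and batches this for you.

```typescript
interface LokiStream {
  stream: Record<string, string>; // labels, e.g. { service: "api" }
  values: [string, string][];     // [unix epoch in nanoseconds as a string, raw log line]
}

interface LokiPushBody {
  streams: LokiStream[];
}

// Hypothetical helper: wrap raw stdout lines into a single Loki stream.
function toLokiPush(
  labels: Record<string, string>,
  lines: { timestampMs: number; line: string }[],
): LokiPushBody {
  return {
    streams: [
      {
        stream: labels,
        // Appending six zeros converts integer milliseconds to nanoseconds
        // without losing precision in floating point.
        values: lines.map(({ timestampMs, line }): [string, string] => [
          String(Math.trunc(timestampMs)) + '000000',
          line,
        ]),
      },
    ],
  };
}

const body = toLokiPush({ service: 'api' }, [
  { timestampMs: 1700000000000, line: '{"action":"order.create"}' },
]);
// POST JSON.stringify(body) to <loki-host>/loki/api/v1/push with Content-Type: application/json
```

The labels are what you select on in LogQL (the part in curly braces), so keep them few and low-cardinality; everything else belongs in the JSON line itself.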

The Grafana Cloud free tier shortcut

If you don't want to run anything: Grafana Cloud's free tier covers most early-stage apps. Sign up, get an OTLP endpoint and an access token, and point your SDK at them. Done.

You get:

10k active metric series (roughly 100 services with 100 series each; note that every label combination and every histogram bucket counts as its own series)
50GB logs/month
50GB traces/month
14 days retention

That's plenty for a pre-Series A startup.
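
The series budget is easier to reason about with the arithmetic written out. Each distinct combination of label values is its own series, and a histogram adds one series per bucket plus `_sum` and `_count`. A rough sketch, with made-up label counts for a hypothetical API service:

```typescript
// Series for one metric: the product of distinct values across its labels.
function seriesForMetric(labelCardinalities: number[]): number {
  return labelCardinalities.reduce((product, n) => product * n, 1);
}

// A histogram exposes one series per bucket (bucketCount includes +Inf),
// plus _sum and _count, for every label combination.
function seriesForHistogram(labelCardinalities: number[], bucketCount: number): number {
  return seriesForMetric(labelCardinalities) * (bucketCount + 2);
}

// Hypothetical service: 20 routes x 5 status codes x 3 instances
const requestCounter = seriesForMetric([20, 5, 3]);          // 300
const latencyHistogram = seriesForHistogram([20, 5, 3], 12); // 300 * 14 = 4200

const total = requestCounter + latencyHistogram;
console.log(total); // 4500
```

Two metrics on one service already use almost half of a 10k budget in this example, so the cheapest fix is usually to drop or coarsen a high-cardinality label rather than to pay for more series.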

The two queries you'll actually run

After all this, here's what you'll use day-to-day:

LogQL:

{service="api"} | json | level="error" | line_format "{{.requestId}} {{.message}}"

PromQL:

histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service="api"}[5m])))
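
It helps to know how histogram_quantile arrives at its answer: it finds the bucket containing the target rank and interpolates linearly inside it, so the precision of your p95 depends on your bucket boundaries. Below is a simplified sketch of that estimate, assuming cumulative bucket counts sorted by upper bound and ending with +Inf. The `Bucket` type and sample numbers are illustrative, and edge cases that Prometheus handles are omitted.

```typescript
interface Bucket {
  le: number;    // upper bound of the bucket (Infinity for the last one)
  count: number; // cumulative count of observations <= le
}

function estimateQuantile(q: number, buckets: Bucket[]): number {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the rank
  const i = buckets.findIndex((b) => b.count >= rank);

  // If the rank lands in the +Inf bucket, the best available answer
  // is the highest finite upper bound.
  if (buckets[i].le === Infinity) return buckets[i - 1].le;

  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;

  // Assume observations are spread evenly within the bucket
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / inBucket);
}

const p95 = estimateQuantile(0.95, [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
]);
console.log(p95); // 0.75
```

Here the 95th observation falls halfway through the 0.5s to 1s bucket, so the estimate is 0.75s even if every request in that bucket actually took 0.51s. If p95 matters near a specific threshold, put a bucket boundary there.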

That's 80% of debugging. The fancy dashboards mostly gather dust.

What this won't do

Real User Monitoring — Datadog/New Relic do RUM well. The OSS equivalents are worse. If you need this, accept the cost.
Automatic anomaly detection — you have to write threshold-based alerts. That's fine for early stage.
Slick mobile app — Grafana mobile is okay, not great.
Dependency graphs — Datadog auto-discovers service maps. With OTel you get traces but not the slick visualization.
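
Threshold alerts are simple enough to reason about in a few lines. The sketch below mimics the usual rule shape: fire only when the condition has held continuously for a sustained period, so a single spike does not page anyone. The `shouldFire` function, the 5% threshold, and the sample data are illustrative assumptions; in this stack you would express the same logic as a Grafana alert rule rather than application code.

```typescript
interface Sample {
  timestampSec: number; // evaluation time
  errorRate: number;    // fraction of requests that failed, 0..1
}

// Fire when the error rate has stayed above the threshold, without
// interruption, for at least forSec seconds up to the latest sample.
function shouldFire(samples: Sample[], threshold: number, forSec: number): boolean {
  let breachingSince: number | null = null;
  let latest = -Infinity;

  for (const s of samples) {
    latest = s.timestampSec;
    if (s.errorRate > threshold) {
      // Remember when the current breach started
      if (breachingSince === null) breachingSince = s.timestampSec;
    } else {
      // Any healthy sample resets the clock
      breachingSince = null;
    }
  }

  return breachingSince !== null && latest - breachingSince >= forSec;
}

const times = [0, 60, 120, 180, 240, 300];
const sustained = times.map((t) => ({ timestampSec: t, errorRate: 0.2 }));
const spike = times.map((t) => ({ timestampSec: t, errorRate: t === 300 ? 0.2 : 0.01 }));

console.log(shouldFire(sustained, 0.05, 300)); // true
console.log(shouldFire(spike, 0.05, 300));     // false
```

The hold period is the knob that matters most: too short and you page on noise, too long and you learn about the outage from customers.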

When to graduate

You should move to Datadog (or similar) when:

You have an SRE team that exists to use it
Your incident volume justifies the better UX
You're spending more than 5% of an engineer's time maintaining the OSS stack

For most companies, that's Series B+ or 50+ engineers. Not before.

The takeaway

Datadog is a great product priced for companies that have already won. If you're still figuring out PMF, $50/month of OSS observability gets you what you need. Save the $50k/year for hiring.