SQS vs Kafka vs Redis Streams: Choose Wrong, Pay for Years

You need a queue. The team has opinions. Someone says Kafka. Someone says SQS. Someone says "we already have Redis, let's use Streams."

These are three radically different products. Picking the wrong one isn't a small mistake — it's a six-month migration two years from now.

Here's how to actually decide.

What you're picking between

SQS — fully managed AWS queue. Pay-per-message. Effectively infinite scale. Limited features.

Kafka — distributed log. High throughput, replay, event sourcing. Either run it yourself (operational burden) or pay Confluent/MSK (expensive at scale).

Redis Streams — append-only log inside Redis. Cheap, fast, simple. Limited durability and scale.

All three fit the same "queue" box on an architecture diagram, but they solve different problems.

The decision tree

Question 1: Do you need to replay messages?

If yes (event sourcing, ML training pipelines, audit logs that downstream services consume): Kafka, a managed Kafka such as MSK, or a Kafka-compatible system such as Redpanda.

If no (most CRUD work, background jobs) — keep going.

Question 2: Do you need >10k messages/second per topic?

If yes — Kafka. SQS can technically scale this high but costs and ergonomics break down.

If no — keep going.

Question 3: Are you already on AWS and don't want to operate anything?

If yes: SQS. It's the right answer for the large majority of "we need a queue" use cases.

Question 4: Do you already have Redis, low message volumes (<1k/sec), and want zero new infra?

If yes — Redis Streams. Good for short-term internal job queues.

That covers most cases. If you find yourself answering "yes" to multiple — pick the most expensive answer (Kafka). It's the most flexible.
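The four questions can be sketched as a small function. This is an illustration only: the `Requirements` fields and the `pick_queue` name are made up for this post, and the fallback to SQS when nothing matches is an assumption borrowed from the default recommended later on.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Answers to the four questions, in the order they are asked."""
    needs_replay: bool = False
    peak_msgs_per_sec: int = 0
    on_aws_no_ops: bool = False
    has_redis: bool = False


def pick_queue(req: Requirements) -> str:
    # Question 1: replay is the strongest signal for a log-based system.
    if req.needs_replay:
        return "kafka"
    # Question 2: more than ~10k msgs/sec per topic.
    if req.peak_msgs_per_sec > 10_000:
        return "kafka"
    # Question 3: already on AWS and unwilling to operate anything.
    if req.on_aws_no_ops:
        return "sqs"
    # Question 4: existing Redis, low volume, no appetite for new infra.
    if req.has_redis and req.peak_msgs_per_sec < 1_000:
        return "redis-streams"
    # Nothing matched: assumed fallback to SQS.
    return "sqs"
```

Because the checks run in question order, a "yes" on replay or throughput wins over any later "yes", which matches the tie-break above.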

SQS: where it shines

Background jobs (email sending, image resizing, webhook delivery)
Decoupling services (producer doesn't care about consumer health)
Spike absorption (front-end can write fast, processing catches up)
Anything that doesn't need ordering across the whole queue (FIFO queues add complexity)

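Cost: $0.40 per million requests. A million jobs/day is about 30 million messages a month: $12/month if each job cost a single request, closer to $36/month once you count the send, receive, and delete calls each job makes (less with batching). You will not beat this with self-hosted anything.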
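To sanity-check the arithmetic for your own volume, here is a rough estimator. It assumes the $0.40 per million rate quoted above, ignores any free tier and data transfer, and treats `requests_per_job` as a knob: 1 counts only the send, 3 counts send, receive, and delete without batching. The function name is hypothetical.

```python
PRICE_PER_MILLION_REQUESTS = 0.40  # USD, the rate quoted above


def sqs_monthly_cost(jobs_per_day: float, requests_per_job: float = 3.0, days: int = 30) -> float:
    """Rough monthly cost in USD for a standard queue.

    Ignores free tier, data transfer, and payloads large enough to be
    billed as more than one request.
    """
    requests = jobs_per_day * days * requests_per_job
    return round(requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS, 2)


one_request_per_job = sqs_monthly_cost(1_000_000, requests_per_job=1)
send_receive_delete = sqs_monthly_cost(1_000_000, requests_per_job=3)
```

Either way the bill stays in the tens of dollars a month, which is the point.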

Limitations:

Max message size 256KB (use S3 for blob, send pointer)
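Visibility timeout model: if processing outlasts the timeout, the message becomes visible again and is redelivered, so handlers need to be idempotent
No replay: once consumed, gone (unless you wrote it to S3 yourself)
FIFO mode is slower than standard: by default 300 API calls/sec per action per queue (about 3,000 msg/sec with batches of 10) unless you enable high-throughput mode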
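The visibility timeout is the limitation that bites most often, so here is a minimal consumer sketch. It assumes `sqs` is a boto3 SQS client (`boto3.client("sqs")`) passed in by the caller. `drain_once` and `seen_ids` are hypothetical names, and the in-memory set stands in for whatever durable dedupe store you would actually use.

```python
from typing import Callable


def drain_once(sqs, queue_url: str, handle: Callable[[str], None],
               seen_ids: set, visibility_timeout: int = 60) -> int:
    """Receive one batch, skip duplicates, delete each message after success.

    Returns the number of messages actually handled.
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,               # batch to cut request count
        WaitTimeSeconds=20,                   # long polling
        VisibilityTimeout=visibility_timeout,
    )
    handled = 0
    for msg in resp.get("Messages", []):
        if msg["MessageId"] not in seen_ids:
            # If handle() raises, the exception propagates, nothing is deleted,
            # and SQS redelivers the message after the visibility timeout.
            handle(msg["Body"])
            seen_ids.add(msg["MessageId"])
            handled += 1
        # Reached only after success, or for an already-processed duplicate.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return handled
```

Set the timeout comfortably above your slowest job. If jobs vary a lot, boto3's `change_message_visibility` can extend it while a job is still running.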

Kafka: where it shines

Event sourcing, where new services want to replay history
High-throughput data pipelines (millions of msgs/sec)
Multi-consumer fanout (10 services consume the same topic, each at their own pace)
Stream processing (with Kafka Streams or Flink)
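Replay and independent fanout both fall out of one idea: the log keeps every record, and each consumer group only tracks its own position. The toy model below shows that idea in plain Python. It is not the Kafka API, just an in-memory stand-in for a single partition, and `ToyLog` is a made-up name.

```python
class ToyLog:
    """In-memory stand-in for one topic partition. Not the Kafka API."""

    def __init__(self):
        self._records = []   # append-only; reads never remove anything
        self._offsets = {}   # consumer group name -> next offset to read

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        """Return the next batch for this group and advance its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Move a group's position; rewinding to 0 replays history."""
        self._offsets[group] = offset


log = ToyLog()
for event in ["created", "paid", "shipped"]:
    log.append(event)

billing_batch = log.poll("billing")          # billing reads everything
analytics_batch = log.poll("analytics", 1)   # analytics reads at its own pace
log.seek("billing", 0)                       # billing replays from the start
replayed = log.poll("billing")
```

A queue like SQS has no equivalent of `seek`: consuming a message removes it, which is why replay pushes you toward a log.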

Cost reality:

Self-hosted: at least 3 brokers + ZooKeeper/KRaft. ~$500/month minimum for a small cluster. Plus operational time.
Confluent Cloud: ~$1/GB-month for storage, $0.11/GB ingress. A modest pipeline runs $1-5k/month.
MSK: AWS-managed. Cheaper than Confluent, more operational overhead.
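For a feel of how the managed bill adds up, here is a back-of-envelope estimator. It uses only the two rates quoted above. The traffic numbers in the example are hypothetical, and real bills include line items this ignores (replication, egress, base or partition charges).

```python
STORAGE_USD_PER_GB_MONTH = 1.00   # rate quoted above
INGRESS_USD_PER_GB = 0.11         # rate quoted above


def managed_kafka_estimate(ingress_gb_per_day: float, retention_days: int, days: int = 30) -> float:
    """Rough monthly USD: ingress plus steady-state retained storage."""
    ingress = ingress_gb_per_day * days * INGRESS_USD_PER_GB
    # At steady state the cluster holds retention_days worth of data.
    storage = ingress_gb_per_day * retention_days * STORAGE_USD_PER_GB_MONTH
    return round(ingress + storage, 2)


# Hypothetical pipeline: 200 GB/day in, 7 days of retention.
example = managed_kafka_estimate(200, 7)
```

With those assumed volumes the estimate lands around $2k/month, inside the range above, and retention is the bigger of the two terms.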

Limitations:

Operational complexity (partitions, rebalancing, schema management)
Painful cost curve once you scale
Easy to misuse — using Kafka for a simple job queue is over-engineering

Redis Streams: where it shines

Internal job queues at low volume
Real-time dashboards (consumer reads recent events)
Anything where you already pay for Redis and don't want to add a new service

Limitations:

Durability is "as good as your Redis backup strategy" — for many setups, that's "not great"
No built-in partitioning model. A stream is a single key, so it lives on one node and shares that node's throughput cap (~100k msgs/sec, but the practical ceiling is lower)
Consumer groups exist but the ergonomics are clunky compared to Kafka or SQS
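Can grow your Redis memory unexpectedly: entries stay in the stream after they are acknowledged unless you trim it (MAXLEN or XTRIM), and the pending list grows when consumers fall behind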
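If you do go this route, cap the stream and acknowledge explicitly. The sketch below assumes `r` is a redis-py client (`redis.Redis(decode_responses=True)`) using the default RESP2 reply shape, and that the consumer group already exists, created once with something like `r.xgroup_create(stream, group, id="0", mkstream=True)`. The `enqueue` and `work_once` names are hypothetical.

```python
def enqueue(r, stream: str, job: dict, max_len: int = 100_000) -> str:
    """Append a job and trim the stream to roughly max_len entries."""
    # approximate=True maps to XADD ... MAXLEN ~ n, which trims lazily and is cheaper.
    return r.xadd(stream, job, maxlen=max_len, approximate=True)


def work_once(r, stream: str, group: str, consumer: str, handle) -> int:
    """Read up to 10 new entries for this consumer and ack each after handling."""
    # ">" asks for entries not yet delivered to any consumer in the group.
    resp = r.xreadgroup(group, consumer, {stream: ">"}, count=10, block=1000)
    done = 0
    for _stream_name, entries in resp or []:
        for entry_id, fields in entries:
            handle(fields)
            # Entries that are never acked stay in the group's pending list.
            r.xack(stream, group, entry_id)
            done += 1
    return done
```

The cap bounds memory, but it also means a consumer that falls far enough behind silently loses the oldest entries, which is one more reason to keep this to non-critical work.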

For low-volume internal queues, this is genuinely fine. For anything customer-facing or load-bearing — pick differently.

Common wrong picks

"We chose Kafka for our background jobs." You set up a 5-broker cluster to deliver 100 emails/minute. You spent 3 weeks. You're now paying $2k/month plus an engineer's time. SQS would have cost $0.50.

"We chose SQS for event sourcing." No replay, no fanout, no log compaction. You'll re-implement Kafka inside SQS, badly.

"We chose Redis Streams for our durable order pipeline." Redis crashed. You lost a queue. You found out backups were the previous day's. The order pipeline is the last place to discover this.

The migration cost

Switching queue products later is expensive:

Producer code changes (different SDKs, different semantics)
Consumer code changes (different ack/visibility model)
Replay or migration of in-flight messages
Two systems running in parallel during cutover
Updated monitoring, alerting, runbooks

Estimate ~2 engineer-months per migration. Pick well now.

A reasonable default

Most teams need: SQS for background jobs, Kafka if/when they need event sourcing, Redis Streams almost nowhere.

If I'm being concrete: 90% of "we need a queue" requests are SQS. 8% are Kafka. 2% are Redis Streams (for narrow internal use).

Default to SQS. Only escalate to Kafka when you can articulate exactly why (and "we might need replay someday" doesn't count — wait until you actually do).

The takeaway

Queue products look similar in slides. They're not. Pick by the actual question: do you need replay (Kafka), high throughput (Kafka), AWS-native simplicity (SQS), or zero new infra at low volume (Redis Streams)? Default to SQS. Avoid Kafka until you genuinely need its specific properties.
