SQS vs Kafka vs Redis Streams: Choose Wrong, Pay for Years
Three queueing options with very different cost, throughput, and operational profiles. Pick the wrong one early and you'll re-platform later.
You need a queue. The team has opinions. Someone says Kafka. Someone says SQS. Someone says "we already have Redis, let's use Streams."
These are three radically different products. Picking the wrong one isn't a small mistake — it's a six-month migration two years from now.
Here's how to actually decide.
What you're picking between
SQS — fully managed AWS queue. Pay-per-message. Effectively infinite scale. Limited features.
Kafka — distributed log. High throughput, replay, event sourcing. Either run it yourself (operational burden) or pay Confluent/MSK (expensive at scale).
Redis Streams — append-only log inside Redis. Cheap, fast, simple. Limited durability and scale.
All three sit in the same box on an architecture diagram, but they solve different problems.
The decision tree
Question 1: Do you need to replay messages?
If yes (event sourcing, ML training pipelines, audit logs that downstream services consume) — Kafka or compatible (Redpanda, MSK).
If no (most CRUD work, background jobs) — keep going.
Question 2: Do you need >10k messages/second per topic?
If yes — Kafka. SQS can technically scale this high but costs and ergonomics break down.
If no — keep going.
Question 3: Are you already on AWS and don't want to operate anything?
If yes — SQS. It's the right answer for 80% of "we need a queue" use cases.
Question 4: Do you already have Redis, low message volumes (<1k/sec), and want zero new infra?
If yes — Redis Streams. Good for short-term internal job queues.
That covers most cases. If you answer "yes" to more than one question and Kafka is among the answers, pick Kafka. It's the most expensive option, but also the most flexible.
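The four questions can be written down as a small function. This is only a sketch of the tree as described above: the `Needs` fields and the `pick_queue` name are made up for illustration, and the final fallback to SQS is my reading of the default this post recommends later, not a rule the questions themselves state.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Answers to the four questions; field names are illustrative."""
    needs_replay: bool = False
    peak_msgs_per_sec: int = 0   # per topic/queue
    on_aws_no_ops: bool = False  # on AWS and unwilling to operate anything
    has_redis: bool = False      # Redis already deployed, no new infra wanted


def pick_queue(needs: Needs) -> str:
    # Questions 1 and 2: replay, or more than 10k msgs/sec per topic, points to Kafka.
    if needs.needs_replay or needs.peak_msgs_per_sec > 10_000:
        return "Kafka"
    # Question 3: already on AWS, nothing to operate.
    if needs.on_aws_no_ops:
        return "SQS"
    # Question 4: existing Redis and under 1k msgs/sec.
    if needs.has_redis and needs.peak_msgs_per_sec < 1_000:
        return "Redis Streams"
    # Assumption: nothing matched, so fall back to the post's default.
    return "SQS"
```

Because the Kafka checks run first, a "yes" to replay wins even when the SQS or Redis answers are also "yes".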
SQS: where it shines
- Background jobs (email sending, image resizing, webhook delivery)
- Decoupling services (producer doesn't care about consumer health)
- Spike absorption (front-end can write fast, processing catches up)
- Anything that doesn't need ordering across the whole queue (FIFO queues add complexity)
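Cost: $0.40 per million requests. A million jobs/day is 30 million sends a month, or $12. Each job also needs a receive and a delete, so budget closer to $36/month if you don't batch. You will not beat this with self-hosted anything.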
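The arithmetic is easy to check for your own volume. A rough sketch, assuming the $0.40 per million price quoted above and ignoring the free tier, batching, and data transfer. `requests_per_job` is an illustrative knob: 1 counts only the send, 3 counts an unbatched send, receive, and delete.

```python
def sqs_monthly_cost(jobs_per_day: float,
                     requests_per_job: int = 3,
                     price_per_million: float = 0.40,
                     days: int = 30) -> float:
    """Estimate monthly SQS request charges in dollars (rounded to cents)."""
    requests = jobs_per_day * days * requests_per_job
    return round(requests / 1_000_000 * price_per_million, 2)


send_only = sqs_monthly_cost(1_000_000, requests_per_job=1)
full_cycle = sqs_monthly_cost(1_000_000)
```

For a million jobs a day, `send_only` comes out at $12 and `full_cycle` at $36. Either way it is far below the fixed cost of running brokers.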
Limitations:
- Max message size was 256KB for most of SQS's history and was raised to 1 MiB in 2025 (for anything bigger, put the blob in S3 and send a pointer)
- Visibility timeout model: if your consumer takes longer than the timeout, the message is redelivered, so handlers need to be idempotent
- No replay — once consumed, gone (unless you wrote it to S3 yourself)
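- FIFO mode is slower than standard: 300 API calls/sec per action without batching (about 3,000 msg/sec with batches of 10), unless you enable high-throughput mode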
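The "send a pointer" workaround for large payloads looks roughly like this. It is a sketch, not a library API: `build_message_body` and the `put_blob` callback are hypothetical names, the `payloads/` key prefix is arbitrary, and `max_bytes` defaults to the 256KB figure quoted above, so set it to whatever limit your queue actually has.

```python
import json
import uuid
from typing import Callable


def build_message_body(payload: str,
                       put_blob: Callable[[str, bytes], None],
                       max_bytes: int = 256 * 1024) -> str:
    """Return a JSON body: the payload inline if it fits, else a pointer to it."""
    inline = json.dumps({"kind": "inline", "data": payload})
    if len(inline.encode("utf-8")) <= max_bytes:
        return inline
    # Too big for the queue: store the blob elsewhere and send only its key.
    key = f"payloads/{uuid.uuid4()}.json"
    put_blob(key, payload.encode("utf-8"))
    return json.dumps({"kind": "pointer", "key": key})
```

In a real producer, `put_blob` would wrap something like boto3's `s3.put_object(Bucket=..., Key=key, Body=data)`, and the consumer would fetch the object when it sees `"kind": "pointer"`. Keeping the storage call behind a callback makes the size logic testable without AWS.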
Kafka: where it shines
- Event sourcing, where new services want to replay history
- High-throughput data pipelines (millions of msgs/sec)
- Multi-consumer fanout (10 services consume the same topic, each at their own pace)
- Stream processing (with Kafka Streams or Flink)
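Replay and multi-consumer fanout both fall out of one idea: the log keeps every record, and each consumer group only tracks its own offset. The toy model below illustrates that idea in plain Python. It is not the Kafka client API, and `Log`, `poll`, and `seek` are names chosen for this sketch.

```python
class Log:
    """Toy append-only log with one committed offset per consumer group."""

    def __init__(self) -> None:
        self._records: list = []
        self._offsets: dict[str, int] = {}

    def append(self, record) -> int:
        """Store a record and return its offset. Records are never removed."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 100) -> list:
        """Return the next batch for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group: str, offset: int) -> None:
        """Rewind (or skip ahead) one group; this is what replay amounts to."""
        self._offsets[group] = offset
```

Two groups polling the same `Log` each see every record at their own pace, and calling `seek(group, 0)` replays history for one group without affecting the other. A queue that deletes on consume cannot offer either property.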
Cost reality:
- Self-hosted: at least 3 brokers + ZooKeeper/KRaft. ~$500/month minimum for a small cluster. Plus operational time.
- Confluent Cloud: ~$1/GB-month for storage, $0.11/GB ingress. A modest pipeline runs $1-5k/month.
- MSK: AWS-managed. Cheaper than Confluent, more operational overhead.
Limitations:
- Operational complexity (partitions, rebalancing, schema management)
- Painful cost curve once you scale
- Easy to misuse — using Kafka for a simple job queue is over-engineering
Redis Streams: where it shines
- Internal job queues at low volume
- Real-time dashboards (consumer reads recent events)
- Anything where you already pay for Redis and don't want to add a new service
Limitations:
- Durability is "as good as your Redis backup strategy" — for many setups, that's "not great"
- No built-in partitioning. A stream is a single key on a single node, so throughput is capped by that node (~100k msgs/sec on paper, lower in practice)
- Consumer groups exist but the ergonomics are clunky compared to Kafka or SQS
- Can grow your Redis memory unexpectedly if consumers fall behind
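The memory-growth problem is usually handled by capping the stream when you write to it. A minimal sketch, assuming a redis-py client (or anything with the same `xadd`, `xreadgroup`, and `xack` methods) running with its default response format. The `enqueue` and `drain` helpers and the 100,000 cap are illustrative.

```python
def enqueue(client, stream: str, fields: dict, max_len: int = 100_000) -> str:
    """Append an entry and trim the stream to roughly max_len entries."""
    # Trade-off: trimming drops the oldest entries even if no consumer
    # has processed them yet, so a slow consumer loses work instead of
    # growing Redis memory without bound.
    return client.xadd(stream, fields, maxlen=max_len, approximate=True)


def drain(client, stream: str, group: str, consumer: str, handler,
          count: int = 10) -> int:
    """Read up to `count` new entries for this group, handle and ack each one."""
    # ">" asks for entries never delivered to any consumer in this group.
    replies = client.xreadgroup(group, consumer, {stream: ">"}, count=count)
    handled = 0
    for _stream_name, entries in replies:
        for entry_id, entry_fields in entries:
            handler(entry_fields)
            # Ack only after the handler returns, so a crash leaves the
            # entry pending rather than silently lost.
            client.xack(stream, group, entry_id)
            handled += 1
    return handled
```

The consumer group has to exist before `drain` is called, for example via `client.xgroup_create(stream, group, id="0", mkstream=True)`. Entries that were delivered but never acked stay in the group's pending list and need separate handling, which is part of why the ergonomics feel clunky.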
For low-volume internal queues, this is genuinely fine. For anything customer-facing or load-bearing — pick differently.
Common wrong picks
"We chose Kafka for our background jobs." You set up a 5-broker cluster to deliver 100 emails/minute. You spent 3 weeks. You're now paying $2k/month plus an engineer's time. At roughly 4.3 million messages a month, SQS would have cost a few dollars.
"We chose SQS for event sourcing." No replay, no fanout, no log compaction. You'll re-implement Kafka inside SQS, badly.
"We chose Redis Streams for our durable order pipeline." Redis crashed. You lost a queue. You found out backups were the previous day's. The order pipeline is the last place to discover this.
The migration cost
Switching queue products later is expensive:
- Producer code changes (different SDKs, different semantics)
- Consumer code changes (different ack/visibility model)
- Replay or migration of in-flight messages
- Two systems running in parallel during cutover
- Updated monitoring, alerting, runbooks
Estimate ~2 engineer-months per migration. Pick well now.
A reasonable default
Most teams need: SQS for background jobs, Kafka if/when they need event sourcing, Redis Streams almost nowhere.
If I'm being concrete: 90% of "we need a queue" requests are SQS. 8% are Kafka. 2% are Redis Streams (for narrow internal use).
Default to SQS. Only escalate to Kafka when you can articulate exactly why (and "we might need replay someday" doesn't count — wait until you actually do).
The takeaway
Queue products look similar in slides. They're not. Pick by the actual question: do you need replay (Kafka), high throughput (Kafka), AWS-native simplicity (SQS), or zero new infra at low volume (Redis Streams)? Default to SQS. Avoid Kafka until you genuinely need its specific properties.