Tag

Infrastructure

16 posts on Infrastructure


May 20, 2026 · 6 min read
Feature Flags Die in Production
Feature flags start as a deployment safety tool and end as permanent conditionals no one understands. Here is how to prevent the graveyard.
engineering devex infrastructure
May 20, 2026 · 7 min read
Oncall Burnout Is a Design Failure
Paging fatigue isn't a staffing problem. It's a design problem. Systems that generate noise do so because they weren't designed for operability.
engineering observability infrastructure
May 20, 2026 · 6 min read
Staging Is Not What You Think It Is
Every team believes their staging environment reflects production. Almost none of them do. Here is how to test in production safely instead.
engineering infrastructure testing
May 19, 2026 · 6 min read
Database Migrations Are the Riskiest Code You Ship
Application code that breaks can be rolled back in seconds. A migration that breaks has already changed your data. Migrations deserve more caution than any other code in your pipeline — and usually get less.
engineering databases infrastructure architecture devops
May 18, 2026 · 5 min read
The Retry Storm: When Your Resilience Code Causes the Outage
Retries, timeouts, and health checks are supposed to make systems resilient. Configured naively, they turn a recoverable blip into a self-sustaining outage. The resilience code becomes the incident.
engineering infrastructure architecture devops reliability
May 18, 2026 · 6 min read
Your Staging Environment Is Lying to You
Staging exists to catch problems before production. Most staging environments catch the wrong problems and miss the real ones, because they differ from production in exactly the ways that matter.
engineering devops infrastructure developer-experience testing
April 30, 2026 · 7 min read
Elasticsearch Across Many Services: The Right Way
Shared cluster, isolated tenants, write-through pipelines, and the index design choices that decide whether you scale or burn down.
architecture infrastructure elasticsearch search
April 30, 2026 · 3 min read
Observability Without Datadog: A $50/Month Stack That Works
Datadog at Series A is fine. Datadog at seed is malpractice. Here's a stack that gets you 80% of the value for 1% of the cost.
observability devops infrastructure cost
April 30, 2026 · 3 min read
Postgres Indexes That Actually Matter at Scale
Most slow queries aren't about hardware. They're about three indexes you didn't add. Here's the playbook.
postgres database performance infrastructure
April 30, 2026 · 5 min read
When to Move Analytics Off Postgres (And When Not To)
Your dashboards are slow. Engineers want ClickHouse. The CFO is nervous. Here's the real decision framework.
database analytics infrastructure architecture
April 30, 2026 · 4 min read
SQS vs Kafka vs Redis Streams: Choose Wrong, Pay for Years
Three queueing options with very different cost, throughput, and operational profiles. Pick the wrong one early and you'll re-platform later.
architecture infrastructure queues kafka
April 25, 2026 · 8 min read
How Transparent Proxies Work (And Why You're Probably Behind One Right Now)
Every HTTP request you make likely passes through a proxy you never configured. Here is the network-level mechanism — iptables NAT REDIRECT, TPROXY, Squid in action, and why HTTPS only partially protects you.
networking security infrastructure
April 10, 2026 · 6 min read
Self-Hosting an LLM on Kubernetes
Managed inference APIs are convenient until they are not. Here is the full picture of running your own LLM on Kubernetes: GPU scheduling, model storage, vLLM vs Ollama, and the operational tradeoffs.
kubernetes llm ai gpu infrastructure
March 28, 2026 · 10 min read
RAG in Production: How Retrieval-Augmented Generation Actually Works
LLMs don't know your data. RAG fixes that by turning your documents into a searchable knowledge base. Here is the full pipeline: chunking strategies, dense vs hybrid retrieval, re-ranking, and when to reach for graph-based RAG with LightRAG.
ai llm rag vector-search infrastructure
March 20, 2026 · 12 min read
Why I Run Qdrant in Production: A 3-Node Cluster vs the Alternatives
Pinecone, Weaviate, Milvus, pgvector, Qdrant — five viable choices for a vector database. Here is why I picked Qdrant for production, how the 3-node cluster is laid out, and what the other options actually trade away.
vector-search qdrant rag infrastructure kubernetes
March 12, 2026 · 5 min read
Docker Gets You to Production. Kubernetes Keeps You There.
Docker solves the packaging problem. Kubernetes solves the operational problem. Here is what K8s actually adds, how its core objects work, and why rolling updates change how you think about deployments.
kubernetes docker devops infrastructure
