Tag

LLM

10 posts on LLM

April 30, 2026 · 4 min read
Embedding Models: Which One, and Why It Matters Less Than You Think
Embedding model choice is a 5% problem for most RAG systems. Your chunking strategy is the 50% problem. Here's how to pick anyway.
ai llm rag embeddings vector-search
April 30, 2026 · 5 min read
Prompts Are Code: How to Version, Test, and Deploy Them
Your AI feature has a 200-line system prompt living in a string in app.py. That's tech debt. Here's how to treat prompts like first-class artifacts.
ai llm software-engineering prompt-engineering
April 30, 2026 · 4 min read
Prompt Caching: The Cost Math Most Teams Get Wrong
Prompt caching is not a 90% discount. It's a 90% discount on the static parts only. Here's how to actually compute your cache savings.
ai llm cost claude-code prompt-engineering
April 30, 2026 · 5 min read
Testing AI Features: Why Unit Tests Lie and What to Do Instead
Your AI feature passes 100% of unit tests and ships broken to users every other week. Here's why, and how to actually test LLM-powered systems.
ai llm testing software-engineering
April 29, 2026 · 3 min read
Why Your AI Product Feels Broken (Even Though the Model Is Good)
Claude 4 didn't get stupider. Your safety layer is failing. How to identify when the problem is your architecture, not the LLM.
ai product architecture llm
April 28, 2026 · 7 min read
MCP Is Not a Better Function Calling. It's a Different Layer Entirely.
Ten months after MCP went multi-vendor, most teams are still treating it as a nicer function-calling wrapper. That's the wrong mental model — and it's quietly producing architectures that don't scale.
ai architecture mcp agents llm software-engineering
April 27, 2026 · 13 min read
86% of Multi-Agent Systems Die Before Production. Here's Why.
A MAST taxonomy of 1,600+ execution traces maps 14 failure modes across 3 root causes. The model is almost never the problem. The orchestration architecture almost always is.
ai architecture agents software-engineering llm
April 26, 2026 · 11 min read
Context Engineering Is Just Systems Design (And Most Teams Are Starting Over)
82% of AI teams say prompt engineering alone isn't enough. The ones succeeding in production are treating context design the same way they treat database indexes — as an architectural decision, not a prompt trick.
ai architecture software-engineering llm agents
April 10, 2026 · 6 min read
Self-Hosting an LLM on Kubernetes
Managed inference APIs are convenient until they are not. Here is the full picture of running your own LLM on Kubernetes: GPU scheduling, model storage, vLLM vs Ollama, and the operational tradeoffs.
kubernetes llm ai gpu infrastructure
March 28, 2026 · 10 min read
RAG in Production: How Retrieval-Augmented Generation Actually Works
LLMs don't know your data. RAG fixes that by turning your documents into a searchable knowledge base. Here is the full pipeline: chunking strategies, dense vs hybrid retrieval, re-ranking, and when to reach for graph-based RAG with LightRAG.
ai llm rag vector-search infrastructure
