Tag

GPU

1 post on GPU

April 10, 2026 · 6 min read
Self-Hosting an LLM on Kubernetes
Managed inference APIs are convenient until they are not. Here is the full picture of running your own LLM on Kubernetes: GPU scheduling, model storage, vLLM vs Ollama, and the operational tradeoffs.
Tags: kubernetes, llm, ai, gpu, infrastructure