The Spec Is Now the Code: Why Spec-Driven Development Is the Skill Nobody's Talking About

The most common reason your AI agent builds the wrong thing isn't the model.

The model is fine. Claude, GPT-4o, Gemini 2.0 — any of them can build what you need. The reason your agent builds the wrong thing is almost always the same: you gave it a vague instruction and expected it to fill in the gaps the way a senior engineer would.

It won't.

A senior engineer fills gaps with organizational context, taste, and years of implicit knowledge about your codebase and customers. An AI agent fills gaps by pattern-matching on its training data — which means it gives you a reasonable-looking answer that isn't the right answer for your specific situation.

The bottleneck has shifted. You don't need to learn to write better code. You need to learn to write better specs.

What Actually Changed (And Why Right Now)

Two things happened simultaneously around late 2025 that created this moment:

Models became good enough to execute from precise specifications. Not just "here's a function" execution — full feature execution. GitHub Spec Kit crossed 80,000 stars within months of launch and works with 24+ coding agents. Amazon shipped Kiro, an IDE built entirely around this idea. Martin Fowler is writing about it. ThoughtWorks placed Spec-Driven Development in their Technology Radar "Assess" ring. Something real is happening.

The cost of bad specifications became visible. Before AI, vague tickets were painful but recoverable. A developer would read a bad ticket, make reasonable assumptions, get feedback in code review, and iterate. The feedback loop was tight. With AI agents running for hours against your spec, bad input compounds. Every ambiguity becomes a branching point — and the agent will choose silently at each one.

A one-sentence Jira ticket that used to cost you a ten-minute miscommunication now costs you three hours of agent runtime and a PR that does the wrong thing convincingly.

Spec-Driven Development Is Not Writing Bigger Tickets

This is the first misconception to kill: SDD is not "write longer PRDs" or "add more acceptance criteria to your stories."

A PRD is written for human readers who can interpret ambiguity. An engineer reads a vague requirement like "users should be able to manage their profile" and knows from context that it means name, avatar, and password — not the entire account settings tree. Humans fill gaps from shared organizational context.

AI agents fill gaps from training data. There's no organizational context. There's no implicit knowledge about your users or your existing data model. Give an agent "users should be able to manage their profile" and it'll build something reasonable-looking and probably wrong.

A spec, in the SDD sense, is written to be executable. As Thoughtworks put it:

"A PRD or design doc is written for human readers who can interpret ambiguity and fill gaps from organizational context. AI agents fill gaps too — but not in the way you'd want."

The goal of a spec isn't to describe the intention. It's to constrain the solution space.

The Three Layers Every Executable Spec Needs

The pattern that's stabilized across GitHub Spec Kit, Kiro, and the Claude Code community is a three-phase spec. Here's what each phase does and why you can't skip one:

Phase 1: Requirements (User-observable behavior)

What does the system do from the user's perspective? Expressed as user stories with EARS-notation acceptance criteria:

WHEN a user submits the contact form with valid inputs
THEN the system SHALL:
  - Display a success confirmation within 2 seconds
  - Insert a row into contact_requests with status=pending
  - Return HTTP 201 with { "success": true }

WHEN the Turnstile token is invalid
THEN the system SHALL:
  - Return HTTP 422 with { "error": "captcha_failed" }
  - NOT insert into contact_requests

Notice what's different from a typical acceptance criterion: observable inputs, specific outputs, explicit exceptions. No ambiguity about what "success" looks like. The agent has no room to interpret.

Phase 2: Design (Technical constraints and contracts)

How does the system accomplish the requirement? This is architecture, schemas, API contracts, and sequence logic:

Component: ContactForm (frontend)
  - Collects: name (string, max 100), email (valid), subject (max 200),
    message (max 5000)
  - Turnstile widget: rendered via ClientOnly, token in POST body
  - On submit: POST ${VITE_API_URL}/api/contact, Content-Type: application/json

Component: /api/contact (Worker)
  - Validates body via ContactSchema (zod)
  - Rate-limits by cf-connecting-ip (10 req/60s via RATE_LIMITER binding)
  - Verifies Turnstile if TURNSTILE_SECRET is set; skips if unset
  - On success: INSERT into contact_requests, POST to GAS_URL (best-effort)
  - Returns: 201 success | 422 validation/captcha fail | 429 rate-limit

This phase forces you to answer "how?" before handing work to an agent. It surfaces design decisions as explicit choices rather than implicit assumptions. The agent now knows what the data model looks like, not just that one is needed.

Phase 3: Tasks (Discrete implementation steps)

Break the design into atomic, verifiable steps. The key word is atomic — each task should have a definition of done that can be verified independently:

Task 1: Add zod ContactSchema to backend/src/index.ts
  - Fields: name (max 100), email (valid), subject (max 200),
    message (max 5000), turnstileToken
  - Done: schema is exported and tsc --noEmit passes

Task 2: Implement rate limiting in /api/contact handler
  - Use RATE_LIMITER binding from wrangler.toml
  - Return 429 with { "error": "rate_limited" } when exceeded
  - Done: 11 rapid requests → 11th returns 429

Task 3: Implement Turnstile verification (opt-in)
  - Skip if TURNSTILE_SECRET is undefined or empty string
  - Done: form submits successfully with no secret set;
    returns 422 with invalid token

Small, verifiable, explicit. Not "implement the contact form" — that's a PRD bullet. A task is a contract between you and the agent.

The Traditional Flow Is Costing You More Than You Think

Most teams are still running this loop:

Idea → Vague ticket → Agent generates code → Wrong output → Negotiate in review → Revise

The review step is where 80% of the friction lives. When specs are vague, review becomes negotiation. "This isn't what I meant." "That's a reasonable interpretation of what you wrote." Nothing is obvious to an agent.

The SDD loop changes the review entirely. You're not asking "is this what we wanted?" — you're asking "does this match the task's definition of done?" That's a verification, not a debate. Review gets cheap when the spec is good.

Teams that have adopted SDD consistently report 2–3× throughput gains with unchanged headcount. The time "lost" writing specs is recovered many times over in review cycles that don't exist.

What This Means If You're a PM

The most important implication of SDD isn't for engineers. It's for product managers.

If AI agents execute from specs, and PMs write specs, then the PM who can write executable specifications has significantly more direct control over what gets built than ever before. The handoff layer between "what we want" and "what gets built" just got thinner.

But there's a catch. The skill required to write an executable spec is meaningfully different from the skill required to write a good PRD.

A good PRD tells a story. A good spec is a constraint system. You need both — the story to communicate intent to stakeholders, the constraint system to communicate it to agents. The teams that figure this out first will move noticeably faster.

The uncomfortable truth: most PMs write prose when they need to write contracts. That's not a personal failure — it's just not a skill anyone taught, because it didn't matter until now.

The Anti-Pattern: Analysis Paralysis by Spec

Thoughtworks flagged this in the same breath as celebrating SDD: "a bias toward heavy up-front specification and a big-bang release" is the anti-pattern that kills teams who adopt SDD badly.

The point isn't to write a 50-page spec before you write one line of code. That's waterfall with extra steps.

The point is to be precise at the right granularity before you hand work to an agent. A spec for a single feature should take 30–60 minutes to write. If it takes longer, the feature is too big — break it down.

A useful heuristic: if the spec has more than five tasks, split it into two specs. Each task should be achievable in one agent session. Longer than that and you're fighting context window limits and error propagation anyway.

The Tooling That's Stabilizing Around This

You don't need any specific tool to practice SDD — it's a methodology, not a framework. But the ecosystem is converging:

| Tool | Approach | Best For | |---|---|---| | GitHub Spec Kit | Portable, 24+ agents supported | Teams using any coding agent | | Amazon Kiro | Spec-first built into IDE | Teams wanting opinionated integration | | Claude Code + CLAUDE.md | Native hooks + skills system | Claude-first teams | | cc-sdd | Spec as inter-component contract | Multi-agent parallel execution |

If you're already using Claude Code, the path of least resistance is zero new tooling: write specs in a /specs directory, use TodoWrite to track tasks, and use agent subagents in isolated git worktrees for parallel task execution.

Where to Start Monday Morning

You don't need to overhaul your process. Try this on the next feature your team starts:

1. Before writing code (or prompting an agent), spend 30 minutes on a spec. Three sections: what the user observes (requirements with WHEN/THEN), how it works (design with contracts), and discrete steps (tasks with done criteria).

2. Give the spec to your agent instead of the Jira ticket. Compare the output quality.

3. In the PR review, check against the task definitions, not against your mental model of what you wanted. If there's a gap, the spec was unclear — update the spec first.

4. After two weeks, look at PR revision count. That's the metric that moves.

The skill compounds fast. The first spec takes 45 minutes and feels like overhead. The fifth takes 15 minutes and saves you two hours in review. By the tenth you'll be irritated by anyone who hands you a vague ticket.

The Shift in One Sentence

The best engineering teams are no longer distinguished by how fast they write code. They're distinguished by how precisely they can specify what they want.

That's a different skill than most of us trained on. It's learnable. And right now, very few people are doing it well — which means the window to gain a real edge is still open.

Sources: Thoughtworks on SDD · Martin Fowler — SDD Tools · GitHub Spec Kit · SDD with Claude Code · cc-sdd repo · Anthropic Agentic Coding Trends Report 2026