93% of Developers Use AI. Your Team Is Still Missing Deadlines. Here's Why.
Faros AI tracked 22,000 developers and found individual AI gains evaporate at the org level. PR merge times are down 20%. Incidents are up 23.5%. Here is the mechanism — and what actually fixes it.
Your developers are faster than they've ever been.
They're closing PRs in hours that used to take days. Code review queues that stretched a week are clearing in an afternoon. An engineer who used to spend a whole sprint on boilerplate wrote all of it on a single Tuesday afternoon.
And yet your last three releases shipped late. Your incident rate is up. The CTO is frustrated. The PM is calling for more headcount.
This is the AI productivity paradox — and it's now showing up in real data.
The numbers that should stop you cold
Faros AI published a study in early 2026 tracking two years of telemetry across 22,000 developers at real companies. The headline findings deserve to be read slowly:
- PR merge times improved 20% at the individual level
- AI generates roughly 42% of all code written globally
- Organizational incident rates increased 23.5%
- Production failure rates increased 30%
- 63% of developers report spending more time debugging AI-generated code than it would have taken to write the same code from scratch
Individual speed is up. System reliability is down. Delivery velocity at the org level? Flat.
Faros's conclusion, stated plainly: "Any correlation between AI adoption and key performance metrics evaporates at the company level."
There's a second data point that lands even harder. METR — the AI safety research org that runs rigorous economic-impact studies — tried to measure this properly with a controlled trial in early 2026. They had to abandon the experimental design midway. The reason: developers in the control group (no AI access) refused to participate. The study lead wrote that the team was "unable to find developers willing to work without AI assistance for even a two-week period," making a proper control impossible.
That's not a footnote. It's the whole story. AI has become load-bearing in the development process before we've measured whether it's actually helping at scale.
Why individual gains don't compound to org gains
This isn't random noise. There are five specific mechanisms that absorb individual-level gains before they show up in delivery metrics.
Figure: The AI productivity paradox (22,000 developers, 2 years; Faros AI, 2026)
- PR merge time: −20% (faster per developer)
- AI code share: 42% of all code written
- PRs per developer per week: +30% (more output per person)
- Developer satisfaction: +14% (survey scores up)
- Boilerplate time: −60% (real, measurable, consistent)
The gap: gains absorbed by review, bugs, and debt
- Incident rate: +23.5% (pages up across high-AI teams)
- Production failures: +30% (plausible-but-wrong bugs reach prod)
- Incident resolution time: +34% (engineers debug unfamiliar code)
- Refactoring activity: −60% (structural debt accumulating silently)
- Delivery velocity (org): ≈ 0% (all individual gains absorbed)
Source: Faros AI, 2026 · Forrester, Dec 2025 · DX Q1 2026 Impact Report
The chart makes the paradox concrete. Everything individual developers report as improved — speed, output, satisfaction — is moving in the right direction. Everything that shows up in system-level metrics is moving in the wrong direction. Or not moving at all.
The five mechanisms eating your gains
1. The review bottleneck absorbs the write speedup
When code is generated faster, the bottleneck shifts downstream. Your developers are outputting more code per day — but the reviewers on the other side of those PRs haven't gotten faster. AI-assisted developers create 30% more PRs per week; review turnaround has improved only 8%. Queue length grows. Context-switching increases under load. Review quality degrades. A bottleneck that used to be invisible because writing and reviewing happened at roughly the same rate is now visible.
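To see why the queue grows, here is a back-of-the-envelope sketch in Python. It assumes a team whose review capacity exactly matched PR arrivals before AI adoption, and it treats the 8% turnaround improvement as 8% more review throughput. Both are simplifications. The baseline of 100 PRs per week and the function name `review_backlog` are illustrative assumptions; only the 30% and 8% figures come from the data above.

```python
def review_backlog(weeks, baseline_prs=100.0, pr_growth=0.30, review_gain=0.08):
    """Return the open-review backlog at the end of each week.

    Assumes arrivals and review capacity were balanced at `baseline_prs`
    per week before AI adoption (a hypothetical baseline).
    """
    arrivals = baseline_prs * (1 + pr_growth)    # PRs opened per week
    capacity = baseline_prs * (1 + review_gain)  # PRs reviewed per week
    backlog = 0.0
    history = []
    for _ in range(weeks):
        # Backlog never goes negative: idle review capacity is not banked.
        backlog = max(0.0, backlog + arrivals - capacity)
        history.append(round(backlog, 1))
    return history


print(review_backlog(4))  # [22.0, 44.0, 66.0, 88.0]
```

Under these assumptions the backlog grows by 22 PRs every week and never recovers on its own, because arrivals permanently outpace capacity. A real team would see the growth dampened or amplified by batching, reviewer fatigue, and abandoned PRs, none of which this sketch models.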
2. Bug density compounds through the stack
AI-generated code contains 1.7x more major issues than human-written code at equivalent lines of code (Forrester, December 2025). More important: the bugs are different. Human code tends to have obvious mistakes that fail early: a missing null check, a wrong index, a typo that breaks compilation. AI-generated code tends to contain plausible-looking logic that is subtly wrong under edge conditions. Those bugs survive CI. They reach production. Security vulnerability rates in AI-co-authored code are running 2.74x higher than in human-written code.
3. Refactoring has nearly stopped
Faros found refactoring activity dropped 60% on high-AI-adoption teams. This makes structural sense: AI is good at generating new code and mediocre at improving existing code. Engineers are shipping more net-new output and doing less of the structural maintenance that keeps codebases navigable. Code duplication increased 48%. The codebase becomes harder to reason about, which makes AI output harder to verify, which creates more bugs. The loop is self-reinforcing: each turn makes the next one worse.
4. Engineers aren't internalizing what they ship
When you write code from scratch, you understand it. When you accept AI output, you sometimes understand it and sometimes don't — and in a fast-moving team with queue pressure, you often don't stop to find out. The difference matters acutely at incident time. When something breaks at 2 AM, the engineer who wrote the code can reason about it. The engineer who accepted the AI's output and moved on often can't. Incident resolution time is up 34% across teams with the highest AI adoption rates.
5. Coordination overhead is invisible in individual metrics
Individual productivity metrics don't capture the cost of coordination. When developers are outputting more code faster, the product managers, architects, and tech leads who need to stay aligned have more to review, de-conflict, and prioritize. That work doesn't show up in commit counts or PR merge times. It shows up in missed deadlines and misaligned features.
The sustainable AI adoption band
Here's the number that actually matters for engineering leaders: the sustainable AI code share appears to sit between 25% and 40%.
Teams running above 41–42% AI-generated code are showing the degradation patterns above. Teams below 25% are leaving real individual productivity gains on the table. The teams navigating this well — lower incident rates, recovering delivery velocity — are operating in the middle: high AI adoption with active human verification practices layered on top.
What distinguishes the 25–40% range isn't less AI. It's more intentional use:
- Code review checklists that explicitly address AI-generated patterns (off-by-one in generated loops, hallucinated library methods, confident-but-wrong security logic)
- Pair review on complex AI-generated sections, not just linting
- Refactoring sprints budgeted explicitly — even once a quarter — dedicated to consolidating AI-accumulated duplication
- Architectural decision records that capture why, because AI doesn't have that context and won't generate it
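As a rough illustration of tracking the band, the sketch below classifies a team's AI code share against the 25–40% range. How AI-authored lines are counted is left open here; the inputs `ai_lines` and `total_lines`, the function name `ai_share_band`, and the choice to treat exactly 40% as inside the band are all assumptions for the example, not part of the Faros methodology.

```python
def ai_share_band(ai_lines, total_lines, low=0.25, high=0.40):
    """Classify AI code share against the 25-40% band discussed above.

    `ai_lines` and `total_lines` are hypothetical inputs; how a team
    attributes lines to AI tooling is outside the scope of this sketch.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    share = ai_lines / total_lines
    if share < low:
        return "below band"
    if share <= high:
        return "within band"
    return "above band"


print(ai_share_band(30, 100))  # within band
print(ai_share_band(45, 100))  # above band
```

A label like this is only a prompt for a conversation about verification practices. It says nothing on its own about whether the code in question was reviewed and understood.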
What this means for engineering managers
Three things are probably true about your team right now:
Your senior engineers are the bottleneck. Not because they're slow, but because they're saturated. Junior and mid-level developers are outputting more code per day. That code flows upward to the same number of senior reviewers you had two years ago. If your senior engineers are constantly in review, the throughput ceiling isn't AI tooling; it's your code review capacity. Giving junior developers more AI tools while keeping review bandwidth constant makes this worse.
Your on-call rotation is about to get harder. The 34% increase in incident resolution time isn't random. Engineers are getting paged on code they don't fully understand. The fix isn't to stop using AI — it's to require that developers who accept AI output can explain it before it merges. That sounds obvious. Most teams haven't actually enforced it because the PR queue pressure makes it feel costly.
Your refactoring backlog is growing silently. The 60% drop in refactoring is the most dangerous number in the Faros study because it doesn't surface for months. Duplicated code and increasing complexity accumulate until the codebase becomes hard to reason about — which makes AI output harder to verify — which creates more bugs. Budget refactoring into sprints the same way you budget features. If you don't, your future sprint planning will be doing it for you, in the form of unexplained slowdowns.
What this means for product people
If you're a PM or product leader, the insight is uncomfortable: adding more AI tooling to your engineering team will not straightforwardly increase your delivery throughput.
It might increase PR volume. It will not automatically increase reliable feature delivery.
The lever you actually have is review bandwidth. If you want to capture the gains from AI coding tools at the org level, the investment is in the quality gate — not the generation step. That means senior engineers who do less individual coding and more review and mentoring. It means code review as a first-class activity with time carved in sprint planning. It means post-mortems that explicitly ask "did we understand this code before we shipped it?"
The velocity metrics that feel broken right now? They're not broken because AI made them obsolete. They're broken because you're measuring the wrong thing. You were measuring output — code merged, tickets closed, story points. You need to be measuring outcomes — incident rate, mean time to restore, change failure rate.
Those are the metrics that separate teams where AI adoption is actually working from teams where it's creating the illusion of progress.
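Two of those outcome metrics can be computed from a simple deployment log. The sketch below is a minimal illustration, assuming each deployment record carries an optional failure timestamp and restore timestamp. The `Deployment` record, its field names, and the sample data are hypothetical. Real pipelines would pull this from incident and deploy tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; field names are assumptions for this sketch.
    deployed_at: datetime
    failed_at: Optional[datetime] = None    # set if the change caused a failure
    restored_at: Optional[datetime] = None  # set once service was restored


def change_failure_rate(deployments):
    """Fraction of deployments that caused a production failure."""
    if not deployments:
        return 0.0
    failures = sum(1 for d in deployments if d.failed_at is not None)
    return failures / len(deployments)


def mean_time_to_restore(deployments):
    """Average failure-to-restore duration, or None if nothing was restored."""
    durations = [
        d.restored_at - d.failed_at
        for d in deployments
        if d.failed_at is not None and d.restored_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


deployments = [
    Deployment(datetime(2026, 1, 5, 10)),
    Deployment(datetime(2026, 1, 6, 10),
               failed_at=datetime(2026, 1, 6, 11),
               restored_at=datetime(2026, 1, 6, 13)),
    Deployment(datetime(2026, 1, 7, 10)),
    Deployment(datetime(2026, 1, 8, 8),
               failed_at=datetime(2026, 1, 8, 9),
               restored_at=datetime(2026, 1, 8, 10)),
]

print(change_failure_rate(deployments))   # 0.5
print(mean_time_to_restore(deployments))  # 1:30:00
```

Tracking these per team alongside AI adoption is one way to check whether faster PRs are translating into reliable delivery or just into more pages.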
The honest summary
AI coding tools are genuinely useful. The developers who use them feel faster, and they are faster — at writing. The problem is that software delivery has never been bottlenecked on writing. It's been bottlenecked on understanding: understanding the problem, understanding the system, understanding whether the code does what you intended.
The tools are real. The individual gains are real. The org-level stagnation is also real. The teams escaping the paradox aren't using less AI. They're building the review and refactoring infrastructure to absorb the extra output without losing reliability.
If you're trying to make AI work at the team level, don't ask "how do we write more code?" Ask "how do we understand more of the code we're shipping?"
The answer to that question doesn't involve a new AI tool. It involves culture, review practices, and the willingness to treat "I accepted the AI's output" as the beginning of the review process — not the end of it.
Sources: Faros AI 2026 Engineering Report · METR Uplift Study Update, Feb 2026 · DX Q1 2026 AI Impact Report · Forrester AI Code Quality Analysis, Dec 2025