How to Structure an Engineering Team When AI Writes 41% of the Code

The org chart most teams run was designed when humans wrote all the code. Anthropic's 2026 data says that assumption is gone. Here is what the structure should look like now — and what roles actually matter.

Most engineering teams in 2026 look like this: an engineering manager, two or three seniors, a handful of mids, and a few juniors working their way up.

That structure was designed for 2020. The assumptions underneath it have changed.

According to Anthropic's 2026 Agentic Coding Trends Report, roughly 41% of all code being written today is AI-generated. Engineers spend 60% of their work time with AI in the loop. And 27% of the work getting done in your team right now wouldn't have been attempted at all without AI making it feasible.

The team structure you're running was designed for a world where humans were the bottleneck for code production. That world is over. But the org chart hasn't caught up.


The gap in the numbers

The surface data looks good. TELUS saved 500,000 hours across 57,000 team members using AI coding agents, shipping engineering code 30% faster. Rakuten had Claude Code complete a complex task autonomously in 7 hours on a 12.5-million-line codebase at 99.9% numerical accuracy. Individual PR merge times are down 20%.

So why are production incidents up 23.5% and production failures up 30%?

Faros AI's 2026 study of 22,000 developers found that individual productivity gains aren't compounding into org-level outcomes. Teams are faster. Systems are less reliable. Delivery velocity at the org level is flat.

The answer isn't the tools. It's structural.

When you add AI without changing the structure, you get three specific failure modes:

The intent gap. Agents execute well when told precisely what to build. Most teams are still writing specs the same way they did in 2019. Vague intent multiplied across three concurrent agent sessions produces three times the inconsistency.

The review bottleneck. If an engineer who used to produce 200 lines a day is now producing 800, your senior reviewers need to evaluate 4x as much code. Most teams haven't added review capacity. They've added production bandwidth without adding judgment bandwidth.

The accountability vacuum. In the old model, someone wrote every line. In the new model, the agent wrote the line, the engineer accepted it, and the senior approved the PR. When something breaks at 2am, nobody knows whose mental model was wrong.

These aren't model problems. They're structure problems.
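
To put rough numbers on the review bottleneck, here is a back-of-the-envelope sketch in Python. The 200 and 800 lines-per-day figures come from the example above; the team shape (six authors, two reviewers) and the `TeamLoad` class are assumptions made up for illustration, not data from any of the cited reports.

```python
from dataclasses import dataclass


@dataclass
class TeamLoad:
    """Hypothetical model of daily review load on a team."""
    authors: int            # engineers producing code (with or without agents)
    lines_per_author: int   # lines produced per author per day
    reviewers: int          # seniors available to review

    def lines_per_reviewer(self) -> float:
        """Lines each reviewer has to evaluate per day."""
        return self.authors * self.lines_per_author / self.reviewers


# Assumed team shape: 6 authors, 2 reviewers. Only the 200 -> 800 jump
# comes from the text.
before = TeamLoad(authors=6, lines_per_author=200, reviewers=2)
after = TeamLoad(authors=6, lines_per_author=800, reviewers=2)

# How much more each reviewer has to read after the jump in output.
multiplier = after.lines_per_reviewer() / before.lines_per_reviewer()

# Reviewers needed to keep per-reviewer load at the pre-AI level.
reviewers_needed = before.reviewers * multiplier

print(f"review load multiplier: {multiplier:.1f}x")
print(f"reviewers needed to hold load constant: {reviewers_needed:.0f}")
```

Under these assumed numbers, each reviewer goes from 600 to 2,400 lines a day, and holding the old per-reviewer load would take eight reviewers instead of two. Lines of code are a crude proxy for review effort, so treat this as an order-of-magnitude check, not a staffing formula.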


The old structure and why it made sense

The traditional engineering pyramid was a reasonable optimization for a specific bottleneck: human time is scarce, so pack it efficiently.

[Figure: The org chart most teams still run. Built when human code production was the bottleneck; headcount = output.]

Juniors wrote first drafts. Seniors reviewed and mentored. The EM removed blockers and set direction. Code output scaled linearly with headcount. The ratio that made sense: roughly 1 senior per 3-4 juniors, 1 EM per 6-8 engineers.

Everything in that model optimized for "how fast can humans produce code."

In 2026, that constraint is effectively gone. Agents produce code faster than any human. What's left as the human constraint is different:

  • Clear intent: Can you define what you're building precisely enough for an agent to execute correctly?
  • Judgment under ambiguity: When the agent produces something plausible but wrong, can you recognize it?
  • System-level trust: Across a codebase with 41% AI-generated code, can you trust the whole thing — not just the parts you touched?

These are different skills. The org chart should reflect them.


What the work actually looks like now

Here's a composite of how well-structured teams at TELUS, Zapier, and Fountain describe their actual engineering workflows in Anthropic's report.

An engineer starts the day with three concurrent agent sessions. One is processing a feature spec. One is working through a bug in the auth layer. One is writing test coverage for a module approved last week. The engineer isn't writing any of that code — they're reviewing what agents produce, pushing back when output doesn't match intent, and escalating decisions that require judgment the agent can't have.

A good engineer in this model does three things:

  1. Writes specs precisely enough that agent output doesn't require a full rewrite
  2. Reads agent output critically — not line by line, but for intent match, edge cases, and hidden assumptions
  3. Makes trust calls — "this is good enough to ship" vs "this is plausible but I don't trust it"

This is less "developer" and more "technical editor + air traffic controller + system architect" in one role.

The old structure doesn't develop or reward these skills. It rewards writing code fast. Those are not the same thing anymore.


The structure that actually works

Here is how I would restructure a 10-person engineering team today, designed around the actual bottlenecks.

[Figure: The structure that fits 2026. Designed around the actual bottlenecks: intent, judgment, and system trust. Work flows down.]

Here is how I define each layer.


The three layers, defined

The Intent Layer (2 people)

One EM and one person whose primary output is spec quality. Their output isn't code — it's clarity. They own the problem definition, the acceptance criteria, the constraints every agent session runs against.

In the old model, this was handled informally by whoever had the most context. That worked when specs only had to be good enough for a human developer who could ask follow-up questions. It doesn't work when the agent executing the spec can't ask follow-ups and will produce plausible-but-wrong output if the intent was ambiguous.

The Spec Lead isn't a PM role. It's an engineering role. The person needs to understand implementation constraints, edge cases, and failure modes — because agents will exploit every underspecified assumption in the spec.
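
One way to picture what the Intent Layer hands to an agent session is a small structured spec with a check for the parts an agent would otherwise have to guess. This is only a sketch: the `Spec` class, its field names, and the example contents are hypothetical, chosen to mirror the three things named above (problem definition, acceptance criteria, constraints).

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec record; field names are illustrative only."""
    problem: str
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the sections an agent would have to guess at."""
        missing = []
        if not self.problem.strip():
            missing.append("problem")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        if not self.constraints:
            missing.append("constraints")
        return missing


# A spec a human could work from by asking follow-up questions.
vague = Spec(problem="Make login faster")

# A spec that states what "done" means and what must not change.
tight = Spec(
    problem="Reduce p95 login latency",
    acceptance_criteria=["p95 latency under an agreed threshold in staging"],
    constraints=["No change to the session token format"],
)

print(vague.gaps())  # sections left for the agent to guess
print(tight.gaps())  # nothing structurally missing
```

A check like this only catches missing sections, not vague wording inside them. Judging whether a criterion is actually precise enough is the engineering judgment the Spec Lead is there to supply.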

The Orchestration Layer (3-4 people)

These are your engineers doing the actual work. But "the work" is no longer primarily writing code. It's running agent sessions, reviewing output, maintaining context across a codebase that is 41% AI-generated, and making the trust call: "does this output match the intent, and do I trust it enough to send it to validation?"

The skill that matters here is reading code, not writing it. Specifically, reading AI-generated code with calibrated skepticism: understanding what the agent was trying to do, where it likely got it right, and what categories of errors it's prone to making. This is exactly the shift described in "Reading Code Is the Bottleneck Now."

The mid-to-senior career path runs through this layer. Juniors earn seniority by developing judgment, not by producing code. That means more time reviewing and less time executing.

The Validation Layer (2 people)

One person owns system-level trust. Not line-by-line review — that already happened in orchestration. This is cross-cutting: do the security patterns hold across the whole codebase? Are the data flows consistent? Are there emergent architectural problems that nobody saw because they were each looking at their own agent sessions?

The second person owns eval design. This is the piece most teams are missing entirely. Behavioral testing for AI-generated code is different from unit testing. You're not only checking that a function returns the right value on known inputs; you're checking whether the system behaves correctly across the whole space of realistic inputs, not just the known cases the agent may have subtly optimized for. If you don't have this role, you're finding your eval failures in production.
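
The difference between a unit test and a behavioral eval can be sketched in a few lines of Python. Everything here is hypothetical: `split_evenly` stands in for a plausible-but-wrong piece of agent output, and the sampled-input check is just one simple way to probe an invariant, not a description of any particular team's eval tooling.

```python
import random


def split_evenly(total_cents: int, parts: int) -> list[int]:
    """Plausible but wrong: drops the remainder when total is not divisible."""
    return [total_cents // parts] * parts


def unit_test() -> bool:
    # Known-input check: passes, because 100 divides evenly by 4.
    return split_evenly(100, 4) == [25, 25, 25, 25]


def behavioral_eval(trials: int = 1000, seed: int = 0) -> list[tuple[int, int]]:
    """Check an invariant (the parts sum to the total) across sampled inputs."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        total = rng.randint(1, 10_000)
        parts = rng.randint(1, 12)
        if sum(split_evenly(total, parts)) != total:
            failures.append((total, parts))
    return failures


print("unit test passes:", unit_test())
print("inputs violating the invariant:", len(behavioral_eval()))
```

The unit test passes while the sampled check turns up many inputs where money goes missing. Each failing case is one where the total is not divisible by the number of parts, which is exactly the kind of input the known-case test never exercised.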

The ratio: 2 : 3 : 2 (intent : orchestration : validation) instead of 2 : 3 : 5. Fewer people, more distinct functions, no role that exists purely to produce code.


What to do with this if you're running a team

Audit where your review capacity actually is. If your individual output has tripled with AI tools but your senior review hours haven't changed, you have a structural deficit. That gap is where your incidents are coming from. The fix isn't slowing down production — it's investing in validation proportionally to how fast production has gotten.

Redesign the spec process before the agent process. Most teams jumped straight to "how do we use AI to build faster" without asking "how do we define what to build clearly enough for AI to build correctly." Bad specs get multiplied, not smoothed out, when agents execute them. Fix upstream first.

Stop hiring juniors to fill production bandwidth. That bandwidth now costs effectively zero — agents provide it. Hire juniors to develop judgment: reviewing agent output, learning to orchestrate before they can architect, building the reading-and-trust-call muscle that is the actual senior skill in 2026. Give them more review responsibility, not more execution responsibility.

Name the Orchestrator role explicitly. Not for the job posting, but for internal clarity. Senior engineers need to know that their job is now 60% reviewing, orchestrating, and maintaining context, and 40% building. If you don't name it, you'll keep hiring and evaluating for the old profile. You'll select people who want to write code, and then wonder why they're frustrated when the agents write the code instead.

Create the Eval Lead role before your incident rate creates it for you. Every team I've seen without a dedicated eval function discovers the gap the same way: a plausible-looking failure in production that passed all the tests. Tests check correctness on known inputs. Evals check behavioral fidelity across the realistic input space. These are different problems.


The career angle (if you're an engineer, not a manager)

The engineers who will have leverage in three years are the ones who can do two things the agent can't:

Define the problem precisely. Not requirements gathering — the ability to take an ambiguous business goal and decompose it into specifications tight enough that an agent can execute without introducing subtle inconsistencies. This is an architectural skill, not a writing skill. It requires understanding implementation constraints before you start specifying.

Make trust calls at scale. Across a codebase with thousands of AI-generated commits, the engineer who can quickly assess whether a module is trustworthy — not by reading every line but by understanding its intent, its edge cases, and the failure modes of the agent that produced it — is genuinely rare. That skill is hard to develop and almost impossible to fake.

Both of these skills come from reading more and generating less. Ironically, the best thing junior engineers can do for their careers in 2026 is spend less time having AI generate code for them and more time reviewing and critically evaluating AI-generated code from others.


The uncomfortable conclusion

The teams that will struggle most in the next 18 months are the ones that adopted AI tools at the individual level without restructuring at the org level. They'll have faster engineers producing more output with less accountability. They'll have incident rates climbing and no structural explanation for why.

The org chart isn't an HR formality. It encodes assumptions about where the work happens, where judgment lives, and where failures get caught.

41% of your code is now AI-generated. That's not a feature flag. That's a structural change. The structure should reflect it.


Data sources: Anthropic 2026 Agentic Coding Trends Report · Faros AI 2026 Developer Productivity Study · Pragmatic Engineer: Impact of AI on Software Engineers 2026
