The Security Bill for Vibe Coding Is Coming Due
Georgia Tech tracked 35 CVEs from AI-generated code in March 2026 alone — more than all of 2025 combined. Here's what the data says, why it's happening, and what a secure AI workflow actually looks like.
If you missed the study, it was published two weeks ago, and it should change how you think about your AI-assisted development workflow.
We spent 2025 optimizing for speed. The security bill is arriving.
The data that should alarm you
The headline number is jarring, but the pattern underneath it is more useful than the count. Researchers at Georgia Tech analyzed thousands of AI-generated code samples and found:
- 45% of AI-generated code contains security vulnerabilities
- Misconfigurations are 75% more common in AI-generated code than human-written code
- Logic errors — incorrect dependencies, flawed control flow, missing null checks — are the dominant failure mode
- Across the industry, pull requests per developer increased 20% with AI adoption, but incidents per PR increased 23.5%
That last one is the important ratio. We're shipping more, faster, and breaking production at a higher rate per unit of output. Velocity metrics look great. Incident metrics are quietly getting worse.
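To make the ratio concrete, the two figures can be multiplied out. This is a back-of-the-envelope sketch, assuming both percentages describe the same teams over the same period; the compounded number is an illustration, not a figure reported by the sources.

```python
# Back-of-the-envelope: how the two reported changes compound.
# Assumption: both percentages apply to the same teams and time window.
pr_growth = 0.20               # pull requests per developer: +20%
incident_rate_growth = 0.235   # incidents per PR: +23.5%

# incidents per developer = (PRs per developer) x (incidents per PR)
incidents_multiplier = (1 + pr_growth) * (1 + incident_rate_growth)

print(f"Incidents per developer: x{incidents_multiplier:.3f}")
print(f"Increase: {100 * (incidents_multiplier - 1):.1f}%")
```

Under that assumption, each developer is associated with roughly 48% more incidents, even though the per-PR increase looks modest on its own.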
It's not random bugs — it's a specific failure signature
The distribution of AI security bugs is not random, which means it's predictable and therefore preventable. Three categories dominate:
Missing or misconfigured authorization. The model knows to add authentication middleware, but it doesn't always thread it consistently through every route. It writes the check; it doesn't always wire it. This is how you get endpoints that look secured in the happy path and are wide open to direct access.
Overly permissive configurations. AI tends toward working-not-minimal. It will configure CORS to *, leave debug endpoints reachable in production, or open storage buckets to public read because that makes the feature function. The intent to lock it down later doesn't make it into the diff.
Trust boundary confusion. The model has no intuitive sense of what's internal vs external, what should be validated vs trusted. It will validate user input in one place and pass it unsanitized to a downstream call three layers deep.
None of these are subtle zero-days. They're the same category of mistakes a rushed junior engineer makes — except the AI makes them at the speed of generating text, across every file it touches.
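Because the "wrote the check, never wired it" failure is so regular, it can be caught mechanically. Below is a minimal sketch of a deny-by-default audit, assuming a route table you control. The `Route` class, the `auth` field, and the example paths are all hypothetical and not tied to any particular web framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Route:
    path: str
    handler: Callable[..., object]
    # Required permission, "public" for intentionally open routes,
    # or None when nobody declared anything (treated as a failure).
    auth: Optional[str] = None


def unguarded_routes(routes: list[Route]) -> list[str]:
    """Return the paths whose authorization was never declared."""
    return [r.path for r in routes if r.auth is None]


# Hypothetical handlers, for illustration only.
def list_users(): return []
def health(): return "ok"
def export_users(): return []


ROUTES = [
    Route("/users", list_users, auth="users:read"),
    Route("/health", health, auth="public"),   # explicitly public
    Route("/users/export", export_users),      # check exists elsewhere, never wired
]

missing = unguarded_routes(ROUTES)
print(missing)  # ['/users/export']
```

The design choice is that a route must say something about authorization, even if the answer is "public". Silence fails the audit, which turns the inconsistent-wiring problem into a test failure rather than a production finding.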
The incidents that made this concrete
Two production incidents from 2025 that got less coverage than they deserved:
Tea App (July 2025): A women's dating safety app, of all use cases, left its Firebase storage completely open. 72,000 images were exposed, including 13,000 government ID photos. The cause: AI-generated backend code where the storage rules were never locked down. The security configuration was left in its tutorial default state and never hardened.
Lovable Platform (May 2025): Missing Row Level Security on Supabase tables resulted in full database exposure. The tables were created, the data was there, the access policies were not. The model built the feature; it didn't build the boundary around it.
Both are textbook examples of the overly-permissive configuration failure mode. Both were caught by external researchers rather than internal review.
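Permissive settings like these are easy to lint for once they are written down as data. The following is a rough sketch, assuming the deployment configuration is available as a plain dictionary. Every key name here (`cors_allowed_origins`, `debug`, `environment`, `buckets`, `public_read`) is invented for illustration and does not correspond to Firebase, Supabase, or any other real product's schema.

```python
def permissive_settings(config: dict) -> list[str]:
    """Return human-readable findings for overly permissive settings.

    All config keys are hypothetical; map them to your own config format.
    """
    findings = []
    if "*" in config.get("cors_allowed_origins", []):
        findings.append("CORS allows any origin")
    if config.get("debug", False) and config.get("environment") == "production":
        findings.append("debug enabled in production")
    for name, bucket in config.get("buckets", {}).items():
        if bucket.get("public_read", False):
            findings.append(f"bucket '{name}' is publicly readable")
    return findings


example_config = {
    "environment": "production",
    "debug": True,
    "cors_allowed_origins": ["*"],
    "buckets": {
        "uploads": {"public_read": True},
        "logs": {"public_read": False},
    },
}

for finding in permissive_settings(example_config):
    print(finding)
```

A check like this would not have needed to understand either application. It only needs the configuration to exist somewhere a script can read it before deploy.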
The management blind spot
Most engineering teams have a dashboard that tracks deployment frequency, lead time for changes, and cycle time. These are the DORA metrics — the industry-standard proxy for engineering productivity. AI coding tools have improved all of them.
What those dashboards don't track: security debt accumulation rate, misconfiguration surface area, or the percentage of AI-generated code that received meaningful review before merge. These aren't in most teams' OKRs because they're harder to count and the consequences lag by months.
The structural problem is that speed is visible immediately and security failures are visible only when they materialize. A team can run excellent DORA metrics for six months while quietly accumulating a storage exposure that surfaces when someone decides to look.
Only 5.5% of organizations are seeing real financial returns from their AI investments despite near-universal adoption. The gap between tool adoption and actual value is real, and security debt is a major component of what's hiding in that gap.
A secure AI coding workflow
The answer is not to stop using AI coding tools. The productivity gains are real and the competitive pressure is real. The answer is to treat AI output like you'd treat output from a fast, confident contractor who has never worked in your specific threat model before.
Here is the review layer most teams are missing:
┌──────────────────────────────────────────────────────────┐
│                SECURE AI CODING WORKFLOW                 │
└──────────────────────────────────────────────────────────┘

PROMPT PHASE                 REVIEW PHASE        SHIP PHASE
─────────────                ─────────────       ──────────
┌──────────┐                 ┌──────────────┐    ┌─────────┐
│ Define   │                 │ Human diff   │    │ CI      │
│ threat   │──► AI Agent ──► │ review with  │───►│ SAST    │
│ model    │                 │ security     │    │ scan    │
│ first    │                 │ checklist    │    │         │
└──────────┘                 └──────┬───────┘    └────┬────┘
                                    │                 │
                              ┌─────▼──────┐     ┌────▼────┐
                              │ Automated  │     │ Deploy  │
                              │ secrets    │     │ with    │
                              │ scan       │     │ runtime │
                              │ (local)    │     │ WAF     │
                              └────────────┘     └─────────┘

KEY CHECKPOINTS:
① Before prompting: write the trust boundaries down
② After AI output: read authorization paths explicitly
③ Before merge: run semgrep or equivalent locally
④ In CI: block on SAST failures, not just test failures
⑤ In production: runtime misconfiguration detection
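Checkpoints ③ and ④ can share one small script that runs locally and in CI. This is a sketch under assumptions: it presumes `semgrep` and `trufflehog` are installed and that the flags shown (`--error` for Semgrep, `--fail` for TruffleHog) make each tool exit non-zero on findings in your installed versions. Check each tool's help output before relying on it. The function names are mine.

```python
import subprocess
from typing import Callable, Sequence

# Scanner commands are assumptions about a typical setup; verify the
# flags against the versions installed in your pipeline.
GATES: dict[str, Sequence[str]] = {
    "semgrep": ["semgrep", "scan", "--config", "auto", "--error"],
    "trufflehog": ["trufflehog", "filesystem", ".", "--fail"],
}


def run_command(cmd: Sequence[str]) -> int:
    """Run one scanner and return its exit code."""
    try:
        return subprocess.run(list(cmd)).returncode
    except FileNotFoundError:
        return 127  # a missing scanner counts as a failure, not a pass


def failed_gates(runner: Callable[[Sequence[str]], int] = run_command) -> list[str]:
    """Return the names of the gates that exited non-zero."""
    return [name for name, cmd in GATES.items() if runner(cmd) != 0]


def gate_exit_code(runner: Callable[[Sequence[str]], int] = run_command) -> int:
    """Return 1 (block the merge) if any gate failed, otherwise 0."""
    failed = failed_gates(runner)
    if failed:
        print("Blocking merge, failed gates:", ", ".join(failed))
        return 1
    return 0


# Dry run with a stub runner, so no scanner is needed to try the logic:
# pretend Semgrep reported findings and TruffleHog came back clean.
def stub(cmd: Sequence[str]) -> int:
    return 1 if cmd[0] == "semgrep" else 0


print(gate_exit_code(stub))
```

In a real pipeline the last lines would be replaced by `sys.exit(gate_exit_code())`, so that a finding fails the job instead of producing an advisory log line.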
The most important checkpoint is ①. If you don't define the trust model before you prompt, the AI has no way to infer it. "Build me an API that does X" will produce something that does X. Whether it does X only for authorized callers with validated input is a different question, and the model won't ask it unless you make it part of the task definition.
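One way to "write the trust boundaries down" is as structured data that gets rendered into the task definition. The sketch below is only one possible shape. The `TrustBoundary` fields, the rendering format, and the invoice example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustBoundary:
    name: str           # where untrusted data enters
    inputs: str         # what crosses the boundary
    validation: str     # what validation occurs, and where
    authorization: str  # what check gates the access


def render_constraints(boundaries: list[TrustBoundary]) -> str:
    """Render the trust model as a block of text to append to a prompt."""
    lines = ["Security constraints for this task:"]
    for b in boundaries:
        lines.append(
            f"- {b.name}: inputs={b.inputs}; "
            f"validate={b.validation}; authorize={b.authorization}"
        )
    return "\n".join(lines)


# Hypothetical feature, for illustration only.
boundaries = [
    TrustBoundary(
        name="public HTTP API",
        inputs="invoice_id path parameter",
        validation="must be a UUID, checked in the request handler",
        authorization="caller must own the invoice",
    ),
]

task = "Build an endpoint that returns a single invoice."
prompt = task + "\n\n" + render_constraints(boundaries)
print(prompt)
```

Keeping the boundaries in version control next to the code also gives the human reviewer something to check the diff against.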
The security review prompt I actually use
When I'm using a coding agent for anything touching auth, data access, or external integrations, I add this to the task:
Before writing any code: list the trust boundaries this feature crosses. For each external input, specify what validation occurs and where. For each data access, specify what authorization check gates it. Then implement with those constraints explicit.
It adds thirty seconds to the prompt. It consistently catches the class of bug that makes it into production otherwise. The model is good at reasoning about security when you make security part of the task — it just doesn't default to it.
What this means if you're a manager
Three things worth making explicit on your team:
Track the review rate on AI-generated code. Not the volume of AI-assisted PRs — the percentage where a human actually read the diff with security intent, not just functional intent. These are different reads.
Add a security gate to your AI workflow. Running semgrep --config auto takes seconds, and TruffleHog covers secrets. Make both blocking in CI, not advisory. The false positive rate is manageable; the false negative cost is not.
Define what "done" means for AI-generated code. Most teams have a definition of done that dates from before AI-assisted development was the norm. It almost certainly doesn't include "authorization paths verified" or "configuration reviewed against minimal-privilege baseline." Update it.
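The review-rate metric from the first point is simple to compute once each PR carries two labels. This is a minimal sketch, assuming you can tag PRs as AI-assisted and as security-reviewed (for example via PR labels or a checklist item). The `PullRequest` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    ai_assisted: bool
    security_reviewed: bool  # a human read the diff with security intent


def security_review_rate(prs: list[PullRequest]) -> float:
    """Share of AI-assisted PRs that received a security-intent review."""
    ai_prs = [pr for pr in prs if pr.ai_assisted]
    if not ai_prs:
        return 0.0  # no AI-assisted PRs in the window; nothing to measure
    reviewed = sum(pr.security_reviewed for pr in ai_prs)
    return reviewed / len(ai_prs)


# Made-up sample data, for illustration only.
sample = [
    PullRequest(101, ai_assisted=True, security_reviewed=True),
    PullRequest(102, ai_assisted=True, security_reviewed=False),
    PullRequest(103, ai_assisted=False, security_reviewed=False),
    PullRequest(104, ai_assisted=True, security_reviewed=True),
]

rate = security_review_rate(sample)
print(f"Security review rate on AI-assisted PRs: {rate:.0%}")
```

The denominator is AI-assisted PRs only, which keeps the metric from being diluted by human-written changes and makes it distinct from raw AI-assisted PR volume.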
The optimism buried in the data
Here's the part most of the coverage missed: the Georgia Tech finding that 45% of AI-generated code has vulnerabilities is alarming, but it also means 55% doesn't. The distribution isn't uniform — it clusters around identifiable patterns. The mistakes are learnable. The review checklist is finite.
We're not in a situation where AI code is fundamentally untrustworthy. We're in a situation where we adopted a powerful tool without updating our review process to match it. That's fixable.
The companies that figure out the secure AI workflow in 2026 will ship faster and safer than competitors who either slow down or don't look. That combination is the actual competitive advantage — not the raw speed, which everyone has access to now.
Statistics in this post are sourced from Georgia Tech's Vibe Security Radar (April 2026), Stack Overflow Engineering Blog's incident analysis (January 2026), and InfoQ's AI technical debt report (November 2025). The Tea App and Lovable incidents were reported by multiple outlets in 2025; the AI Flooding Close Projects piece covers the broader open-source fallout.