Feature Flags Are Architecture, Not Toggles

Your codebase has 200 feature flags. Half of them haven't been read in a year. The other half have unclear semantics. New engineers are afraid to touch them. Old engineers can't remember what enableNewBillingFlow_v2 actually controls.

This is the natural endpoint of feature flags treated as toggles. They become permanent if/else branches that calcify into architectural debt.

The fix isn't fewer feature flags. It's understanding that feature flags are an architectural choice — and treating them like one.

What flags are actually for

Feature flags solve four distinct problems. Each has a different lifetime and discipline:

Release flags — decouple deploy from release. Code is in production but disabled until ready. Lifetime: days to weeks.
Experiment flags — A/B test variants. Lifetime: weeks to months.
Operational flags — kill switches, throttles, circuit breakers. Lifetime: permanent (but rarely flipped).
Permission flags — enabled for some customers/plans. Lifetime: permanent (this is product configuration, not really a flag).

The problem starts when you don't separate these. Everything gets called a "flag" and handled the same way, even though the four kinds have different lifetimes and need different cleanup discipline.
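One way to make the distinction impossible to skip is to record the kind alongside every flag. Below is a minimal sketch, assuming a hypothetical in-repo metadata model (`FlagDefinition` and `cleanupRule` are illustrative names, not any vendor's API). Release and experiment flags cannot be declared without an expiration date, while operational flags carry a review interval instead.

```typescript
// Hypothetical metadata model: each flag declares its kind, and the kind
// decides the cleanup discipline.
type FlagDefinition =
  | { kind: "release" | "experiment"; name: string; owner: string; expires: string }
  | { kind: "operational"; name: string; owner: string; reviewEveryDays: number }
  | { kind: "permission"; name: string; owner: string };

// Returns a human-readable cleanup rule based only on the flag's kind.
function cleanupRule(flag: FlagDefinition): string {
  switch (flag.kind) {
    case "release":
    case "experiment":
      // Temporary flags must carry an expiration date.
      return `remove by ${flag.expires}`;
    case "operational":
      // Permanent by design, but reviewed periodically.
      return `review every ${flag.reviewEveryDays} days`;
    case "permission":
      // Product configuration, not a temporary toggle.
      return "manage as product configuration";
  }
}

const pricing: FlagDefinition = {
  kind: "release",
  name: "new_pricing_engine_v2",
  owner: "@doronmak",
  expires: "2026-06-01",
};
console.log(cleanupRule(pricing)); // prints "remove by 2026-06-01"
```

Because the type checker rejects a release flag without `expires`, the cleanup discipline is enforced at the point the flag is declared rather than remembered later.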

The cleanup rule that works

For release and experiment flags: every flag has an expiration date.

When you create one, write it in the code:

async function computePrice(order, user) {
  // EXPIRES: 2026-06-01
  // OWNER: @doronmak
  // PURPOSE: Roll out new pricing engine. Remove after 100% rollout.
  if (await flags.enabled('new_pricing_engine_v2', user)) {
    return computePriceV2(order);
  }
  // Legacy path: deleted together with the flag.
  return computePriceLegacy(order);
}

Add a CI check that scans for expired flags and fails the build. Now flags can't outlive their purpose by years.
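A minimal sketch of that CI check, assuming flags are annotated with `// EXPIRES: YYYY-MM-DD` comments exactly as in the example above. `findExpiredFlags` is a hypothetical helper that works on one file's contents; walking the repository is left to your CI runner.

```typescript
// Reports every "// EXPIRES: YYYY-MM-DD" annotation whose date has passed.
interface ExpiredFlag {
  line: number; // 1-based line number of the EXPIRES comment
  expires: string; // the date as written in the comment
}

const EXPIRES_RE = /\/\/\s*EXPIRES:\s*(\d{4}-\d{2}-\d{2})/;

function findExpiredFlags(source: string, today: Date): ExpiredFlag[] {
  const expired: ExpiredFlag[] = [];
  source.split("\n").forEach((text, i) => {
    const match = EXPIRES_RE.exec(text);
    if (!match) return;
    // Date-only ISO strings parse as midnight UTC.
    const expiry = new Date(match[1]);
    // A flag counts as expired on its expiration date.
    if (expiry.getTime() <= today.getTime()) {
      expired.push({ line: i + 1, expires: match[1] });
    }
  });
  return expired;
}
```

In CI, run it over every source file and exit non-zero if any call returns a result; that exit code is what fails the build.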

For operational flags, use explicit prefixes such as kill_switch_* and circuit_breaker_*, so the name itself says "permanent by design." Instead of expiring them, review them quarterly.
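The naming convention is easy to lint. Here is a sketch, assuming the kind of each flag is known (for example from a metadata file); `namingViolation` and `reviewDue` are hypothetical helpers, and "quarterly" is approximated as 90 days.

```typescript
// Operational flags must use an explicit prefix, and only operational flags
// may use one, so a flag's name alone tells you its expected lifetime.
const OPERATIONAL_PREFIXES = ["kill_switch_", "circuit_breaker_"];

type Kind = "release" | "experiment" | "operational" | "permission";

// Returns a lint message, or null if the name matches the convention.
function namingViolation(name: string, kind: Kind): string | null {
  const hasPrefix = OPERATIONAL_PREFIXES.some((p) => name.startsWith(p));
  if (kind === "operational" && !hasPrefix) {
    return `${name}: operational flags must start with ${OPERATIONAL_PREFIXES.join(" or ")}`;
  }
  if (kind !== "operational" && hasPrefix) {
    return `${name}: only operational flags may use an operational prefix`;
  }
  return null;
}

// Quarterly review: due once more than `days` days have passed since the last review.
function reviewDue(lastReviewed: Date, today: Date, days = 90): boolean {
  const elapsedMs = today.getTime() - lastReviewed.getTime();
  return elapsedMs > days * 24 * 60 * 60 * 1000;
}
```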

The six-step rollout pattern

A release flag should follow this lifecycle:

1. Add the flag, default off. Deploy. The code is in production but inactive.
2. Enable for internal users. Smoke test in production with low risk.
3. Enable for 1% of users. Monitor metrics for 24 hours.
4. Ramp 5% → 25% → 50% → 100% over several days, with a metrics checkpoint at each step.
5. Flip the default to on. The flag is now inert; mark it for removal.
6. Remove the flag and the old code path in a dedicated deletion PR.
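The lifecycle above can be sketched as an explicit state machine. The stage names and the `checkpointHealthy` input are illustrative assumptions; in practice the checkpoint would come from whatever metrics you monitor at each step.

```typescript
// Release-flag lifecycle as an ordered list of stages.
const STAGES = [
  "off", // step 1: deployed, inactive
  "internal", // step 2: internal users only
  "pct_1", // step 3: 1% of users, monitored for 24 hours
  "pct_5", // step 4: ramp with checkpoints
  "pct_25",
  "pct_50",
  "pct_100",
  "default_on", // step 5: flag inert, marked for removal
  "removed", // step 6: flag and old code path deleted
] as const;

type Stage = (typeof STAGES)[number];

// Advance one stage if the checkpoint passed; otherwise roll back to "off".
// Once removed, there is no flag left to flip, so "removed" is terminal.
function nextStage(current: Stage, checkpointHealthy: boolean): Stage {
  if (current === "removed") return current;
  if (!checkpointHealthy) return "off";
  const i = STAGES.indexOf(current);
  return STAGES[Math.min(i + 1, STAGES.length - 1)];
}
```

Rolling all the way back to `off` on a failed checkpoint, rather than to the previous percentage, is one possible design choice. The more important property is that `removed` is a real state, so a flag parked at `default_on` shows up as unfinished work.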

Step 6 is the one teams skip. The flag becomes permanent.

The discipline that fixes this: the same engineer who added the flag is responsible for removing it. Auto-create a follow-up ticket on day one with the expiration date.

Why flag explosion is dangerous

A codebase with 200 stale flags has these problems:

Untested combinations. With 20 flags, each either on or off, there are 2^20 (about 1M) possible configurations. Your tests cover three. The other million or so are untested, and any of them can show up in production.
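The arithmetic, worked out for 20 independent boolean flags and for the 200 flags from the opening:

```latex
\begin{aligned}
2^{20} &= 1{,}048{,}576 \approx 10^{6} \text{ configurations} \\
2^{20} - 3 &= 1{,}048{,}573 \text{ untested} \\
2^{200} &\approx 1.6 \times 10^{60} \text{ configurations}
\end{aligned}
```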

Performance death. With a naive client, every flag eval is a network call (or, at best, a memory read plus deserialization). At 50 flag evals per request × 10k req/sec, that's 500k flag evals/sec. Add monitoring overhead and you have a latency problem.
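To see the scale, here is the throughput from the paragraph above, plus the per-request cost under an assumed 1 ms round trip per remote evaluation made sequentially (the 1 ms figure is illustrative, not a measurement):

```latex
\begin{aligned}
50\ \tfrac{\text{evals}}{\text{request}} \times 10{,}000\ \tfrac{\text{requests}}{\text{s}} &= 500{,}000\ \tfrac{\text{evals}}{\text{s}} \\
50\ \tfrac{\text{evals}}{\text{request}} \times 1\ \tfrac{\text{ms}}{\text{eval}} &= 50\ \tfrac{\text{ms}}{\text{request}}
\end{aligned}
```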

Onboarding cliff. New engineers see enableFooBarV3 and don't know if it's safe to remove or load-bearing. They leave it. The graveyard grows.

Lost rollbacks. "We used to be able to flip this flag and revert. Now half the codebase assumes it's true."

The flags-as-architecture mindset

Treat each flag as a first-class architecture decision. That means:

Documentation in the code (purpose, owner, expiration)
Testing: how is this flag tested? What's the off path? What's the on path?
Cleanup plan: what gets deleted when this flag is removed?

If you can't answer those questions, don't add the flag.
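One way to enforce this is to turn the questions into required fields of a flag proposal that is reviewed before the flag is created. The `FlagProposal` shape and `missingAnswers` helper below are hypothetical, a sketch of the idea rather than any tool's format.

```typescript
// The three questions above as fields; empty answers block the flag.
interface FlagProposal {
  name: string;
  purpose: string;
  owner: string;
  expires?: string; // required for release and experiment flags
  offPathTest: string; // how the off path is tested
  onPathTest: string; // how the on path is tested
  cleanupPlan: string; // what gets deleted when the flag is removed
}

// Returns the names of unanswered (empty or whitespace-only) fields.
function missingAnswers(p: FlagProposal): string[] {
  const required: (keyof FlagProposal)[] = [
    "purpose",
    "owner",
    "offPathTest",
    "onPathTest",
    "cleanupPlan",
  ];
  return required.filter((field) => !(p[field] ?? "").trim());
}
```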

For operational flags (kill switches), document the trigger conditions:

// PERMANENT — kill switch for outbound webhooks
// FLIP IF: webhook delivery rate drops below 50%, or upstream returns >10% 5xx
// FLIPS BACK: when @ops confirms upstream healthy
if (await flags.enabled('kill_switch_webhooks')) {
  return queueForLaterDelivery(payload);
}

The runbook for "what to do if webhooks are broken" includes "flip the kill switch," and the conditions for flipping it are documented right at the flag site.
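The FLIP IF conditions can also be encoded so an alert or runbook script evaluates them the same way every time. The thresholds come from the comment above; the `WebhookHealth` metric names are hypothetical.

```typescript
// Health signals for outbound webhooks, as fractions between 0 and 1.
interface WebhookHealth {
  deliveryRate: number; // share of webhooks delivered successfully
  upstream5xxRate: number; // share of upstream responses that were 5xx
}

// FLIP IF: delivery rate drops below 50%, or upstream returns >10% 5xx.
function shouldFlipWebhookKillSwitch(h: WebhookHealth): boolean {
  return h.deliveryRate < 0.5 || h.upstream5xxRate > 0.1;
}
```

Flipping back stays a human decision, matching the FLIPS BACK line: the check says when to pull the switch, not when to release it.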

What good flag tooling does

Most homegrown flag tools are missing several of the features below. Use a mature one (LaunchDarkly, Statsig, Unleash, ConfigCat). What you want:

Audit trail of who flipped what when
User targeting by attributes, not just user ID
Percentage rollouts with sticky bucketing
Default values if the flag service is down (fail-safe)
SDK with local cache to avoid network on every check
Code references — "where in the codebase is this flag read?"
Stale flag detection — flags untouched for N days
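Two of those features, sticky percentage bucketing and a local cache with fail-safe defaults, fit in a short sketch. This is not how any particular vendor implements them. The FNV-1a hash (over UTF-16 code units), the `CachedFlagClient` class, and the rules format are all assumptions chosen for illustration.

```typescript
// FNV-1a 32-bit hash: deterministic, so a user always lands in the same
// bucket for a given flag (sticky bucketing).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

class CachedFlagClient {
  // Flag key -> rollout percentage (0..100), refreshed in the background.
  private rules: Map<string, number> = new Map();
  private readonly defaults: Record<string, boolean>;

  constructor(defaults: Record<string, boolean>) {
    this.defaults = defaults;
  }

  // Called by a background refresh loop (not shown); checks never wait on it.
  update(rules: Map<string, number>): void {
    this.rules = rules;
  }

  // Pure in-memory evaluation: no network call per check.
  enabled(flagKey: string, userId: string): boolean {
    const percent = this.rules.get(flagKey);
    if (percent === undefined) {
      // Service unreachable or flag unknown: fail safe to the declared default.
      return this.defaults[flagKey] ?? false;
    }
    // Hash flag and user together so buckets differ across flags.
    return fnv1a(`${flagKey}:${userId}`) % 100 < percent;
  }
}
```

Because a user's bucket never changes, ramping from 5% to 25% only adds users: anyone enabled at 5% stays enabled at 25%.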

If your flag tool doesn't surface stale flags, it's not helping you avoid the trap.
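If your tool can export each flag's kind and last-modified time, a basic stale-flag report is a few lines. The `FlagRecord` shape and the 30-day default for N are assumptions. Operational and permission flags are skipped because they are permanent by design.

```typescript
interface FlagRecord {
  name: string;
  kind: "release" | "experiment" | "operational" | "permission";
  lastChanged: Date; // last edit to the flag's targeting or rollout
}

// Temporary flags whose configuration nobody has touched in maxIdleDays.
function staleFlags(flags: FlagRecord[], today: Date, maxIdleDays = 30): string[] {
  const cutoffMs = today.getTime() - maxIdleDays * 24 * 60 * 60 * 1000;
  return flags
    .filter((f) => f.kind === "release" || f.kind === "experiment")
    .filter((f) => f.lastChanged.getTime() < cutoffMs)
    .map((f) => f.name);
}
```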

The cost-benefit recalibration

Feature flags have real costs (complexity, performance, cleanup overhead). They're worth it for:

Risky changes you want to roll back fast
Gradual rollouts to reduce blast radius
A/B tests that need real measurement
Kill switches for known-fragile dependencies

They're not worth it for:

"I'll add a flag in case we need to roll back" — no concrete plan to use it
Cosmetic changes — just deploy
Internal admin features — just ship

Be picky. Every flag added without a concrete plan is a flag that becomes permanent debt.

The takeaway

Feature flags are powerful and dangerous. Treat them as architecture: each one with a purpose, an owner, and an expiration. Add CI to enforce cleanup. Distinguish release flags (temporary) from operational flags (permanent). Without this discipline, your codebase fills with toggles nobody remembers.