Feature Flags Die in Production

Feature flags are one of the better ideas in modern deployment practice. Ship code behind a flag, enable it for a percentage of users, and if something breaks, roll back instantly without a deploy. The idea is sound. The execution, at scale, tends to produce something nobody intended: a production system riddled with permanently active conditional branches, each one a small mystery, collectively representing an unknowable amount of implicit state.

The feature flag graveyard is not a hypothetical. If your company is more than two or three years old and has been using feature flags without governance, you almost certainly have one.

The lifecycle of a flag that never dies

Flags are easy to create and hard to delete. That asymmetry is the core of the problem.

Creating a flag takes minutes: define it in your flag service, add a conditional in the code, deploy. The PR is small, easy to review, low risk. Deleting a flag takes coordination: confirm the feature is stable, identify every code path that checks the flag, remove the conditional, clean up the flag service entry, test that nothing regressed. The work is not technically difficult, but it requires confidence that the flag is safe to remove, and that confidence is hardest to establish precisely when it matters most — after the original engineers have moved on.
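Of those deletion steps, finding every code path that checks the flag is the one most worth automating. A minimal sketch, assuming flags are referenced by their literal name in source files; `find_flag_references` and its default suffix list are hypothetical. A textual search is only a lower bound: it misses flag names built dynamically at runtime, so treat an empty result as a hint, not proof.

```python
import re
from pathlib import Path


def find_flag_references(root, flag_name, suffixes=(".py", ".ts", ".js")):
    """Return (path, line_number, line) for every source line mentioning flag_name.

    Hypothetical helper: only literal, whole-word mentions are found.
    """
    # Whole-word match, so "enable_new_checkout_flow_v2" is not a hit
    # when searching for "enable_new_checkout_flow".
    pattern = re.compile(rf"\b{re.escape(flag_name)}\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Run against the repository root before opening the cleanup PR, the hit list doubles as a checklist of call sites to remove.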

So flags accumulate. The typical lifecycle: an engineer adds a flag for a new checkout flow. The feature ships, and the flag is enabled for 100% of users. The rollout is declared complete. The flag is not removed because removing it requires a separate PR, and there is always something more urgent. Six months pass. The engineer joins another team. The flag is now a permanent conditional that the codebase accommodates without anyone knowing why. A year later, a new engineer reads the code and asks "what does this flag do?" Nobody knows. Removing it would probably be safe, but nobody is certain, so nobody does.

This is how you end up with a flag named enable_new_checkout_flow that has been enabled for 100% of users for fourteen months. The old checkout flow code is still there, reachable only through the disabled branch, tested by no one, drifting further from reality with every change. It is not dead code. It is code that could theoretically run and would produce undefined behavior if it did.

Flags as load-bearing walls

The worse category is not the orphaned flag but the load-bearing flag — the one where disabling it actually does break something, but for a reason that has nothing to do with the feature it was supposed to control.

This happens when flag logic gets entangled with other systems over time. An engineer notices that a certain code path runs only when a flag is enabled and, for an unrelated reason, adds logic that depends on whether that path runs. Another engineer uses the flag to guard an unrelated configuration change. By the time someone tries to remove the flag, the conditional is doing three things instead of one, and removing it requires understanding all three.

This is not an imaginary failure mode. Teams that inherit complex codebases with years of accumulated flag debt describe exactly this: flags that cannot be removed because their full effect is not understood, and whose full effect cannot be understood without running the disabled branch in production to see what breaks. The safety tool has become a source of risk.

Why governance feels bureaucratic until you need it

The standard recommendation for flag management is governance: a flag registry, defined expiration dates, an ownership model, a regular audit process. These recommendations are correct and are routinely ignored because they feel like process overhead when your team is small and your flag usage is modest.

The problem is that governance costs scale linearly, a fixed amount of work per flag, while graveyard costs compound with team size, codebase age, and flag accumulation. By the time governance feels necessary, you already have enough legacy flags that the cleanup cost is significant. Teams that institute governance early pay a small, constant overhead. Teams that skip it pay a large, episodic cleanup cost, and often decide the cleanup is not worth it, leaving the graveyard intact.

What actually prevents the graveyard

The most effective intervention is making flag removal the default next step after a successful rollout. This requires a few specific practices.

Every flag should have an expiration date set at creation time. Not a soft suggestion — an actual entry in your flag service that triggers a notification when the flag is past its expected lifetime. The engineer who created the flag is responsible for the cleanup unless they've formally handed ownership to someone else. This does not require sophisticated tooling: a column in a database table, a scheduled job that produces a report, someone who is responsible for acting on that report.
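The report job needs very little code. A minimal sketch, assuming each flag row carries a name, an owner, and an expiration date; `FlagRecord`, `expired_flags`, and `format_report` are hypothetical names, and the scheduling itself (cron or similar) is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FlagRecord:
    name: str
    owner: str        # engineer responsible for cleanup
    expires_on: date  # set when the flag is created


def expired_flags(flags, today):
    """Return flags past their expected lifetime, oldest expiry first."""
    overdue = [f for f in flags if f.expires_on < today]
    return sorted(overdue, key=lambda f: f.expires_on)


def format_report(flags, today):
    """Render a plain-text report that a scheduled job could post or email."""
    lines = []
    for f in expired_flags(flags, today):
        days = (today - f.expires_on).days
        lines.append(f"{f.name}: owner={f.owner}, {days} days past expiry")
    return "\n".join(lines) if lines else "No expired flags."
```

Listing the owner on every line is the point: the report should tell someone specific that they have work to do.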

Flags should be typed by lifecycle. Operational flags — kill switches, capacity controls, configuration toggles — are permanent by design and should be marked as such. Release flags — the kind used to gradually roll out features — are temporary by design and should have aggressive expiration. Treating both types the same way is how release flags become operational flags by accident.
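Lifecycle typing is most useful when it is enforced at creation time, for example in CI or in whatever creates entries in the flag service. A sketch under stated assumptions: `FlagKind`, `Flag`, and `validate` are hypothetical, and the 90-day cap on release flags is an illustrative number, not a standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class FlagKind(Enum):
    RELEASE = "release"          # temporary: gradual rollout of a feature
    OPERATIONAL = "operational"  # permanent: kill switch, capacity control, config toggle


# Hypothetical policy value: release flags may live at most 90 days.
MAX_RELEASE_LIFETIME = timedelta(days=90)


@dataclass(frozen=True)
class Flag:
    name: str
    kind: FlagKind
    created_on: date
    expires_on: Optional[date] = None


def validate(flag):
    """Raise ValueError if the flag's expiry does not match its lifecycle type."""
    if flag.kind is FlagKind.OPERATIONAL:
        # Permanent by design: an expiry date here signals a misclassified flag.
        if flag.expires_on is not None:
            raise ValueError(f"{flag.name}: operational flags are permanent; omit expires_on")
        return
    if flag.expires_on is None:
        raise ValueError(f"{flag.name}: release flags need an expiration date")
    if flag.expires_on - flag.created_on > MAX_RELEASE_LIFETIME:
        raise ValueError(f"{flag.name}: release flag lifetime exceeds {MAX_RELEASE_LIFETIME.days} days")
```

Requiring the kind up front forces the author to decide, at creation, whether the flag is meant to outlive its rollout.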

The cleanup PR should be as easy to write as the creation PR. This is a tooling problem as much as a process problem. If your codebase requires touching twenty files to remove a flag because the conditional is scattered throughout the code, flags will not get removed because the cleanup cost is too high. Flags that are centralized behind a single abstraction point — a flag-checked function call rather than an inline conditional spread across components — are easier to remove. Design for removal at the time you add the flag.
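A sketch of the single-abstraction-point pattern, using the `enable_new_checkout_flow` example from earlier. `is_enabled`, `checkout_handler`, and the two checkout functions are hypothetical stand-ins; a real lookup would call your flag service rather than read a dictionary.

```python
from typing import Callable, Mapping


def is_enabled(flags: Mapping[str, bool], name: str) -> bool:
    # Hypothetical lookup; unknown flags default to off.
    return flags.get(name, False)


def new_checkout(cart: list[float]) -> str:
    return f"new checkout: {sum(cart):.2f}"


def legacy_checkout(cart: list[float]) -> str:
    return f"legacy checkout: {sum(cart):.2f}"


def checkout_handler(flags: Mapping[str, bool]) -> Callable[[list[float]], str]:
    """The only place that reads enable_new_checkout_flow.

    Callers receive a handler and never see the flag, so removal touches
    this function and legacy_checkout, not every component.
    """
    if is_enabled(flags, "enable_new_checkout_flow"):
        return new_checkout
    return legacy_checkout
```

Once the rollout is complete, cleanup is confined to this one spot: delete `legacy_checkout` and have `checkout_handler` return `new_checkout` unconditionally, or inline it.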

The compounding cost

A codebase with a flag graveyard is harder to work in on every dimension. Test coverage becomes theoretical: the test suite may not exercise disabled branches at all, meaning broken code is silently present. Reasoning about behavior requires tracking flag state, which is external state the code itself does not encode. Onboarding takes longer because new engineers need to learn not just the codebase but the flag registry. Debugging is harder because the behavior of any given request depends on which flags were active for that user at that time, which may not be logged.
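The logging gap is cheap to close if every flag check for a request goes through one object. A minimal sketch, assuming flag values for the current user have already been resolved into a dictionary; `FlagEvaluator` and the JSON log format are hypothetical.

```python
import json
import logging

logger = logging.getLogger("flags")


class FlagEvaluator:
    """Wraps a flag lookup and records every evaluation made during one request."""

    def __init__(self, flag_state, request_id):
        self._flag_state = flag_state  # hypothetical: flag name -> bool for this user
        self.request_id = request_id
        self.evaluated = {}

    def is_enabled(self, name):
        value = bool(self._flag_state.get(name, False))
        self.evaluated[name] = value  # remember what this request actually saw
        return value

    def log_summary(self):
        # One structured line per request listing only the flags it read.
        line = json.dumps({"request_id": self.request_id, "flags": self.evaluated}, sort_keys=True)
        logger.info(line)
        return line
```

Recording only the flags a request actually read keeps the log line small and shows which branches were live when a given request misbehaved.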

None of these costs are catastrophic individually. Together, they represent a consistent drag on development velocity that is hard to attribute to any specific cause — which makes it hard to prioritize fixing.

The fix is not complicated. Flags should be temporary unless explicitly designated otherwise. Removal should be as easy as creation. Someone should own the list. The engineering investment is small. The payoff, compounded over years of not accumulating a graveyard, is significant.

Feature flags work. Feature flag graveyards don't. The difference is whether you treat removal as a first-class part of the lifecycle or as cleanup you'll get to eventually.
