June 11, 2026·5 min read

Why I stopped using feature flags as configuration and started treating them as code

Feature flags stored in external systems create deployment coupling and make debugging production issues nearly impossible.

AI Dev

feature-flags

deployment

debugging

configuration

I used to store all my feature flags in external configuration systems like LaunchDarkly, Split, and Unleash. Toggle a switch in the dashboard, and a new feature rolls out or a problematic code path is killed without touching the deployment pipeline. It felt like having a remote control for production: I could enable features for specific user segments, run A/B tests, and roll back instantly when things went wrong. The non-technical stakeholders loved being able to control feature rollouts themselves. Marketing could enable the new checkout flow for premium users, and support could quickly disable the problematic notification system when customers started complaining.

Then I spent four hours debugging a critical payment processing issue, only to discover that the cause was a feature flag change made three days earlier by someone who had since left the company.

The breaking point came when investigating why our mobile app's push notifications stopped working for iOS users. The code looked perfect, the logs showed successful API calls, and our monitoring indicated healthy system performance. After diving deep into device-specific debugging and questioning our entire notification infrastructure, I discovered that someone had disabled the ios_push_v2 flag to "temporarily test the fallback system" and forgotten to re-enable it. The flag change was not in our deployment history, not in our code review process, and not tracked in any audit log that the engineering team had access to.

The hidden complexity of external flag systems

External feature flag platforms promise simplicity but introduce a complex secondary deployment system that runs parallel to your code deployments. Every feature flag decision point in your application now depends on state fetched over the network, whether each evaluation makes a call or reads a locally cached copy of the rules. This creates an invisible dependency graph where your application's behavior is controlled by data stored outside your version control system.

When debugging issues in staging or development environments, you need to understand not just what code is running, but what flag configuration was active at the time of the problem. Different environments might have different flag states, which makes issues hard to reproduce consistently. I found myself constantly switching between code repositories and flag management dashboards, trying to reconstruct the complete application state.

Here's what a typical flag-dependent function looked like in my old approach:

async function processPayment(order: Order): Promise<PaymentResult> {
  const useNewProcessor = await featureFlags.isEnabled('new_payment_processor', {
    userId: order.userId,
    plan: order.plan
  });
 
  if (useNewProcessor) {
    return processWithStripeConnect(order);
  }
  
  return processWithLegacySystem(order);
}

This code has a runtime dependency on an external system that could fail, return stale data, or have inconsistent state across different application instances.
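For flags that do have to stay external, one way to contain the first of those failure modes is to bound every lookup with a timeout and an explicit default. This is a minimal sketch under assumptions: `FlagClient`, `isEnabledWithFallback`, and the 200 ms budget are hypothetical names and values I made up for illustration, not the API of LaunchDarkly, Split, or Unleash.

```typescript
// Hypothetical minimal client interface; real SDKs expose different APIs.
interface FlagClient {
  isEnabled(key: string, context: Record<string, string>): Promise<boolean>;
}

// Resolve a flag, but never wait longer than `timeoutMs` and never throw.
// On timeout or error, return the caller-supplied fallback instead.
async function isEnabledWithFallback(
  client: FlagClient,
  key: string,
  context: Record<string, string>,
  fallback: boolean,
  timeoutMs = 200
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs);
  });
  try {
    // Whichever settles first wins: the flag service or the timeout.
    return await Promise.race([client.isEnabled(key, context), timeout]);
  } catch {
    // Network failure, SDK error, and so on: use the known default.
    return fallback;
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A wrapper like this only handles outages and slow responses. It does nothing about stale data or instances disagreeing with each other, which is the part that kept biting me.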

Configuration drift and debugging nightmares

Feature flags stored externally create configuration drift between environments. Your staging environment might have different flag values than production, so you cannot be sure you are testing the exact code paths that users will experience. When issues occur in production, you need to know not just what version of the code was deployed, but what flag configuration was active for the specific user experiencing the problem.
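To make that drift concrete, here is a rough sketch of comparing two flag snapshots. It assumes you can export each environment's flags as a flat name-to-boolean map; `FlagSnapshot`, `findFlagDrift`, and the `new_checkout` flag are hypothetical names for illustration, and real platforms store richer targeting rules than a single boolean.

```typescript
// A flat snapshot of flag states for one environment (hypothetical shape).
type FlagSnapshot = Record<string, boolean>;

interface FlagDrift {
  flag: string;
  staging: boolean | undefined; // undefined means the flag is missing there
  production: boolean | undefined;
}

// List every flag whose value differs between the two environments,
// including flags that exist in only one of them.
function findFlagDrift(staging: FlagSnapshot, production: FlagSnapshot): FlagDrift[] {
  // Merging both maps yields each flag name exactly once.
  const allFlags = Object.keys({ ...staging, ...production }).sort();
  return allFlags
    .filter((flag) => staging[flag] !== production[flag])
    .map((flag) => ({ flag, staging: staging[flag], production: production[flag] }));
}

const drift = findFlagDrift(
  { ios_push_v2: true, new_payment_processor: true },
  { ios_push_v2: false, new_payment_processor: true, new_checkout: true }
);
// Reports ios_push_v2 (true vs false) and new_checkout (missing in staging).
console.log(drift);
```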

I started treating feature flags as immutable configuration that lives alongside the code:

// features/paymentProcessor.ts
export const PAYMENT_PROCESSOR_CONFIG = {
  useStripeConnect: {
    default: false,
    overrides: {
      plan_premium: true,
      plan_enterprise: true
    }
  }
} as const;
 
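// Exported so tests can import it directly.
export function shouldUseNewProcessor(order: Order): boolean {
  const config = PAYMENT_PROCESSOR_CONFIG.useStripeConnect;
  // Widen the literal type so any plan name can be looked up without a type error.
  const overrides: Readonly<Record<string, boolean>> = config.overrides;

  // Use ?? instead of a truthiness check so an explicit `false` override is honored.
  return overrides[`plan_${order.plan}`] ?? config.default;
}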

This approach makes the feature flag logic deterministic and testable. The configuration is versioned with the code, deployed atomically, and can be reviewed in pull requests.

Code-based flags enable better testing

When feature flags live in code, you can write comprehensive tests for all flag combinations without depending on external systems or network calls. Test environments become perfectly reproducible because the flag configuration is baked into the deployed artifact.

import { describe, test } from 'node:test';
import assert from 'node:assert/strict';
import { shouldUseNewProcessor } from './paymentProcessor';

describe('payment processor selection', () => {
  test('uses legacy processor for basic plans', () => {
    const order = { plan: 'basic', userId: '123' };
    assert.equal(shouldUseNewProcessor(order), false);
  });

  test('uses new processor for premium plans', () => {
    const order = { plan: 'premium', userId: '123' };
    assert.equal(shouldUseNewProcessor(order), true);
  });
});

These tests run without network dependencies and give you confidence that flag logic behaves consistently across all environments.
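Covering all flag combinations gets tedious by hand once there is more than one flag. A small helper can enumerate them for a table-driven test. This is a sketch, not part of my actual test suite: `allFlagCombinations` and the `newCheckout` flag name are hypothetical, and the number of combinations doubles with every flag, so it only stays practical for a handful of them.

```typescript
// Enumerate every on/off combination for a list of flag names.
function allFlagCombinations(flags: readonly string[]): Record<string, boolean>[] {
  const total = 1 << flags.length; // 2^n combinations
  const combos: Record<string, boolean>[] = [];
  for (let mask = 0; mask < total; mask++) {
    const combo: Record<string, boolean> = {};
    flags.forEach((flag, bit) => {
      // Bit `bit` of the mask decides whether this flag is on.
      combo[flag] = (mask & (1 << bit)) !== 0;
    });
    combos.push(combo);
  }
  return combos;
}

// Four combinations for two flags; feed each one to the code under test.
for (const combo of allFlagCombinations(['useStripeConnect', 'newCheckout'])) {
  console.log(combo);
}
```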

When external flags still make sense

I still use external feature flag systems for true operational controls -- circuit breakers that need to respond to system health, temporary killswitches for problematic features, and genuine experiments where configuration needs to change independent of deployments. But these are the exception, not the default.

The key insight is distinguishing between feature configuration (which version of the checkout flow should run) and operational controls (should we disable checkout entirely because payments are failing). Feature configuration should be deployed with code. Operational controls can live in external systems.
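In code, that split might look something like the sketch below. The names (`CHECKOUT_FLOW_BY_PLAN`, `OperationalControls`, `decideCheckout`) are hypothetical and only meant to show the shape: the flow selection is plain data in the repository, and the killswitch is the single value read from outside the deployed artifact.

```typescript
type CheckoutFlow = 'classic' | 'redesigned';

// Feature configuration: lives in the repository and changes only via deploy.
const CHECKOUT_FLOW_BY_PLAN: Readonly<Record<string, CheckoutFlow>> = {
  premium: 'redesigned',
  enterprise: 'redesigned'
};
const DEFAULT_CHECKOUT_FLOW: CheckoutFlow = 'classic';

// Operational control: hypothetical interface that could be backed by
// any external flag service or health check.
interface OperationalControls {
  isCheckoutDisabled(): boolean;
}

type CheckoutDecision =
  | { available: false }
  | { available: true; flow: CheckoutFlow };

function decideCheckout(plan: string, controls: OperationalControls): CheckoutDecision {
  // The killswitch answers "should checkout run at all?"
  if (controls.isCheckoutDisabled()) {
    return { available: false };
  }
  // The code-based config answers "which version of checkout should run?"
  return { available: true, flow: CHECKOUT_FLOW_BY_PLAN[plan] ?? DEFAULT_CHECKOUT_FLOW };
}
```

Keeping the external surface down to one boolean means there is only one thing to check in a dashboard when checkout behaves unexpectedly.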

The deployment and rollback reality

Moving feature flags into code does not eliminate the ability to roll back problematic features quickly. Modern deployment pipelines support rapid rollbacks of the entire application state, code and configuration together. The rollback unit is coarser than a single flag, but it is coherent: you return to a specific combination of code and feature configuration that you know worked together.

When flags live externally, rolling back code does not roll back flag configuration, creating potentially untested combinations of old code with new flag states.
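One way to make "a specific combination of code and feature configuration" something you can point at in logs is to stamp each release with a fingerprint of its flag config. The sketch below assumes a Node.js runtime and a JSON-serializable config; `stableStringify`, `flagConfigFingerprint`, `releaseLabel`, and the label format are hypothetical, not something a deployment tool provides.

```typescript
import { createHash } from 'node:crypto';

// Serialize with sorted keys so the same config always yields the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const record = value as Record<string, unknown>;
    const entries = Object.keys(record)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(record[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Short, deterministic identifier for a flag configuration.
function flagConfigFingerprint(config: unknown): string {
  return createHash('sha256').update(stableStringify(config)).digest('hex').slice(0, 12);
}

// Hypothetical release label: short commit SHA plus flag fingerprint.
function releaseLabel(commitSha: string, config: unknown): string {
  return `${commitSha.slice(0, 7)}+flags.${flagConfigFingerprint(config)}`;
}
```

Logging a label like this at startup ties every log line to one exact pairing of code and flag state, which is the pairing a rollback restores.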

My current approach treats feature flags as code that gets reviewed, tested, and deployed atomically with the application logic they control. This creates more predictable deployments, easier debugging, and better testing coverage, even if it means I cannot toggle features from a web dashboard anymore.