Engineering · 7 min read · March 3, 2026

Feature Flags in SaaS: Shipping Safely and Testing in Production

Feature flags let you ship code continuously without releasing features continuously. Here's how to build a feature flag system that actually improves deployment safety.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

Decoupling Deployment From Release

The most useful shift in thinking about software deployment is separating "deploy" from "release." Deploying means getting code into production. Releasing means making a feature available to users. These can — and often should — happen at different times.

Feature flags are the mechanism that makes this possible. With feature flags, you can deploy code continuously (which keeps branches short and integration conflict-free) while controlling precisely who sees each feature and when. This capability enables safer deployments, controlled rollouts, A/B testing, instant rollbacks, and environment-specific behavior — all without branching complexity.


The Types of Feature Flags

Not all feature flags are the same. Treating them identically leads to a messy flag system cluttered with flags that should have been removed months ago.

Release flags. The most common type. A feature is behind a flag while it's in development, turned on for internal testing, then progressively rolled out to users. Once the rollout is complete, the flag should be removed from the code. Release flags are temporary by design.

Operational flags. Controls for system behavior that might need to be adjusted in response to conditions — kill switches for expensive features under load, rate limiting toggles, cache behavior settings. These are permanent flags with long lifespans. They're not releases; they're operational controls.

Experiment flags. A/B test flags that show different variants to different user segments, used to test the impact of a change before full release. These are temporary, and they should have a defined end date: the experiment runs for X weeks, you analyze the results, and the winning variant becomes the default.

Permission flags. Used to gate features by plan or role. "This feature is available on Professional plan and above." These overlap with your billing/permission system and are typically longer-lived than release flags.
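Plan-gating of this kind usually reduces to a tier comparison. A minimal sketch, where the plan names, the `FEATURE_MIN_PLAN` table, and the gated features are illustrative assumptions:

```typescript
// Hypothetical plan tiers, ordered lowest to highest.
const PLAN_ORDER = ['free', 'starter', 'professional', 'enterprise'] as const
type Plan = (typeof PLAN_ORDER)[number]

// A permission flag maps a feature to the minimum plan that unlocks it.
const FEATURE_MIN_PLAN: Record<string, Plan> = {
  audit_log: 'professional',
  sso: 'enterprise',
}

function hasPlanAccess(feature: string, plan: Plan): boolean {
  const required = FEATURE_MIN_PLAN[feature]
  if (!required) return true // feature is not plan-gated
  return PLAN_ORDER.indexOf(plan) >= PLAN_ORDER.indexOf(required)
}
```

Because plan order is explicit, "Professional plan and above" is a single index comparison rather than a special case per plan.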


Building a Simple Feature Flag System

For an early-stage SaaS, a managed service (LaunchDarkly, Unleash, Flagsmith) is often overkill. A simple in-house implementation works well until you have complex targeting requirements.

Database schema (PostgreSQL flavor):

create table feature_flags (
  id                 bigserial primary key,
  key                text unique not null,   -- unique identifier: 'new_dashboard', 'beta_csv_export'
  enabled            boolean not null default false,  -- global on/off switch
  rollout_percentage int not null default 0
                     check (rollout_percentage between 0 and 100),  -- percentage of users to enable for
  description        text,
  created_at         timestamptz not null default now(),
  updated_at         timestamptz not null default now()
);

create table feature_flag_overrides (
  id          bigserial primary key,
  flag_id     bigint not null references feature_flags (id),
  entity_type text not null,   -- 'user', 'organization', 'plan'
  entity_id   text not null,
  enabled     boolean not null,
  created_at  timestamptz not null default now(),
  unique (flag_id, entity_type, entity_id)
);

Evaluation logic (TypeScript):

async function isEnabled(
  flagKey: string,
  context: { userId: string; organizationId: string; plan: string }
): Promise<boolean> {
  const flag = await getFlag(flagKey)
  if (!flag) return false

  // Check overrides first (highest priority)
  const override = await getOverride(flag.id, context)
  if (override !== null) return override.enabled

  // Global kill switch
  if (!flag.enabled) return false

  // Percentage rollout (deterministic by userId)
  const hash = hashStringToNumber(flagKey + context.userId)
  return (hash % 100) < flag.rollout_percentage
}

The hashing ensures a user gets a consistent experience (always in or always out for a given flag) rather than a random experience on each request.
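The evaluation code leaves hashStringToNumber undefined. Any stable, deterministic hash works; FNV-1a is one common choice (the choice of FNV-1a here is an assumption, not something the evaluation logic depends on):

```typescript
// FNV-1a: a fast, stable, non-cryptographic string hash.
// Cryptographic strength is unnecessary; determinism is the only requirement.
function hashStringToNumber(input: string): number {
  let hash = 0x811c9dc5 // FNV offset basis (32-bit)
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) // FNV prime, with 32-bit wraparound
  }
  return hash >>> 0 // force unsigned 32-bit
}

// The same (flag, user) pair always lands in the same 0-99 bucket,
// so a user stays in (or out of) a rollout across requests.
function rolloutBucket(flagKey: string, userId: string): number {
  return hashStringToNumber(flagKey + userId) % 100
}
```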

Caching. Flag evaluations happen on every request for flagged features. Cache flag state aggressively — Redis with a 30-60 second TTL is typical. Serving flag state that is up to 30 seconds stale is a minor inconvenience; querying the database for every flag on every request is a performance problem.


Progressive Rollout in Practice

The typical rollout progression for a new feature:

Internal only (0%, with overrides for the team). The feature is deployed but visible only to internal users. You're testing that it works in production, not that it works in your local environment.

Alpha (5-10%). A small percentage of users see the feature. You're looking for error rate spikes, performance regressions, and unexpected edge cases. Monitor error reporting and performance dashboards continuously.

Beta (25-50%). Broader exposure. Collect user feedback. Monitor business metrics (are feature users converting at the same rate? Are they churning less?). A/B analysis begins.

General availability (100%). Full rollout. The flag remains in place for one to two sprints as a kill switch, then gets removed from the code.

The "kill switch" period is important. If something goes sideways after a full rollout, you want to be able to disable the feature in 30 seconds (by toggling the flag) rather than deploying a revert. The flag stays in place until you're confident the feature is stable.
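The progression above can be encoded as a small stage table so rollout advances are deliberate rather than ad hoc. The percentages below are the ones suggested in this post, and nextRolloutPercentage is a hypothetical helper:

```typescript
// Rollout stages as suggested above; tune percentages to taste.
const ROLLOUT_STAGES = [
  { name: 'internal', percentage: 0 },  // team-only via overrides
  { name: 'alpha', percentage: 10 },
  { name: 'beta', percentage: 50 },
  { name: 'ga', percentage: 100 },
] as const

// Returns the next stage's percentage, or null once rollout is complete.
function nextRolloutPercentage(current: number): number | null {
  const next = ROLLOUT_STAGES.find((s) => s.percentage > current)
  return next ? next.percentage : null
}
```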


Flag Hygiene: The Problem Nobody Talks About

Flags accumulate. Teams add them for releases and then forget to remove them when the release is complete. Two years later, the codebase is full of conditions like if (featureFlags.isEnabled('new_checkout_flow', user)) for a checkout flow that shipped in 2024 and is now the only checkout flow.

Dead flags are technical debt that:

  • Makes code harder to read (more branching conditions)
  • Creates confusion about what behavior is actually in production
  • Occasionally causes real bugs when someone assumes a flag is off that's actually on

Establish a flag cleanup discipline:

  • Every release flag gets a "remove by" date set in the flag description at creation time
  • Sprint review includes a check of flags past their remove-by date
  • Removing a completed flag is a discrete task that gets estimated and assigned

LaunchDarkly and similar tools have built-in flag age tracking and stale flag alerts. If you're building your own system, add a remove_by field to the flag table and build an internal dashboard that surfaces stale flags.
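With a remove_by field in place, surfacing stale flags is a filter. A sketch, assuming the flag rows have already been loaded into memory:

```typescript
// remove_by is null for permanent flags (operational, permission),
// set at creation time for release and experiment flags.
type FlagRecord = { key: string; remove_by: Date | null }

// Release flags past their remove-by date; permanent flags never go stale.
function staleFlags(flags: FlagRecord[], now: Date = new Date()): FlagRecord[] {
  return flags.filter((f) => f.remove_by !== null && f.remove_by < now)
}
```

An internal dashboard (or a scheduled job that posts to your team chat) can run this on a cadence and keep dead flags visible until someone removes them.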


Testing With Feature Flags

Feature flags create testing complexity: your code now has branches, and each branch needs test coverage. A naive approach is to test only the "flag enabled" path. This leaves the "flag disabled" path untested, which means that when you eventually remove the flag and delete the old branch, you won't know whether the removal broke anything.

The right approach: test both states explicitly.

describe('Invoice export', () => {
  it('shows CSV export option when flag is enabled', async () => {
    mockFeatureFlag('csv_export', true)
    render(<InvoicePage />)
    expect(screen.getByText('Export to CSV')).toBeInTheDocument()
  })

  it('hides CSV export option when flag is disabled', async () => {
    mockFeatureFlag('csv_export', false)
    render(<InvoicePage />)
    expect(screen.queryByText('Export to CSV')).not.toBeInTheDocument()
  })
})

Your test utilities need to support mocking flag state. This is a small investment that pays off across every flagged feature.
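The mockFeatureFlag helper used above is exactly this kind of utility. One possible shape, backed by an in-memory map that your application's flag lookup consults in test mode (the helper names here are assumptions):

```typescript
// In-memory flag state that the test suite controls directly.
const mockedFlags = new Map<string, boolean>()

function mockFeatureFlag(key: string, enabled: boolean): void {
  mockedFlags.set(key, enabled)
}

// Call between tests (e.g. in afterEach) so state never leaks across cases.
function resetMockedFlags(): void {
  mockedFlags.clear()
}

// Test-mode evaluation: unmocked flags default to off, so every test
// states the flag configuration it depends on explicitly.
function isEnabledInTest(key: string): boolean {
  return mockedFlags.get(key) ?? false
}
```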


When to Use a Managed Flag Service

If your team reaches a point where you're running multiple simultaneous experiments, have complex user targeting requirements (enable for users in specific geographies, with specific attributes, on specific account types), or need real-time flag changes without a code deployment, a managed service becomes worth the cost.

LaunchDarkly is the market leader. Unleash is a capable open-source alternative you can self-host. Flagsmith sits between the two in terms of features and price. Any of them will give you a better targeting UI, audit logs, and real-time flag evaluation than a home-built solution.

The migration path from a simple home-built system to a managed service is straightforward if you've abstracted your flag evaluation behind a clean interface — which is another argument for the abstraction layer from the start.
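That abstraction layer can be as small as one interface. Application code depends on FlagProvider (a hypothetical name) and never on a specific backend, so swapping the home-built table for LaunchDarkly is a one-file change. Shown synchronously for brevity; a production version would be async, like the isEnabled evaluation function earlier in the post:

```typescript
interface FlagContext {
  userId: string
  organizationId: string
  plan: string
}

// The only flag API the rest of the codebase imports; backends hide behind it.
interface FlagProvider {
  isEnabled(flagKey: string, context: FlagContext): boolean
}

// Simple backend for tests or bootstrapping; a database-backed or
// vendor-backed class can replace it without touching any callers.
class StaticFlagProvider implements FlagProvider {
  constructor(private flags: Record<string, boolean>) {}
  isEnabled(flagKey: string, _context: FlagContext): boolean {
    return this.flags[flagKey] ?? false
  }
}
```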


Feature flags are one of the practices that separate teams who ship confidently from teams who treat every deployment as a gamble. If you're building the deployment infrastructure for a SaaS product and want to talk through the approach, book a call at calendly.com/jamesrossjr.

