Engineering · 7 min read · March 3, 2026

Database Migrations in Production: Zero-Downtime Strategies

How to run database migrations in production without downtime — expand-contract patterns, safe column changes, large table strategies, and rollback plans that actually work.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

Database migrations are where confident developers become nervous. Get them wrong and you have production downtime, data corruption, or a rollback that takes longer than the original migration. Get them right and they are invisible — users never know they happened.

The difference between getting them right and wrong comes down to understanding which operations are safe while the application is running and which are not.

The Core Problem

When you deploy a migration, there is a window in which old and new application code run simultaneously: old instances keep serving traffic while new ones come up. Both versions must be able to work with whatever state the database is in.

This constraint rules out some common migration patterns. Adding a NOT NULL column with no default? The migration itself fails if the table has rows (there is no value to fill them with), and old application code does not know to set the value on insert. Renaming a column? Old code looks for the old name, which is gone.

The solution is the expand-contract pattern: make database changes that are backward-compatible, deploy the new application code, then remove compatibility shims.

Safe vs Unsafe Schema Changes

Safe operations (can run while application is running):

  • Adding a nullable column
  • Adding a column with a constant default value (PostgreSQL 11+; earlier versions rewrite the whole table)
  • Adding a new table
  • Adding an index (with CONCURRENTLY)
  • Adding a foreign key constraint with NOT VALID then VALIDATE CONSTRAINT
  • Dropping a column the application no longer references

Unsafe operations (require downtime or multi-step process):

  • Adding a NOT NULL column without a default (fails outright on any non-empty table)
  • Renaming a column
  • Changing a column's type
  • Adding a unique constraint directly (blocks writes; instead build a unique index CONCURRENTLY, then ADD CONSTRAINT ... USING INDEX)
  • Dropping a column the application currently reads
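The NOT VALID then VALIDATE CONSTRAINT pattern from the safe list is easiest to see as the exact statement sequence a migration runner would execute. A sketch in Python (table and constraint names are hypothetical):

```python
# Two-step foreign key addition, expressed as the ordered statements
# a migration runner would execute. Names are illustrative.
FK_STEPS = [
    # Step 1: add the constraint without checking existing rows.
    # Takes only a brief lock; new writes are validated immediately.
    "ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey "
    "FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID",
    # Step 2: check existing rows under a weaker lock that does not
    # block normal reads and writes.
    "ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_fkey",
]
```

Each step can be its own migration, so a long VALIDATE does not hold up the deploy.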

The Expand-Contract Pattern

Every dangerous schema change becomes safe when decomposed into three phases:

Expand: Add the new structure while keeping the old. Both old and new code can run.

Migrate: Backfill data, update application code to use the new structure.

Contract: Remove the old structure once all code uses the new version.
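From the application's side, the three phases correspond to different read and write behavior. A minimal Python sketch of that mapping (the row dict and the name columns stand in for a real record and ORM layer):

```python
# Expand-contract from the application's point of view.
# `row` is a plain dict standing in for a database record.

def read_name(row: dict, phase: str) -> str:
    if phase == "expand":
        return row["full_name"]          # old column is still authoritative
    if phase == "migrate":
        # prefer the new column, fall back while the backfill runs
        return row.get("name") or row["full_name"]
    return row["name"]                   # contract: old column is gone

def write_name(row: dict, value: str, phase: str) -> dict:
    if phase in ("expand", "migrate"):
        row["full_name"] = value         # keep old readers working
    if phase in ("migrate", "contract"):
        row["name"] = value              # populate the new column
    return row
```

The point of the sketch: during migrate, every write lands in both columns, so either code version can read a consistent value.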

Example: Renaming a Column

We want to rename users.full_name to users.name.

Phase 1 — Expand (migration). Add the new column first:

ALTER TABLE users ADD COLUMN name TEXT;

Application code continues to write full_name. A trigger keeps name in sync; create it before backfilling so no writes slip through the gap:

CREATE OR REPLACE FUNCTION sync_user_name()
RETURNS TRIGGER AS $$
BEGIN
  -- OLD is not assigned on INSERT, so branch on the operation
  -- rather than referencing it in a combined condition.
  IF TG_OP = 'INSERT' THEN
    NEW.name := NEW.full_name;
  ELSIF NEW.full_name IS DISTINCT FROM OLD.full_name THEN
    NEW.name := NEW.full_name;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_user_name_trigger
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION sync_user_name();

Then backfill the existing rows:

UPDATE users SET name = full_name WHERE name IS NULL;

Phase 2 — Migrate: Deploy application code that reads from name and writes to both. Verify everything works.

Phase 3 — Contract: Deploy code that reads and writes only name. Drop full_name and the trigger.

DROP TRIGGER sync_user_name_trigger ON users;
DROP FUNCTION sync_user_name();
ALTER TABLE users DROP COLUMN full_name;

This process takes more time than a rename in a maintenance window, but it never requires downtime.

Example: Adding a NOT NULL Column

Phase 1 — Add as nullable:

ALTER TABLE orders ADD COLUMN customer_notes TEXT;

Phase 2 — Backfill (for existing rows):

-- Backfill in batches; COMMIT between batches releases row locks.
-- (PostgreSQL 11+ allows COMMIT inside a DO block, as long as the
-- block is not run inside an enclosing transaction.)
DO $$
DECLARE
  batch_size INT := 1000;
  last_id BIGINT := 0;
  max_id BIGINT;
BEGIN
  SELECT COALESCE(MAX(id), 0) INTO max_id FROM orders;

  WHILE last_id < max_id LOOP
    UPDATE orders
    SET customer_notes = ''
    WHERE id > last_id AND id <= last_id + batch_size
    AND customer_notes IS NULL;

    last_id := last_id + batch_size;
    COMMIT; -- end the batch's transaction
    PERFORM pg_sleep(0.01); -- Brief pause to reduce I/O pressure
  END LOOP;
END $$;
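On versions where committing per batch server-side is not an option, the same batching can be driven from an external script, one short transaction per batch. The range arithmetic is the part worth getting right; a minimal Python sketch (the code that actually executes each UPDATE is omitted):

```python
def batch_ranges(max_id: int, batch_size: int):
    """Yield half-open (low, high] id ranges covering 1..max_id."""
    last_id = 0
    while last_id < max_id:
        yield last_id, min(last_id + batch_size, max_id)
        last_id += batch_size

# Each range becomes one short transaction:
#   UPDATE orders SET customer_notes = ''
#   WHERE id > low AND id <= high AND customer_notes IS NULL;
```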

Phase 3 — Add the NOT NULL constraint. Be aware that SET NOT NULL scans the whole table under an ACCESS EXCLUSIVE lock to verify no NULLs remain; on PostgreSQL 12+ you can avoid that scan by validating a matching CHECK constraint first.

ALTER TABLE orders ALTER COLUMN customer_notes SET NOT NULL;

In PostgreSQL 11+, adding a NOT NULL column with a constant default is safe and instant (the default is stored in the catalog, not written to every row). This eliminates the need for the expand-contract pattern in simple cases:

-- Safe in PostgreSQL 11+: instant, no table rewrite
ALTER TABLE orders ADD COLUMN customer_notes TEXT NOT NULL DEFAULT '';

Adding Indexes Without Locking

A standard CREATE INDEX takes a SHARE lock that blocks writes (reads still proceed) for the duration of the build. On a large table, this can take hours and effectively means production downtime for anything that writes.

Always use CONCURRENTLY in production:

-- This blocks for the entire duration (bad for production)
CREATE INDEX idx_posts_author_id ON posts(author_id);

-- This builds the index without blocking writes (good for production)
CREATE INDEX CONCURRENTLY idx_posts_author_id ON posts(author_id);

CONCURRENTLY takes longer because it scans the table twice and must wait for transactions that could conflict to finish. It also cannot run inside a transaction block, so give it its own migration step. But it does not block your application.

If a concurrent index build fails partway through, it leaves an invalid index that must be dropped before retrying:

-- Check for invalid indexes left behind by failed concurrent builds
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid;

-- Drop and recreate if found
DROP INDEX CONCURRENTLY idx_posts_author_id;
CREATE INDEX CONCURRENTLY idx_posts_author_id ON posts(author_id);
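The drop-and-retry cleanup can be automated in deploy tooling. A sketch, where execute is whatever function runs a SQL string on your connection (a stand-in, not a real driver API), and each statement runs outside a transaction block as CONCURRENTLY requires:

```python
def build_index_with_retry(execute, index_name: str, ddl: str,
                           max_attempts: int = 3) -> bool:
    """Run CREATE INDEX CONCURRENTLY, dropping the INVALID index a
    failed attempt leaves behind before each retry."""
    for _ in range(max_attempts):
        try:
            execute(ddl)
            return True
        except Exception:
            # A failed concurrent build leaves an invalid index that
            # must be removed before the statement can be retried.
            execute(f"DROP INDEX CONCURRENTLY IF EXISTS {index_name}")
    return False
```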

Migration Management in CI/CD

Structure your deployment pipeline to run migrations before deploying new application code:

# .github/workflows/deploy.yml
deploy:
  steps:
    - name: Run database migrations
      run: prisma migrate deploy
      env:
        DATABASE_URL: ${{ secrets.DATABASE_URL }}

    - name: Deploy application
      run: kubectl rollout restart deployment/api

This order matters. Because migrations run first, the old application code, which is still serving traffic during the rollout, must work with the post-migration schema; that is exactly what the expand phase guarantees. Likewise, if the deploy is rolled back, the previous code version must still run against the already-migrated schema.
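One way to enforce this in CI is a crude lint that fails the pipeline when a migration file contains statements from the unsafe list, forcing authors into the multi-step alternatives. A minimal sketch (the patterns are illustrative and far from exhaustive; dedicated SQL migration linters do this properly):

```python
import re

# Statement shapes from the "unsafe" list; extend as needed.
UNSAFE_PATTERNS = [
    r"\bDROP\s+COLUMN\b",
    r"\bRENAME\s+COLUMN\b",
    r"\bALTER\s+COLUMN\s+\w+\s+(SET\s+DATA\s+)?TYPE\b",
]

def unsafe_statements(migration_sql: str) -> list:
    """Return the patterns that matched, so CI can fail with a reason."""
    return [p for p in UNSAFE_PATTERNS
            if re.search(p, migration_sql, re.IGNORECASE)]
```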

Rollback Planning

Not every migration is reversible. Before running a migration in production, know the answer to "what is my rollback plan?"

For additive changes (new columns, new tables): rollback is trivial — drop what was added.

For destructive changes (dropping columns, type changes): rollback requires one of:

  • A database snapshot from before the migration
  • A reverse migration that restores the structure (may lose data added after the forward migration)
  • The expand-contract pattern which avoids the need for rollback

Always take a database snapshot before running a major migration. Most managed databases make this trivial. The 10 minutes to take and verify a snapshot is much cheaper than the hours spent recovering from a failed migration without one.

Testing Migrations

Test migrations in a staging environment that matches production in:

  • Row count (not just schema)
  • Index configuration
  • PostgreSQL version

A migration that takes 2 seconds on a 1,000-row test database might take 45 minutes on a 50-million-row production database. Time the migration against staging data at production scale and extrapolate before running it for real.

# Restore production data to staging (anonymize with your tooling of choice)
pg_dump $PRODUCTION_URL | pg_anonymizer | psql $STAGING_URL

# Run the migration once and measure execution time on staging data
time psql $STAGING_URL -f migration.sql
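The extrapolation step is simple arithmetic. A sketch that assumes runtime scales linearly with row count (an assumption; index builds and sorts often scale worse, so treat the result as a lower bound):

```python
def estimated_seconds(staging_seconds: float, staging_rows: int,
                      prod_rows: int) -> float:
    """Linearly extrapolate a migration's staging runtime to
    production data volume."""
    return staging_seconds * (prod_rows / staging_rows)

# e.g. 90 s on 1M staging rows, extrapolated to 50M production rows
minutes = estimated_seconds(90, 1_000_000, 50_000_000) / 60
```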

Database migrations are not glamorous work, but they are consequential. A disciplined approach — expand-contract for dangerous changes, concurrent index builds, mandatory staging testing, pre-migration snapshots — keeps what should be invisible changes from becoming incidents.


Planning a complex database migration or need help designing a zero-downtime migration strategy? Book a call and let's think through the approach together: calendly.com/jamesrossjr.

