Engineering · 7 min read · September 17, 2025

SaaS Data Migration: Moving Customers Without Downtime

Data migration in a live SaaS product is one of the highest-stakes engineering challenges. Here's how to move customer data safely without taking your product offline.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

The Stakes Are Higher Than You Think

Data migration in a SaaS application isn't the same as migrating a single application's database. You're moving data for dozens, hundreds, or thousands of customers who are actively using your product. Each customer's data has its own consistency requirements, its own volume characteristics, and its own tolerance for downtime.

A botched migration in a single-tenant application affects one customer. A botched migration in a multi-tenant SaaS affects everyone simultaneously. And in SaaS, "migration" happens more often than people expect — schema changes, database moves, infrastructure upgrades, tenant isolation changes, and customer imports all involve moving data while the product is live.

The techniques for doing this safely are well-established, but they require discipline and planning that's easy to skip when you're under pressure to ship.


The Expand-Contract Pattern

The safest approach to schema migration in a live system is the expand-contract pattern, sometimes called parallel change. It works in three phases.

Expand. Add the new schema alongside the old one. If you're renaming a column, add the new column without removing the old one. If you're restructuring a table, create the new table without dropping the old one. Deploy code that writes to both the old and new locations but reads from the old location. This phase is entirely backward-compatible.

Migrate. Backfill the new schema with data from the old schema. This can run as a background job, processing records in batches to avoid overwhelming the database. For large datasets, this phase might take hours or days. Because the application is writing to both locations, new data is already in the new schema — you only need to backfill historical data.

Contract. Once all data is in the new schema and verified, switch reads to the new location. After a confidence period where you monitor for issues, remove the writes to the old location and drop the old schema.
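As a minimal sketch of the expand phase, dual-writing can look like this. SQLite stands in for a production database, and the users table with a full_name-to-display_name rename is an illustrative assumption, not a real system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")

# Expand: add the new column alongside the old one; nothing is dropped yet.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

def save_user(conn, user_id, name):
    # Dual-write: every write lands in both the old and the new column.
    conn.execute(
        "INSERT INTO users (id, full_name, display_name) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET full_name = excluded.full_name, "
        "display_name = excluded.display_name",
        (user_id, name, name),
    )

def load_user_name(conn, user_id):
    # Reads stay on the old column until the backfill is complete and verified.
    return conn.execute(
        "SELECT full_name FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]

save_user(conn, 1, "Ada")
```

Because both columns receive every write from this point on, the backfill only has to cover rows that existed before the dual-write deploy.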

This pattern adds engineering effort compared to a simple ALTER TABLE in a maintenance window, but it eliminates downtime entirely. For a SaaS product where customers are in different time zones and there's no good time for a maintenance window, it's the only responsible approach.

The pattern applies to more than just database schemas. Migrating between multi-tenant architecture patterns — from shared tables to schema-per-tenant, for example — follows the same expand-contract approach at a larger scale.


Batch Processing and Backpressure

The backfill phase of a migration is where things most commonly go wrong. You have a background job processing millions of rows, and it needs to complete in a reasonable time frame without degrading the performance of the live application.

Batch size matters. Processing one row at a time is too slow for large datasets. Processing a million rows at once locks the database. The right batch size depends on your database, your row size, and your query complexity, but 500-2000 rows per batch is a reasonable starting point.

Backpressure is essential. Your migration job should monitor database performance — query latency, connection pool use, replication lag — and automatically slow down or pause when the database is under pressure. A migration that completes in four hours is better than one that completes in two hours but causes a production incident.

Idempotency is mandatory. Migration jobs fail. Servers crash. Network connections drop. Your migration must be resumable, which means each batch operation must be idempotent — processing the same batch twice should produce the same result. Track progress with a cursor or checkpoint rather than assuming the job runs start to finish without interruption.
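The three properties above can be combined in one loop. This is a sketch under stated assumptions: SQLite stands in for the real database, the migration_checkpoint table is a hypothetical progress tracker, and the latency-based backoff is a deliberately crude stand-in for real backpressure signals like replication lag:

```python
import sqlite3
import time

# Illustrative setup: 5,000 rows awaiting backfill, plus a checkpoint table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (id, full_name) VALUES (?, ?)",
    [(i, f"user-{i}") for i in range(1, 5001)],
)
conn.execute("CREATE TABLE migration_checkpoint (last_id INTEGER)")
conn.execute("INSERT INTO migration_checkpoint VALUES (0)")

def backfill(conn, batch_size=1000, latency_budget=0.05):
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM users").fetchone()[0]
    # Resume from the last committed checkpoint rather than starting over.
    cursor = conn.execute("SELECT last_id FROM migration_checkpoint").fetchone()[0]
    while cursor < max_id:
        start = time.monotonic()
        # Idempotent batch: re-running this UPDATE on the same id range
        # produces the same result, so crash-and-resume is safe.
        conn.execute(
            "UPDATE users SET display_name = full_name WHERE id > ? AND id <= ?",
            (cursor, cursor + batch_size),
        )
        cursor += batch_size
        conn.execute("UPDATE migration_checkpoint SET last_id = ?", (cursor,))
        conn.commit()  # batch and checkpoint commit together
        # Crude backpressure: if the batch ran slower than the latency budget,
        # back off before the next one instead of piling on more load.
        elapsed = time.monotonic() - start
        if elapsed > latency_budget:
            time.sleep(min(elapsed, 1.0))

backfill(conn)
```

Committing the checkpoint in the same transaction as the batch is the detail that makes resumption safe: the job can never record progress it did not actually make.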

Validation runs in parallel. Don't wait until the migration is complete to validate the data. Run validation checks continuously during the migration, comparing source and destination data for each completed batch. Catching errors during migration is vastly easier than catching them after the old data has been dropped.
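A per-batch check can be as simple as the sketch below, which assumes (as in the earlier examples) a single-table column migration; for cross-database migrations the same idea applies with row counts and checksums instead of a direct comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(i, f"user-{i}", f"user-{i}") for i in range(1, 101)],
)
# Simulate a missed row so validation has something to catch.
conn.execute("UPDATE users SET display_name = NULL WHERE id = 42")

def validate_batch(conn, start_id, end_id):
    # Any row whose migrated value disagrees with the source is a failure;
    # SQLite's IS NOT handles the NULL case a plain != would miss.
    rows = conn.execute(
        "SELECT id FROM users "
        "WHERE id > ? AND id <= ? AND display_name IS NOT full_name",
        (start_id, end_id),
    ).fetchall()
    return [r[0] for r in rows]
```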


Customer Data Import as Migration

Beyond internal schema changes, SaaS products frequently need to import customer data from external systems. A new enterprise customer switching from a competitor or from spreadsheets needs their historical data in your system, and the quality of that import experience significantly affects their perception of your product.

The principles are the same as internal migration — batch processing, validation, idempotency — but with additional concerns. External data is messy. Formats are inconsistent. Required fields are missing. Duplicates exist. Relationships between records may be implicit rather than explicit.

Build a data import pipeline that separates parsing, validation, transformation, and loading into distinct stages. The validation stage should produce a detailed report of issues — missing fields, format errors, potential duplicates — that the customer can review and resolve before the data is committed. Never silently drop or modify records during import.
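A stripped-down sketch of those stages, assuming a CSV source and a hypothetical contact record with required email and name fields (real imports need far richer rules):

```python
import csv
import io

REQUIRED = ("email", "name")

def parse(raw_csv):
    # Stage 1: parsing only; no judgments about the data yet.
    return list(csv.DictReader(io.StringIO(raw_csv)))

def validate(records):
    # Stage 2: produce a reviewable issue report; never silently drop a record.
    issues, seen = [], set()
    for i, rec in enumerate(records):
        for field in REQUIRED:
            if not (rec.get(field) or "").strip():
                issues.append((i, f"missing {field}"))
        email = (rec.get("email") or "").strip().lower()
        if email:
            if email in seen:
                issues.append((i, "duplicate email"))
            seen.add(email)
    return issues

def transform(records):
    # Stage 3: normalize only after the customer has resolved the report.
    return [
        {"email": r["email"].strip().lower(), "name": r["name"].strip()}
        for r in records
    ]

report = validate(
    parse("email,name\nAda@Example.com,Ada\n,NoEmail\nada@example.com,Dup\n")
)
```

The point of the separation is that nothing reaches the load stage until the validation report comes back clean, so the customer decides what happens to ambiguous records, not the pipeline.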

For multi-tenant platforms, customer imports also need tenant isolation verification. Every imported record must be tagged with the correct tenant identifier, and the import pipeline must enforce that no record can reference data belonging to a different tenant.
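That enforcement can live in the load stage itself. In this sketch the users and contacts tables and the owner_id reference are illustrative assumptions; the invariant being enforced is the real point:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, tenant_id TEXT)")
conn.execute("CREATE TABLE contacts (tenant_id TEXT, owner_id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "acme"), (2, "globex")])

def load_record(conn, tenant_id, record):
    # Every imported row is tagged with the importing tenant, and any
    # reference to data owned by a different tenant aborts the import.
    owner = record.get("owner_id")
    if owner is not None:
        row = conn.execute(
            "SELECT tenant_id FROM users WHERE id = ?", (owner,)
        ).fetchone()
        if row is None or row[0] != tenant_id:
            raise ValueError(f"cross-tenant reference: owner {owner}")
    conn.execute(
        "INSERT INTO contacts (tenant_id, owner_id, email) VALUES (?, ?, ?)",
        (tenant_id, owner, record["email"]),
    )

load_record(conn, "acme", {"owner_id": 1, "email": "a@example.com"})
```

Failing loudly on a cross-tenant reference is deliberate: a rejected import is an inconvenience, while a record silently attached to the wrong tenant is a data breach.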


Rollback Strategy

Every migration needs a rollback plan, and "restore from backup" is not a rollback plan for a live SaaS product. Restoring from backup means losing every change made by every customer since the backup was taken.

The expand-contract pattern provides a natural rollback mechanism. During the expand phase, the old schema is still active and receiving writes. If problems are discovered during the migrate or contract phases, you can switch reads back to the old schema without data loss.

For more complex migrations, maintain a reverse migration script that can undo the data transformation. Test it on a copy of production data before running the forward migration. And set clear decision criteria for when to roll back — don't wait for a full post-mortem to decide that the migration is failing.
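The read-path switch that makes this rollback cheap can be a single runtime flag. A minimal sketch, reusing the illustrative full_name/display_name rename from earlier (in practice this would be a feature flag or config value, not a function argument):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT, display_name TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'old value', 'new value')")

def load_user_name(conn, user_id, read_from_new=False):
    # Flip read_from_new to True once the backfill is verified; flip it back
    # to roll back without data loss, since the old column is still receiving
    # every write during the expand phase.
    column = "display_name" if read_from_new else "full_name"
    return conn.execute(
        f"SELECT {column} FROM users WHERE id = ?", (user_id,)
    ).fetchone()[0]
```

Because rollback is a config change rather than a data operation, the decision criteria mentioned above can be acted on in minutes instead of hours.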

The ability to migrate data safely and without downtime is a capability that scales in importance as your customer base grows. Investing in the tooling and patterns early pays compounding returns as your platform matures.

