DevOps · 7 min read · September 5, 2025

Zero-Downtime Deployments: Strategies and Implementation

Deploy without downtime using rolling updates, health checks, connection draining, and database migration strategies that keep your application available.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

Downtime during deployment is a solved problem for most applications. The techniques exist, the tooling is mature, and the cloud platforms make it straightforward. Yet I still encounter teams that accept 30-second outages during every release because "it is just a quick restart" or "we deploy during off-hours." Those seconds add up across multiple daily deployments, and off-hours are never truly off-hours for a global user base.

Zero-downtime deployment is not about perfection — it is about keeping the application available to users during every release, even when the release includes database schema changes and backend API changes.

Rolling Updates

The most common zero-downtime strategy is a rolling update. Instead of stopping all instances, deploying, and starting them again, you update one instance at a time while the others continue serving traffic. The load balancer routes requests to healthy instances and stops routing to instances that are being updated.

# Kubernetes Deployment with rolling update strategy
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      containers:
        - name: app
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10

The maxUnavailable: 0 setting is critical — it tells Kubernetes to never have fewer running instances than the desired count. maxSurge: 1 allows one extra instance during the rollout. The rolling update creates a new pod, waits for it to pass its readiness probe, then terminates an old pod. This cycle repeats until all pods run the new version.

Without the readiness probe, Kubernetes considers a pod ready as soon as it starts, which can route traffic to an instance that has not finished initializing. The probe verifies the application is actually serving requests before it receives traffic. This principle applies equally to Docker-based deployments and bare-metal setups.
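With this strategy in place, a rollout can be watched and reversed from the command line. A quick sketch, assuming the Deployment is named `app`:

```shell
# Watch the rollout until every pod runs the new version and passes its probe
kubectl rollout status deployment/app

# Inspect the revision history the Deployment records
kubectl rollout history deployment/app

# Roll back to the previous revision if the new pods fail their probes
kubectl rollout undo deployment/app
```

Because `maxUnavailable` is 0, a rollback is itself a rolling update: traffic never drops below the desired replica count in either direction.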

Health Checks and Readiness

A health check endpoint is not the same as a readiness check, and conflating them causes deployment problems.

Liveness checks answer "is the process alive?" — they verify the application has not crashed or deadlocked. A liveness check that fails triggers a restart.

Readiness checks answer "can this instance handle traffic?" — they verify the application has completed initialization, connected to the database, warmed caches, and loaded configuration. A readiness check that fails removes the instance from the load balancer but does not restart it.

// Health endpoints with separate liveness and readiness
// (assumes an Express `app` plus existing `db` and `cache` clients)
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' })
})

app.get('/health/ready', async (req, res) => {
  try {
    await db.query('SELECT 1')
    await cache.ping()
    res.status(200).json({ status: 'ready' })
  } catch {
    res.status(503).json({ status: 'not ready' })
  }
})

The readiness check should test the dependencies your application actually needs. If your application cannot serve requests without a database connection, check the database. If it can serve some requests from cache, the database check might belong in a degraded-mode check rather than the readiness check.
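One way to make that distinction explicit is to compute readiness from individual probe results, where hard dependencies gate readiness and soft ones only mark the instance as degraded. A minimal sketch — the `checks` shape and the hard/soft split are illustrative assumptions, not from the original:

```javascript
// Decide readiness from individual dependency probes.
// Hard dependencies (e.g. the database) gate readiness;
// soft ones (e.g. a cache) only flag degraded mode.
function readiness(checks) {
  const hardFailures = checks.filter((c) => c.hard && !c.ok)
  if (hardFailures.length > 0) {
    return {
      status: 503,
      body: { status: 'not ready', failing: hardFailures.map((c) => c.name) },
    }
  }
  const degraded = checks.some((c) => !c.hard && !c.ok)
  return { status: 200, body: { status: degraded ? 'degraded' : 'ready' } }
}

// Example: database up (hard), cache down (soft) — still ready, but degraded
const result = readiness([
  { name: 'db', hard: true, ok: true },
  { name: 'cache', hard: false, ok: false },
])
console.log(result.status, result.body.status) // prints: 200 degraded
```

The load balancer only looks at the status code, but the body gives operators a reason when an instance is pulled from rotation.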

Connection Draining

When an instance is being removed from the load balancer, it may still have active requests in progress. Connection draining — also called graceful shutdown — gives those requests time to complete before the instance terminates.

process.on('SIGTERM', () => {
  // Stop accepting new connections
  server.close(() => {
    // All existing connections have finished
    process.exit(0)
  })

  // Force shutdown after timeout
  setTimeout(() => {
    process.exit(1)
  }, 30_000)
})

The SIGTERM signal is what Kubernetes (and most orchestrators) sends before terminating a pod. The application stops accepting new connections, finishes processing existing requests, then exits cleanly. The 30-second timeout is a safety net for requests that take unexpectedly long.

In Kubernetes, the terminationGracePeriodSeconds setting controls how long the orchestrator waits before sending SIGKILL. Set it to at least the duration of your longest expected request, plus buffer. If your API has endpoints that take 60 seconds to process, set the grace period to 90 seconds.
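In the pod spec, that might look like the fragment below. The `preStop` sleep is an optional addition (an assumption, not from the original setup): it gives the load balancer a moment to stop routing to the pod before the application receives SIGTERM.

```yaml
# Pod spec fragment: give in-flight requests time to finish before SIGKILL
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 90  # longest expected request + buffer
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                # Brief pause so endpoint removal propagates before SIGTERM
                command: ["sleep", "5"]
```

Note that the grace period clock includes the `preStop` hook, so budget for both.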

Load balancers need configuration too. AWS Application Load Balancer has a deregistration delay (default 300 seconds, often reduced to 30-60 seconds) that controls how long a deregistering target stays in the draining state. New requests stop immediately, but in-flight requests get the delay to complete. If the delay is shorter than your longest request, the load balancer may sever connections to an instance that is still processing them, so align it with the application's shutdown timeout.

Database Migrations Without Downtime

Database schema changes are the hardest part of zero-downtime deployment because you cannot update the database and the application code simultaneously. During a rolling update, old and new application versions run concurrently, and both must work with the current database schema.

The rule: every migration must be backward-compatible with the previous application version.

Adding a column — safe. The old application ignores columns it does not know about.

Removing a column — unsafe if done in one step. First deploy code that stops reading the column. Then deploy a migration that removes it. Two separate releases.

Renaming a column — never do this in one step. Add the new column, deploy code that writes to both and reads from the new one, migrate data, deploy code that only uses the new column, then remove the old column. This is three releases minimum.

-- Step 1: Add new column (deploy with code that writes to both)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

-- Step 2: Backfill data (run as a background job)
UPDATE users SET display_name = username WHERE display_name IS NULL;

-- Step 3: After code only reads from display_name, drop old column
ALTER TABLE users DROP COLUMN username;

This expand-contract pattern is the foundation of zero-downtime database changes. It is more work than a single migration, but it eliminates the window where the application and database are out of sync. The same principles apply whether you use infrastructure as code for managing your database or run migrations manually — the migration strategy itself is what matters.
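For large tables, step 2's single UPDATE can hold locks for a long time. A common variant backfills in small batches so each transaction stays short; a PostgreSQL-style sketch, with the batch size an arbitrary assumption:

```sql
-- Backfill in batches; re-run until it reports 0 rows updated
UPDATE users
SET display_name = username
WHERE id IN (
  SELECT id FROM users
  WHERE display_name IS NULL
  LIMIT 1000
);
```

A background job loops this statement with a short pause between batches, keeping lock contention low while the backfill proceeds.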

Feature flags complement this pattern by letting you deploy the new code behind a flag, verify it works with the new schema, then enable it for all users. The flag adds a control point between deployment and activation that makes schema changes less risky.
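A minimal sketch of that control point, continuing the rename example above — the flag name and fallback behavior are illustrative assumptions:

```javascript
// Read through a flag so new-schema code can be enabled independently
// of the deployment that shipped it.
function resolveDisplayName(user, flags) {
  if (flags.useDisplayNameColumn) {
    // New path: read the column added by the expand migration,
    // falling back to the old column while the backfill runs
    return user.display_name ?? user.username
  }
  // Old path: identical to the previous release's behavior
  return user.username
}

// During backfill, a user without display_name still resolves correctly
const user = { username: 'jross', display_name: null }
console.log(resolveDisplayName(user, { useDisplayNameColumn: true })) // prints: jross
```

If the new schema misbehaves, flipping the flag off restores the old read path instantly, with no redeployment.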

Zero-downtime deployment is a discipline more than a technology. The tools make it possible; the discipline of backward-compatible changes, proper health checks, and graceful shutdowns makes it reliable. Once established, it changes the team's relationship with deployment from a high-stakes event to a routine activity.