Background Jobs in Node.js: Queues, Workers, and Failure Recovery
A complete guide to background job processing in Node.js — BullMQ, job queues, worker processes, priority queues, rate limiting, and the failure recovery patterns that matter in production.

James Ross Jr.
Strategic Systems Architect & Enterprise Software Developer
Every production application eventually needs to do work outside of the request-response cycle. Email sending, PDF generation, image processing, webhook delivery, data imports, report generation — none of these should block a user's request. Background jobs are how you handle them.
The challenge is not adding a job queue — it is building one that behaves correctly when things go wrong: workers crash, jobs fail, the database is temporarily unavailable, or the queue gets backed up. This guide covers the patterns that handle those situations correctly.
Why Queues, Not Just setTimeout
The temptation is to offload work with setTimeout or setImmediate. This breaks in several ways:
- Process restarts lose all in-flight work
- No visibility into job status or failures
- No retry logic for transient failures
- No rate limiting for external API calls
- No concurrency control for resource-intensive operations
A proper job queue stores jobs durably, tracks their state, provides retry logic, and gives you visibility into what is happening.
BullMQ: The Standard Choice
BullMQ (backed by Redis) is my default for Node.js job queues. It is mature, TypeScript-first, and handles the edge cases correctly.
npm install bullmq ioredis
Defining Jobs With Type Safety
// types/jobs.ts
export interface EmailJob {
  to: string
  subject: string
  template: 'welcome' | 'reset-password' | 'invoice' | 'trial-expiry' | 'weekly-digest'
  data: Record<string, unknown>
}

export interface PdfJob {
  reportId: string
  userId: string
  format: 'pdf' | 'xlsx'
}

export interface ImageProcessingJob {
  imageId: string
  operations: Array<{
    type: 'resize' | 'crop' | 'convert'
    params: Record<string, unknown>
  }>
}

export type JobData = {
  email: EmailJob
  pdf: PdfJob
  'image-processing': ImageProcessingJob
}
Setting Up Queues
// queues/index.ts
import { Queue } from 'bullmq'
import { redis } from '../lib/redis'
import type { JobData } from '../types/jobs'
function createQueue<K extends keyof JobData>(name: K) {
  return new Queue<JobData[K]>(name, {
    connection: redis,
    defaultJobOptions: {
      attempts: 3,
      backoff: {
        type: 'exponential',
        delay: 2000, // Start at 2s, then 4s, 8s
      },
      removeOnComplete: { count: 100 }, // Keep last 100 completed
      removeOnFail: { count: 500 }, // Keep last 500 failed for debugging
    },
  })
}
export const emailQueue = createQueue('email')
export const pdfQueue = createQueue('pdf')
export const imageQueue = createQueue('image-processing')
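With type: 'exponential', BullMQ waits delay * 2^(n - 1) milliseconds before retry n. A quick sketch of that schedule, useful when choosing attempts and delay so the final retry lands inside your tolerance window:

```typescript
// Sketch: the delay BullMQ's built-in exponential backoff waits
// before retry attempt n (1-indexed), for a given base delay.
export function exponentialBackoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1)
}

// With the defaults above (delay: 2000):
// attempt 1 → 2000ms, attempt 2 → 4000ms, attempt 3 → 8000ms
```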
Adding Jobs
// In your API handlers
await emailQueue.add('send-welcome', {
  to: user.email,
  subject: 'Welcome to the platform',
  template: 'welcome',
  data: { name: user.name, activationUrl: `https://app.com/activate/${token}` },
})

// Priority jobs (lower number = higher priority)
await emailQueue.add(
  'send-password-reset',
  {
    to: user.email,
    subject: 'Reset your password',
    template: 'reset-password',
    data: { resetUrl: `https://app.com/reset/${token}` },
  },
  { priority: 1 } // Process before normal priority jobs
)

// Delayed jobs
await emailQueue.add(
  'send-trial-expiry-warning',
  { to: user.email, subject: 'Your trial is expiring', template: 'trial-expiry', data: {} },
  { delay: 7 * 24 * 60 * 60 * 1000 } // 7 days from now
)

// Scheduled recurring jobs
await emailQueue.add(
  'weekly-digest',
  { to: user.email, subject: 'Your weekly digest', template: 'weekly-digest', data: {} },
  { repeat: { pattern: '0 9 * * 1' } } // Every Monday at 9am
)
Worker Implementation
// workers/email.ts
import { Worker } from 'bullmq'
import { redis } from '../lib/redis'
import type { EmailJob } from '../types/jobs'

const emailWorker = new Worker<EmailJob>(
  'email',
  async (job) => {
    const { to, subject, template, data } = job.data
    job.log(`Sending ${template} email to ${to}`)
    await job.updateProgress(10)

    // Render the email template
    const html = await renderTemplate(template, data)
    await job.updateProgress(40)

    // Send via your email provider
    await sendEmail({ to, subject, html })
    await job.updateProgress(100)

    return { sentAt: new Date().toISOString() }
  },
  {
    connection: redis,
    concurrency: 10, // Process 10 emails simultaneously
    limiter: {
      max: 100, // Max 100 jobs per interval
      duration: 1000, // Per second
    },
  }
)

// Handle worker events
emailWorker.on('completed', (job) => {
  console.log(`Job ${job.id} completed`)
})

emailWorker.on('failed', (job, err) => {
  console.error(`Job ${job?.id} failed:`, err.message)
  // Alert on final failure (all retries exhausted)
  if ((job?.attemptsMade ?? 0) >= (job?.opts.attempts ?? 1)) {
    console.error(`Job permanently failed after ${job?.attemptsMade} attempts`)
    // Send alert to your monitoring system
  }
})

emailWorker.on('error', (err) => {
  console.error('Worker error:', err)
})
Non-Retryable Errors
Some failures should not be retried. If an email address is permanently invalid or a user does not exist, retrying wastes resources and clutters your failed job logs.
import { UnrecoverableError } from 'bullmq'

async (job) => {
  const user = await db.user.findUnique({ where: { id: job.data.userId } })
  if (!user) {
    // Throw UnrecoverableError to skip retries
    throw new UnrecoverableError(`User ${job.data.userId} not found`)
  }
  if (user.emailBounced) {
    throw new UnrecoverableError(`Email bounced for user ${user.email}`)
  }
  // ... proceed with sending
}
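A common way to decide between the two error classes is to classify the provider's response before throwing. As a sketch — the status-code boundaries here are an assumption, so check your provider's documentation:

```typescript
// Hypothetical classifier: transient failures (rate limits, server
// errors) are worth retrying; client errors (bad address, bad payload)
// will fail identically on every attempt.
export function isTransientStatus(status: number): boolean {
  if (status === 429) return true // rate limited — retry later
  return status >= 500 && status < 600 // provider-side failure
}

// In a processor:
// if (!isTransientStatus(res.status)) {
//   throw new UnrecoverableError(`Provider rejected request: ${res.status}`)
// }
// throw new Error(`Transient failure: ${res.status}`) // will be retried
```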
Job Progress and Logging
Progress tracking gives you visibility into long-running jobs:
async (job) => {
  const rows = await db.select().from(users).where(eq(users.status, 'active'))
  const total = rows.length
  for (let i = 0; i < rows.length; i++) {
    await processUser(rows[i])
    // Update progress (i + 1 so the final iteration reports 100%)
    await job.updateProgress(Math.round(((i + 1) / total) * 100))
    // Log to the job's log (visible in Bull Board)
    if (i % 100 === 0) {
      job.log(`Processed ${i}/${total} users`)
    }
  }
}
Priority Queues
For applications with multiple job types competing for worker resources, use priority:
const reportQueue = new Queue('reports', {
  connection: redis,
  defaultJobOptions: { priority: 10 }, // Default priority
})

// VIP customer report: high priority
await reportQueue.add(
  'generate-report',
  { customerId, reportType },
  { priority: 1 } // Lower number = higher priority
)

// Background analytics: low priority
await reportQueue.add(
  'generate-analytics',
  { period: 'monthly' },
  { priority: 100 }
)
BullMQ Flow: Job Chains and Pipelines
For multi-step workflows where jobs depend on each other:
import { FlowProducer } from 'bullmq'

const flow = new FlowProducer({ connection: redis })

// Create a data import pipeline. Children run before their parent,
// so the deepest job (validation) executes first and the root
// (report generation) executes last.
await flow.add({
  name: 'generate-report',
  queueName: 'reporting',
  data: { fileId },
  children: [
    {
      name: 'process-data',
      queueName: 'processing',
      data: { fileId },
      children: [
        {
          name: 'validate-and-import',
          queueName: 'validation',
          data: { fileId },
        },
      ],
    },
  ],
})
Child jobs run first. Parent jobs wait for all children to complete. If a child fails, the parent is not started.
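Parents can also consume their children's return values via BullMQ's job.getChildrenValues(), which resolves to a map of child job keys to whatever each child returned. A sketch of a parent processor doing that — the minimal ParentJob interface stands in for BullMQ's full Job type so the example is self-contained, and the aggregation logic is illustrative:

```typescript
// Minimal shape of the Job API this processor uses, standing in
// for BullMQ's Job type.
interface ParentJob {
  getChildrenValues(): Promise<Record<string, unknown>>
}

// Processor for a parent step: aggregate whatever the child
// jobs returned when they completed.
export async function generateReportProcessor(job: ParentJob) {
  const childValues = await job.getChildrenValues()
  const batches = Object.values(childValues)
  return { batchCount: batches.length }
}

// Attach it with: new Worker('reporting', generateReportProcessor, { connection: redis })
```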
Bull Board: Monitoring Dashboard
Install Bull Board for a visual dashboard of your queues:
import { createBullBoard } from '@bull-board/api'
import { BullMQAdapter } from '@bull-board/api/bullMQAdapter'
import { HonoAdapter } from '@bull-board/hono'
import { serveStatic } from '@hono/node-server/serve-static'

const serverAdapter = new HonoAdapter(serveStatic)

createBullBoard({
  queues: [
    new BullMQAdapter(emailQueue),
    new BullMQAdapter(pdfQueue),
    new BullMQAdapter(imageQueue),
  ],
  serverAdapter,
})

serverAdapter.setBasePath('/admin/queues')
app.route('/admin/queues', serverAdapter.registerPlugin())
Protect this route with admin authentication. The dashboard shows queue depth, job throughput, failure rates, and lets you manually retry or delete jobs.
Graceful Shutdown
Workers should finish in-progress jobs before shutting down:
async function shutdown() {
  console.log('Shutting down workers...')
  await emailWorker.close()
  await pdfWorker.close()
  console.log('Workers stopped gracefully')
  process.exit(0)
}

process.on('SIGTERM', shutdown)
process.on('SIGINT', shutdown)
worker.close() stops accepting new jobs and waits for current jobs to complete before returning.
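One refinement, as a sketch: if a job hangs, close() can wait indefinitely, and your process supervisor will eventually SIGKILL the worker mid-job anyway. Racing the close against a deadline lets you exit deliberately instead — the 30-second figure below is an assumption; match it to your platform's termination grace period:

```typescript
// Race a promise against a deadline; resolves to 'timeout' if the
// deadline wins. Used to bound how long shutdown waits on close().
export async function withDeadline<T>(
  promise: Promise<T>,
  ms: number
): Promise<T | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const deadline = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms)
  })
  try {
    return await Promise.race([promise, deadline])
  } finally {
    clearTimeout(timer)
  }
}

// In shutdown():
// const result = await withDeadline(
//   Promise.all([emailWorker.close(), pdfWorker.close()]),
//   30_000
// )
// process.exit(result === 'timeout' ? 1 : 0)
```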
Deployment Considerations
Run workers as separate processes from your API server. This allows:
- Independent scaling (more workers for high-volume queues)
- Separate restarts (worker crash does not affect API)
- Per-worker resource configuration (more memory for image processing workers)
In Docker, a separate service per worker type:
# docker-compose.yml
services:
  api:
    build: .
    command: node dist/api.js
  email-worker:
    build: .
    command: node dist/workers/email.js
    scale: 2 # Two instances for redundancy
  pdf-worker:
    build: .
    command: node dist/workers/pdf.js
    environment:
      - WORKER_CONCURRENCY=2 # PDF is memory-intensive
Background jobs are infrastructure you build once and rely on continuously. Design them with failure in mind from the start and you will sleep better when things inevitably go wrong.
Designing a background job architecture or migrating from a brittle in-process approach to a proper queue? I can help design a system that scales. Book a call: calendly.com/jamesrossjr.