Engineering · 7 min read · March 3, 2026

Building a Reliable Webhook System: Delivery Guarantees and Failure Handling

A complete guide to building production-grade webhooks — HMAC signatures, retry logic, idempotency, fanout architecture, and the operational concerns that most guides skip.

James Ross Jr.

Strategic Systems Architect & Enterprise Software Developer

Webhooks sound simple — send an HTTP POST when something happens. The simplicity is deceptive. A production webhook system needs delivery guarantees, security, retry logic, failure visibility, and a way to handle the thousands of edge cases that emerge when you are delivering millions of events to hundreds of different endpoints.

This guide covers building a webhook system that behaves correctly under failure conditions and gives customers the reliability they need to build against.

The Core Architecture

A naive webhook system: an event happens, you send a POST, you move on. The problem is what happens when the POST fails — the customer's endpoint is down, returns a 500, or times out. The event is lost.

A reliable webhook system separates event publishing from delivery:

Event occurs
    → Write to webhook_deliveries table, one row per subscribed endpoint (durable)
    → Enqueue delivery job
    → Job delivers to each endpoint
    → Retry on failure
    → Mark delivered or permanently failed

This design ensures that even if every delivery attempt fails, the event is recorded and can be replayed.

Database Schema

CREATE TABLE webhook_endpoints (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     UUID NOT NULL REFERENCES users(id),
  url         TEXT NOT NULL,
  secret      TEXT NOT NULL,  -- Stored encrypted
  events      TEXT[] NOT NULL DEFAULT '{}',  -- Which events to subscribe to
  active      BOOLEAN NOT NULL DEFAULT true,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()  -- TIMESTAMPTZ stores an unambiguous instant
);

CREATE TABLE webhook_deliveries (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  endpoint_id   UUID NOT NULL REFERENCES webhook_endpoints(id),
  event_type    TEXT NOT NULL,
  payload       JSONB NOT NULL,
  status        TEXT NOT NULL DEFAULT 'pending',  -- pending, delivered, failed
  attempts      INTEGER NOT NULL DEFAULT 0,
  next_retry_at TIMESTAMPTZ,  -- TIMESTAMPTZ stores an unambiguous instant
  last_error    TEXT,
  delivered_at  TIMESTAMPTZ,
  created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_webhook_deliveries_status ON webhook_deliveries(status, next_retry_at)
WHERE status IN ('pending', 'failed');

HMAC Signatures

Endpoints cannot trust that an incoming webhook is really from you without cryptographic verification. Sign every payload with HMAC-SHA256:

import crypto from 'crypto'

export function signPayload(payload: string, secret: string): string {
  const timestamp = Math.floor(Date.now() / 1000).toString()
  const signedPayload = `${timestamp}.${payload}`

  const signature = crypto
    .createHmac('sha256', secret)
    .update(signedPayload)
    .digest('hex')

  return `t=${timestamp},v1=${signature}`
}

// Sign the exact string that is sent as the request body
const body = JSON.stringify(payload)
const headers = {
  'Content-Type': 'application/json',
  'Webhook-Signature': signPayload(body, endpoint.secret), // carries the timestamp as t=...
  'Webhook-ID': deliveryId,
}

Verification code your customers implement:

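import crypto from 'crypto'

function verifyWebhook(
  payload: string,    // Raw request body, exactly as received
  signature: string,  // Value of the Webhook-Signature header
  secret: string,
  toleranceSeconds = 300
): boolean {
  const parts = Object.fromEntries(
    signature.split(',').map(p => p.split('='))
  )

  const timestamp = parseInt(parts.t, 10)
  const receivedSig = parts.v1

  // Reject malformed headers before doing any cryptographic work
  if (Number.isNaN(timestamp) || typeof receivedSig !== 'string') {
    return false
  }

  // Reject old webhooks (replay attack prevention)
  if (Math.abs(Date.now() / 1000 - timestamp) > toleranceSeconds) {
    return false
  }

  const expectedSig = crypto
    .createHmac('sha256', secret)
    .update(`${timestamp}.${payload}`)
    .digest('hex')

  const received = Buffer.from(receivedSig)
  const expected = Buffer.from(expectedSig)

  // timingSafeEqual throws if the lengths differ, so check them first.
  // Constant-time comparison prevents timing attacks
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected)
}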
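
One detail that is easy to miss on the receiving side: the signature covers the exact characters of the request body, so verification has to run against the raw body string, not against JSON that a framework has parsed and serialized again. A minimal sketch of why, assuming the `${timestamp}.${payload}` scheme above. The helper `hmacHex`, the secret, and the sample body are illustrative and not part of the original code:

```typescript
import { createHmac } from 'node:crypto'

// Illustrative helper: HMAC-SHA256 over `${timestamp}.${body}`, hex encoded
function hmacHex(timestamp: number, body: string, secret: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${body}`).digest('hex')
}

const secret = 'whsec_example' // made-up secret for the example
const timestamp = 1772539200   // fixed so the example is deterministic

// Body as it arrived on the wire (note the space after the first comma)
const rawBody = '{"id":"d1", "type":"payment.succeeded"}'

// What you get if a framework parses the JSON and you stringify it again
const reserialized = JSON.stringify(JSON.parse(rawBody))

const sigOverRaw = hmacHex(timestamp, rawBody, secret)
const sigOverReserialized = hmacHex(timestamp, reserialized, secret)

// The two strings differ by one space, so the signatures do not match
const signaturesMatch = sigOverRaw === sigOverReserialized
```

In practice this means capturing the raw body before any JSON middleware runs and passing that string to `verifyWebhook`.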

Retry Logic With Exponential Backoff

Delivery failures should be retried with exponential backoff:

const RETRY_DELAYS = [
  5,     // 5 seconds
  30,    // 30 seconds
  300,   // 5 minutes
  1800,  // 30 minutes
  7200,  // 2 hours
  86400, // 24 hours
]

async function deliverWebhook(deliveryId: string): Promise<void> {
  const delivery = await db.query.webhookDeliveries.findFirst({
    where: eq(webhookDeliveries.id, deliveryId),
    with: { endpoint: true },
  })

  if (!delivery) return

  const payload = JSON.stringify({
    id: delivery.id,
    type: delivery.eventType,
    data: delivery.payload,
    created: delivery.createdAt.toISOString(),
  })

  try {
    const response = await fetch(delivery.endpoint.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Webhook-Signature': signPayload(payload, delivery.endpoint.secret),
        'Webhook-ID': delivery.id,
      },
      body: payload,
      signal: AbortSignal.timeout(30000), // 30 second timeout
    })

    if (response.ok) {
      await db.update(webhookDeliveries)
        .set({ status: 'delivered', deliveredAt: new Date() })
        .where(eq(webhookDeliveries.id, deliveryId))
      return
    }

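    // Keep only the start of the response body so last_error stays small
    const responseBody = (await response.text()).slice(0, 500)
    throw new Error(`HTTP ${response.status}: ${responseBody}`)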
  } catch (error) {
    const attempts = delivery.attempts + 1
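    // One initial attempt plus one retry per entry in RETRY_DELAYS;
    // without the +1 the final 24-hour delay would never be used
    const maxAttempts = RETRY_DELAYS.length + 1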

    if (attempts >= maxAttempts) {
      await db.update(webhookDeliveries)
        .set({
          status: 'failed',
          attempts,
          lastError: error instanceof Error ? error.message : 'Unknown error',
        })
        .where(eq(webhookDeliveries.id, deliveryId))

      // Disable endpoint after repeated failures
      await checkAndDisableEndpoint(delivery.endpointId)
      return
    }

    const delaySeconds = RETRY_DELAYS[attempts - 1]
    const nextRetryAt = new Date(Date.now() + delaySeconds * 1000)

    await db.update(webhookDeliveries)
      .set({
        status: 'pending',
        attempts,
        nextRetryAt,
        lastError: error instanceof Error ? error.message : 'Unknown error',
      })
      .where(eq(webhookDeliveries.id, deliveryId))
  }
}
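
The retry decision is easier to unit test when it is pulled out of the database code. The sketch below does that under two stated assumptions: the schedule allows one initial attempt plus one retry per entry, and `checkAndDisableEndpoint` (called above but not shown in this article) disables an endpoint once its most recent deliveries have all permanently failed. The names `planRetry` and `shouldDisableEndpoint` and the threshold of 5 are hypothetical:

```typescript
// Delays in seconds, copied from the schedule above
const RETRY_DELAYS = [5, 30, 300, 1800, 7200, 86400]

type RetryPlan =
  | { status: 'pending'; nextRetryAt: Date }
  | { status: 'failed' }

// attemptsMade counts every attempt so far, including the one that just failed.
// Assumption: one initial attempt plus one retry per schedule entry.
function planRetry(attemptsMade: number, now: Date = new Date()): RetryPlan {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError('attemptsMade must be a positive integer')
  }
  const delaySeconds = RETRY_DELAYS[attemptsMade - 1]
  if (delaySeconds === undefined) {
    return { status: 'failed' } // schedule exhausted, stop retrying
  }
  return { status: 'pending', nextRetryAt: new Date(now.getTime() + delaySeconds * 1000) }
}

type DeliveryStatus = 'pending' | 'delivered' | 'failed'

// Hypothetical policy: disable once the most recent `threshold` deliveries
// (newest first) have all permanently failed.
function shouldDisableEndpoint(recentStatuses: DeliveryStatus[], threshold = 5): boolean {
  if (recentStatuses.length < threshold) return false
  return recentStatuses.slice(0, threshold).every(status => status === 'failed')
}

const now = new Date('2026-03-03T12:00:00Z')
const afterFirstFailure = planRetry(1, now)   // pending, retry 5 seconds later
const afterSixthFailure = planRetry(6, now)   // pending, retry 24 hours later
const afterSeventhFailure = planRetry(7, now) // failed, schedule exhausted
```

Keeping these as pure functions means the schedule and the disable policy can be changed and tested without touching the delivery code.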

The Delivery Worker

A worker process polls for pending deliveries:

async function runDeliveryWorker() {
  while (true) {
    const pending = await db.select()
      .from(webhookDeliveries)
      .where(and(
        eq(webhookDeliveries.status, 'pending'),
        lte(webhookDeliveries.nextRetryAt, new Date()),
      ))
      .limit(50)

    if (pending.length === 0) {
      await new Promise(resolve => setTimeout(resolve, 5000))
      continue
    }

    // Process deliveries concurrently
    await Promise.allSettled(
      pending.map(delivery => deliverWebhook(delivery.id))
    )
  }
}

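In production, use a proper job queue (BullMQ, Inngest, or similar) rather than polling. The database polling approach works for modest volumes but does not scale to high delivery rates, and as written it is only safe with a single worker: nothing stops two workers from selecting the same pending rows and delivering them twice.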
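
If more than one polling worker runs, each worker has to claim rows before delivering them, otherwise the same delivery can be sent twice. In Postgres this is commonly done with `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction. The sketch below models the same idea in memory with a short lease, so the claiming rule can be seen without a database. The `lockedUntil` field and the `claimBatch` function are hypothetical and not part of the schema above:

```typescript
interface QueuedDelivery {
  id: string
  status: 'pending' | 'delivered' | 'failed'
  nextRetryAt: Date
  lockedUntil: Date | null // hypothetical lease field, not in the schema above
}

// Claim up to `limit` due deliveries by taking a lease on each one.
// A delivery whose lease has not expired is skipped by other workers.
function claimBatch(
  rows: QueuedDelivery[],
  now: Date,
  limit: number,
  leaseMs = 60_000,
): QueuedDelivery[] {
  const claimed: QueuedDelivery[] = []
  for (const row of rows) {
    if (claimed.length >= limit) break
    const due = row.status === 'pending' && row.nextRetryAt.getTime() <= now.getTime()
    const free = row.lockedUntil === null || row.lockedUntil.getTime() <= now.getTime()
    if (due && free) {
      row.lockedUntil = new Date(now.getTime() + leaseMs)
      claimed.push(row)
    }
  }
  return claimed
}

const now = new Date('2026-03-03T12:00:00Z')
const queue: QueuedDelivery[] = [
  { id: 'a', status: 'pending', nextRetryAt: new Date('2026-03-03T11:59:00Z'), lockedUntil: null },
  { id: 'b', status: 'pending', nextRetryAt: new Date('2026-03-03T13:00:00Z'), lockedUntil: null }, // not due yet
]

const workerOne = claimBatch(queue, now, 50) // claims delivery 'a'
const workerTwo = claimBatch(queue, now, 50) // finds nothing left to claim
```

The lease expiry matters: if a worker crashes mid-delivery, the row becomes claimable again once `lockedUntil` passes, which is one more reason receivers must tolerate duplicates.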

Idempotency

Webhooks may be delivered more than once. For example, the customer's endpoint processes a delivery but its 2xx response is lost or arrives after the timeout, so the system retries. Customers must handle duplicate deliveries.

Every webhook should have a unique ID that customers can use to deduplicate:

{
  "id": "evt_01j9abc...",
  "type": "payment.succeeded",
  "data": { ... },
  "created": "2026-03-03T12:00:00Z"
}

Customers store processed event IDs:

// Customer-side deduplication
async function handleWebhook(event: WebhookEvent) {
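  // With an ioredis-style client, SET ... NX resolves to 'OK' when the key
  // was newly set and to null when it already existed
  const firstDelivery = await redis.set(
    `webhook:${event.id}`,
    '1',
    'EX', 86400,    // 24 hours
    'NX'           // Only set if not exists
  )

  if (!firstDelivery) {
    return  // Already processed
  }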

  // Process the event
}
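
One caveat with marking an event as seen before processing it: if processing then fails, the sender's retry is treated as a duplicate and the event is effectively dropped. A minimal sketch of one way to handle that, using an in-memory map as a stand-in for Redis so it runs without external services. `EventDeduper` and `handleOnce` are hypothetical names, and a real deployment would need shared storage rather than process memory:

```typescript
class EventDeduper {
  private readonly seen = new Map<string, number>() // event id -> expiry (ms since epoch)
  private readonly ttlMs: number

  constructor(ttlMs = 24 * 60 * 60 * 1000) {
    this.ttlMs = ttlMs
  }

  // Returns true only for the first claim of an id within the TTL window
  tryClaim(eventId: string, now = Date.now()): boolean {
    const expiresAt = this.seen.get(eventId)
    if (expiresAt !== undefined && expiresAt > now) return false
    this.seen.set(eventId, now + this.ttlMs)
    return true
  }

  // Forget a claim so that a later retry of the same event is accepted
  release(eventId: string): void {
    this.seen.delete(eventId)
  }
}

const deduper = new EventDeduper()

// Resolves to false for duplicates, true once `work` has completed
async function handleOnce(eventId: string, work: () => Promise<void>): Promise<boolean> {
  if (!deduper.tryClaim(eventId)) return false // duplicate delivery, skip it
  try {
    await work()
    return true
  } catch (error) {
    deduper.release(eventId) // let the sender's retry through
    throw error
  }
}
```

The receiver should still answer the failed request with a non-2xx status so that the sender schedules the retry.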

Fanout to Multiple Endpoints

When a single event needs to be delivered to multiple endpoints (different customers subscribed to the same event type), create a delivery record per endpoint:

async function publishEvent(eventType: string, payload: unknown) {
  // Find all active endpoints subscribed to this event type
  const endpoints = await db.select()
    .from(webhookEndpoints)
    .where(and(
      eq(webhookEndpoints.active, true),
      // Cast the array so Postgres does not have to infer the parameter type
      sql`${webhookEndpoints.events} @> ARRAY[${eventType}]::text[]`
    ))

  // Create delivery records for each endpoint
  if (endpoints.length > 0) {
    await db.insert(webhookDeliveries)
      .values(endpoints.map(endpoint => ({
        endpointId: endpoint.id,
        eventType,
        payload: payload as Record<string, unknown>,
        nextRetryAt: new Date(),
      })))
  }
}
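
The fanout rule itself (active endpoints whose `events` array contains the event type, one delivery record each) can be expressed without the ORM, which makes it straightforward to test. A small in-memory sketch; the `Endpoint` and `DeliveryRecord` shapes and the `fanOut` name are illustrative simplifications of the tables above:

```typescript
interface Endpoint {
  id: string
  events: string[]
  active: boolean
}

interface DeliveryRecord {
  endpointId: string
  eventType: string
  payload: unknown
  nextRetryAt: Date
}

// One delivery record per active endpoint subscribed to the event type
function fanOut(
  endpoints: Endpoint[],
  eventType: string,
  payload: unknown,
  now: Date = new Date(),
): DeliveryRecord[] {
  return endpoints
    .filter(endpoint => endpoint.active && endpoint.events.includes(eventType))
    .map(endpoint => ({ endpointId: endpoint.id, eventType, payload, nextRetryAt: now }))
}

const endpoints: Endpoint[] = [
  { id: 'ep_1', events: ['payment.succeeded', 'payment.failed'], active: true },
  { id: 'ep_2', events: ['payment.succeeded'], active: false }, // disabled, skipped
  { id: 'ep_3', events: ['invoice.created'], active: true },    // not subscribed, skipped
]

const records = fanOut(endpoints, 'payment.succeeded', { amount: 1000 })
```

Here only `ep_1` receives a delivery record, since `ep_2` is inactive and `ep_3` is subscribed to a different event type.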

Operational Visibility

Your customers need to see delivery attempts, successes, and failures. Build a delivery log UI, backed by endpoints like these:

// GET /api/webhooks/deliveries
app.get('/api/webhooks/deliveries', requireAuth, async (c) => {
  const userId = c.get('userId')
  const { endpointId, status, limit = 50 } = c.req.query()

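  // Select only delivery columns so the joined endpoint row (including its
  // secret) is never returned; getTableColumns is exported by drizzle-orm
  const deliveries = await db.select(getTableColumns(webhookDeliveries))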
    .from(webhookDeliveries)
    .innerJoin(webhookEndpoints, eq(webhookEndpoints.id, webhookDeliveries.endpointId))
    .where(and(
      eq(webhookEndpoints.userId, userId),
      endpointId ? eq(webhookDeliveries.endpointId, endpointId) : undefined,
      status ? eq(webhookDeliveries.status, status) : undefined,
    ))
    .orderBy(desc(webhookDeliveries.createdAt))
    .limit(Math.min(Number(limit) || 50, 100))  // Cap the page size

  return c.json(deliveries)
})

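// POST /api/webhooks/deliveries/:id/retry
app.post('/api/webhooks/deliveries/:id/retry', requireAuth, async (c) => {
  const userId = c.get('userId')
  // Allow manual retry of failed deliveries, scoped to endpoints the caller owns
  const ownedEndpointIds = db.select({ id: webhookEndpoints.id })
    .from(webhookEndpoints)
    .where(eq(webhookEndpoints.userId, userId))
  const updated = await db.update(webhookDeliveries)
    .set({ status: 'pending', nextRetryAt: new Date() })
    .where(and(
      eq(webhookDeliveries.id, c.req.param('id')),
      inArray(webhookDeliveries.endpointId, ownedEndpointIds),
    ))
    .returning({ id: webhookDeliveries.id })

  if (updated.length === 0) return c.json({ error: 'Delivery not found' }, 404)
  return c.json({ success: true })
})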

Testing Your Webhook System

Provide a test mode that sends webhooks to a local endpoint or a testing service like webhook.site. For development, use a tool like ngrok or Cloudflare Tunnel to expose your local server:

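// Test webhook endpoint
app.post('/api/webhooks/test', requireAuth, async (c) => {
  const userId = c.get('userId')
  const { endpointId, eventType } = await c.req.json()

  // Queue the test for the caller's own endpoint only; publishEvent would
  // fan the test event out to every customer subscribed to this event type
  const endpoint = await db.query.webhookEndpoints.findFirst({
    where: and(eq(webhookEndpoints.id, endpointId), eq(webhookEndpoints.userId, userId)),
  })
  if (!endpoint) return c.json({ error: 'Endpoint not found' }, 404)

  await db.insert(webhookDeliveries).values({
    endpointId: endpoint.id,
    eventType,
    payload: { test: true, timestamp: new Date().toISOString() },
    nextRetryAt: new Date(),
  })

  return c.json({ success: true, message: 'Test event queued' })
})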

A reliable webhook system is the foundation of a trustworthy API platform. Getting it right means your customers can build confidently on your events, knowing that delivery failures are handled gracefully and every event is auditable.


Building a webhook system or adding event-driven features to an existing API? I have built these in production and can help you avoid the common pitfalls. Book a call: calendly.com/jamesrossjr.

