Technology

Building a Webhook Delivery System That Measures Its Own Latency (With Code)

I needed to build a system that fires outbound HTTP callbacks (webhooks/postbacks) to hundreds of different endpoints, tracks delivery success and failure, measures per-endpoint latency percentiles, retries intelligently, and quarantines endpoints that are consistently slow or broken.

This is the kind of infrastructure that looks like a weekend project and becomes a three-month war. Every webhook delivery system I’ve seen in production has the same failure mode: it works fine at low volume, then degrades silently at scale because nobody instrumented the delivery pipeline itself.

This article walks through the full implementation — queue architecture, retry logic with exponential backoff, per-endpoint latency tracking with P50/P95/P99 percentiles, circuit breaker pattern for failing endpoints, and a dead letter queue for permanent failures. All code is Python with PostgreSQL. No external message broker required (though I’ll note where you’d swap in Redis or RabbitMQ for higher throughput).

The Data Model

Everything starts with two tables: one for the delivery queue and one for the latency measurements.

sql

-- Pending and in-flight webhook deliveries
CREATE TABLE webhook_queue (
    id              SERIAL PRIMARY KEY,
    endpoint_id     INTEGER NOT NULL,
    endpoint_url    TEXT NOT NULL,
    payload         JSONB NOT NULL,
    
    -- Delivery state
    status          VARCHAR(20) NOT NULL DEFAULT 'pending',
        -- pending, in_flight, delivered, failed, dead_letter
    attempt_count   INTEGER NOT NULL DEFAULT 0,
    max_attempts    INTEGER NOT NULL DEFAULT 5,
    
    -- Timing
    created_at      TIMESTAMP NOT NULL DEFAULT NOW(),
    next_attempt_at TIMESTAMP NOT NULL DEFAULT NOW(),
    delivered_at    TIMESTAMP,
    last_error      TEXT,
    
    -- Response tracking
    last_status_code INTEGER,
    last_response_ms INTEGER,  -- response time in milliseconds
    
    INDEX idx_queue_next (status, next_attempt_at)
        WHERE status IN ('pending', 'failed')
);

-- Per-endpoint latency measurements (ring buffer)
CREATE TABLE endpoint_latency (
    id              SERIAL PRIMARY KEY,
    endpoint_id     INTEGER NOT NULL,
    response_ms     INTEGER NOT NULL,
    status_code     INTEGER,
    measured_at     TIMESTAMP NOT NULL DEFAULT NOW(),
    
    INDEX idx_latency_endpoint (endpoint_id, measured_at)
);

-- Endpoint health state (circuit breaker)
CREATE TABLE endpoint_health (
    endpoint_id         INTEGER PRIMARY KEY,
    endpoint_url        TEXT NOT NULL,
    state               VARCHAR(20) NOT NULL DEFAULT 'closed',
        -- closed (healthy), open (broken), half_open (testing)
    consecutive_failures INTEGER NOT NULL DEFAULT 0,
    failure_threshold   INTEGER NOT NULL DEFAULT 10,
    last_failure_at     TIMESTAMP,
    last_success_at     TIMESTAMP,
    opened_at           TIMESTAMP,  -- when circuit opened
    cooldown_seconds    INTEGER NOT NULL DEFAULT 300,  -- 5 min before half_open
    
    -- Latency stats (updated periodically)
    p50_ms              INTEGER,
    p95_ms              INTEGER,
    p99_ms              INTEGER,
    sample_count        INTEGER NOT NULL DEFAULT 0
);

Three things to note about this schema.

First, the webhook_queue table uses a next_attempt_at column instead of a separate scheduling mechanism. The worker polls for rows where status IN ('pending', 'failed') AND next_attempt_at <= NOW(). This is a poor man’s delay queue and it works fine up to about 10,000 deliveries per minute. Beyond that, swap in a proper message broker.

Second, the endpoint_latency table acts as a ring buffer. I periodically purge rows older than 24 hours. The latency percentiles in endpoint_health are calculated from this rolling window — they represent recent behavior, not historical averages.

Third, the endpoint_health table implements the circuit breaker state machine. More on this below.

The Delivery Worker

The core worker loop is deliberately simple. Complexity belongs in the retry logic and circuit breaker, not in the delivery path itself.

python

import requests
import time
import psycopg2
from psycopg2.extras import RealDictCursor
from datetime import datetime, timedelta

DB_DSN = "postgresql://user:pass@localhost/webhooks"

def get_connection():
    return psycopg2.connect(DB_DSN)

def deliver_webhooks(batch_size=50):
    """
    Fetch pending webhooks and attempt delivery.
    Uses SELECT FOR UPDATE SKIP LOCKED for safe concurrent workers.
    """
    conn = get_connection()
    cur = conn.cursor(cursor_factory=RealDictCursor)
    
    try:
        cur.execute("""
            SELECT id, endpoint_id, endpoint_url, payload, 
                   attempt_count, max_attempts
            FROM webhook_queue
            WHERE status IN ('pending', 'failed')
              AND next_attempt_at <= NOW()
            ORDER BY next_attempt_at ASC
            LIMIT %s
            FOR UPDATE SKIP LOCKED
        """, (batch_size,))
        
        rows = cur.fetchall()
        
        for row in rows:
            # Check circuit breaker before attempting
            if is_circuit_open(cur, row['endpoint_id']):
                # Don't attempt delivery — reschedule for after cooldown
                reschedule_for_cooldown(cur, row['id'], row['endpoint_id'])
                continue
            
            # Attempt delivery and measure latency
            result = attempt_delivery(
                row['endpoint_url'], 
                row['payload']
            )
            
            # Record latency measurement regardless of success/failure
            record_latency(
                cur, 
                row['endpoint_id'], 
                result['response_ms'], 
                result['status_code']
            )
            
            if result['success']:
                mark_delivered(cur, row['id'], result)
                record_success(cur, row['endpoint_id'])
            else:
                handle_failure(
                    cur, row['id'], row['endpoint_id'],
                    row['attempt_count'], row['max_attempts'],
                    result
                )
        
        conn.commit()
    
    except Exception as e:
        conn.rollback()
        raise
    finally:
        cur.close()
        conn.close()


def attempt_delivery(url, payload):
    """
    Fire the webhook and measure response time.
    Returns dict with success, status_code, response_ms, error.
    """
    start = time.monotonic()
    
    try:
        response = requests.post(
            url,
            json=payload,
            timeout=15,          # 15 second hard timeout
            headers={
                'Content-Type': 'application/json',
                'User-Agent': 'WebhookDelivery/1.0',
                'X-Delivery-Timestamp': str(int(time.time()))
            }
        )
        
        elapsed_ms = int((time.monotonic() - start) * 1000)
        
        return {
            'success': 200 <= response.status_code < 300,
            'status_code': response.status_code,
            'response_ms': elapsed_ms,
            'error': None if response.ok else f"HTTP {response.status_code}"
        }
    
    except requests.Timeout:
        elapsed_ms = int((time.monotonic() - start) * 1000)
        return {
            'success': False,
            'status_code': None,
            'response_ms': elapsed_ms,
            'error': 'timeout_15s'
        }
    
    except requests.ConnectionError as e:
        elapsed_ms = int((time.monotonic() - start) * 1000)
        return {
            'success': False,
            'status_code': None,
            'response_ms': elapsed_ms,
            'error': f'connection_error: {str(e)[:200]}'
        }

The SELECT FOR UPDATE SKIP LOCKED clause is critical for running multiple worker instances. Without SKIP LOCKED, two workers would block on the same row. With it, each worker grabs a different batch of pending webhooks. This gives you horizontal scaling by simply starting more worker processes.

The time.monotonic() call instead of time.time() is deliberate. time.time() can jump backward during NTP adjustments. time.monotonic() never goes backward, which matters when you’re measuring sub-second latency.

Retry Logic with Exponential Backoff and Jitter

When a delivery fails, the retry timing determines whether your system recovers gracefully or creates a thundering herd that hammers a struggling endpoint.

python

import random

def calculate_next_attempt(attempt_count, base_delay=30, max_delay=3600):
    """
    Exponential backoff with full jitter.
    
    Attempt 1: 0-30s
    Attempt 2: 0-60s  
    Attempt 3: 0-120s
    Attempt 4: 0-240s
    Attempt 5: 0-480s (capped at max_delay)
    
    Full jitter prevents thundering herd when an endpoint
    recovers and hundreds of retries fire simultaneously.
    """
    exponential_delay = base_delay * (2 ** attempt_count)
    capped_delay = min(exponential_delay, max_delay)
    jittered_delay = random.uniform(0, capped_delay)
    
    return datetime.utcnow() + timedelta(seconds=jittered_delay)


def handle_failure(cur, webhook_id, endpoint_id, 
                   attempt_count, max_attempts, result):
    """
    Handle a failed delivery attempt.
    Either retry with backoff or move to dead letter queue.
    """
    new_attempt_count = attempt_count + 1
    
    if new_attempt_count >= max_attempts:
        # Exhausted retries — dead letter
        cur.execute("""
            UPDATE webhook_queue 
            SET status = 'dead_letter',
                attempt_count = %s,
                last_error = %s,
                last_status_code = %s,
                last_response_ms = %s
            WHERE id = %s
        """, (
            new_attempt_count, result['error'],
            result['status_code'], result['response_ms'],
            webhook_id
        ))
    else:
        # Schedule retry with backoff
        next_attempt = calculate_next_attempt(new_attempt_count)
        cur.execute("""
            UPDATE webhook_queue
            SET status = 'failed',
                attempt_count = %s,
                next_attempt_at = %s,
                last_error = %s,
                last_status_code = %s,
                last_response_ms = %s
            WHERE id = %s
        """, (
            new_attempt_count, next_attempt,
            result['error'], result['status_code'],
            result['response_ms'], webhook_id
        ))
    
    # Update circuit breaker
    record_failure(cur, endpoint_id)

Why full jitter instead of decorrelated jitter or equal jitter? AWS published the definitive analysis on this. Full jitter (randomizing between 0 and the exponential cap) produces the lowest total completion time across all clients. Equal jitter (randomizing between half the cap and the full cap) is more conservative but slower to drain the retry backlog. For webhook delivery where you have many independent endpoints, full jitter is the right choice because each endpoint’s retries are independent — you’re not coordinating between them.

Circuit Breaker: Stop Hammering Broken Endpoints

The circuit breaker pattern prevents your system from wasting resources on endpoints that are consistently failing. Without it, a dead endpoint accumulates hundreds of pending retries that all timeout at 15 seconds each — burning your worker capacity on deliveries that will never succeed.

python

def is_circuit_open(cur, endpoint_id):
    """
    Check if the circuit breaker is open (endpoint is broken).
    If open and cooldown has passed, transition to half_open.
    """
    cur.execute("""
        SELECT state, opened_at, cooldown_seconds
        FROM endpoint_health
        WHERE endpoint_id = %s
    """, (endpoint_id,))
    
    row = cur.fetchone()
    if not row:
        return False  # no health record = assume healthy
    
    if row['state'] == 'closed':
        return False
    
    if row['state'] == 'open':
        # Check if cooldown period has elapsed
        if row['opened_at'] and row['cooldown_seconds']:
            elapsed = (datetime.utcnow() - row['opened_at']).total_seconds()
            if elapsed >= row['cooldown_seconds']:
                # Transition to half_open — allow one probe
                cur.execute("""
                    UPDATE endpoint_health
                    SET state = 'half_open'
                    WHERE endpoint_id = %s
                """, (endpoint_id,))
                return False  # allow the probe delivery
        return True  # still in cooldown
    
    if row['state'] == 'half_open':
        return False  # allow probe delivery
    
    return False


def record_failure(cur, endpoint_id):
    """
    Record a delivery failure. Open circuit if threshold reached.
    """
    cur.execute("""
        UPDATE endpoint_health
        SET consecutive_failures = consecutive_failures + 1,
            last_failure_at = NOW()
        WHERE endpoint_id = %s
        RETURNING consecutive_failures, failure_threshold, state
    """, (endpoint_id,))
    
    row = cur.fetchone()
    if not row:
        # Create health record on first failure
        cur.execute("""
            INSERT INTO endpoint_health (endpoint_id, endpoint_url, 
                consecutive_failures, last_failure_at)
            VALUES (%s, '', 1, NOW())
            ON CONFLICT (endpoint_id) DO UPDATE
            SET consecutive_failures = endpoint_health.consecutive_failures + 1,
                last_failure_at = NOW()
        """, (endpoint_id,))
        return
    
    if row['state'] == 'half_open':
        # Probe failed — re-open circuit with longer cooldown
        cur.execute("""
            UPDATE endpoint_health
            SET state = 'open',
                opened_at = NOW(),
                cooldown_seconds = LEAST(cooldown_seconds * 2, 3600)
            WHERE endpoint_id = %s
        """, (endpoint_id,))
    
    elif (row['state'] == 'closed' and 
          row['consecutive_failures'] >= row['failure_threshold']):
        # Threshold reached — open circuit
        cur.execute("""
            UPDATE endpoint_health
            SET state = 'open',
                opened_at = NOW(),
                cooldown_seconds = 300  -- reset to 5 minutes
            WHERE endpoint_id = %s
        """, (endpoint_id,))


def record_success(cur, endpoint_id):
    """
    Record a delivery success. Close circuit if half_open.
    """
    cur.execute("""
        UPDATE endpoint_health
        SET consecutive_failures = 0,
            last_success_at = NOW(),
            state = 'closed'
        WHERE endpoint_id = %s
    """, (endpoint_id,))

The half_open → open escalation with doubled cooldown is the detail most implementations miss. If an endpoint fails during the probe (half_open state), you don’t want to retry in another 5 minutes. The endpoint is still broken. Double the cooldown to 10 minutes, then 20, capping at 1 hour. This prevents the circuit breaker from becoming a periodic hammering mechanism.

Latency Percentile Tracking

Averages lie. An endpoint with an average response time of 200ms might respond in 50ms 95% of the time and 3,000ms the other 5%. The average looks fine. The P95 reveals a problem that affects 1 in 20 deliveries.

python

def record_latency(cur, endpoint_id, response_ms, status_code):
    """
    Record a latency measurement and update percentile stats.
    """
    cur.execute("""
        INSERT INTO endpoint_latency 
            (endpoint_id, response_ms, status_code)
        VALUES (%s, %s, %s)
    """, (endpoint_id, response_ms, status_code))


def update_latency_percentiles(cur, endpoint_id, window_hours=24):
    """
    Calculate P50, P95, P99 from the rolling window.
    Uses PostgreSQL's percentile_cont for exact percentiles.
    """
    cur.execute("""
        SELECT 
            COUNT(*) as sample_count,
            percentile_cont(0.50) WITHIN GROUP 
                (ORDER BY response_ms) AS p50,
            percentile_cont(0.95) WITHIN GROUP 
                (ORDER BY response_ms) AS p95,
            percentile_cont(0.99) WITHIN GROUP 
                (ORDER BY response_ms) AS p99
        FROM endpoint_latency
        WHERE endpoint_id = %s
          AND measured_at >= NOW() - INTERVAL '%s hours'
    """, (endpoint_id, window_hours))
    
    row = cur.fetchone()
    
    if row and row['sample_count'] > 0:
        cur.execute("""
            UPDATE endpoint_health
            SET p50_ms = %s,
                p95_ms = %s,
                p99_ms = %s,
                sample_count = %s
            WHERE endpoint_id = %s
        """, (
            int(row['p50']), int(row['p95']), 
            int(row['p99']), row['sample_count'],
            endpoint_id
        ))
    
    return row


def get_slow_endpoints(cur, p95_threshold_ms=2000):
    """
    Find endpoints whose P95 latency exceeds the threshold.
    These are candidates for investigation or circuit opening.
    """
    cur.execute("""
        SELECT endpoint_id, endpoint_url, 
               p50_ms, p95_ms, p99_ms, sample_count,
               state, consecutive_failures
        FROM endpoint_health
        WHERE p95_ms > %s
          AND sample_count >= 20  -- need sufficient samples
        ORDER BY p95_ms DESC
    """, (p95_threshold_ms,))
    
    return cur.fetchall()

PostgreSQL’s percentile_cont is an ordered-set aggregate function that computes exact percentiles. For large datasets, you’d switch to percentile_disc (which returns an actual observed value rather than interpolating) or use a t-digest approximation. For webhook delivery with a 24-hour window, exact percentiles on the raw data are fast enough up to about 100,000 measurements per endpoint.

The get_slow_endpoints function is what I run as a scheduled check every 15 minutes. Endpoints with P95 above 2 seconds get flagged for investigation. Endpoints with P95 above 5 seconds get their circuit breaker threshold reduced — they’re allowed fewer consecutive failures before the circuit opens, because each failed delivery ties up a worker thread for the full timeout duration.

Monitoring Webhook Delivery Health

Here’s the monitoring query I run every five minutes. It produces a single-row health summary of the entire delivery pipeline:

sql

SELECT
    -- Queue depth
    COUNT(*) FILTER (WHERE status = 'pending') AS pending,
    COUNT(*) FILTER (WHERE status = 'failed') AS awaiting_retry,
    COUNT(*) FILTER (WHERE status = 'in_flight') AS in_flight,
    COUNT(*) FILTER (WHERE status = 'dead_letter') AS dead_letter,
    
    -- Delivery rate (last hour)
    COUNT(*) FILTER (
        WHERE status = 'delivered' 
        AND delivered_at >= NOW() - INTERVAL '1 hour'
    ) AS delivered_last_hour,
    
    -- Failure rate (last hour)
    COUNT(*) FILTER (
        WHERE status IN ('failed', 'dead_letter')
        AND created_at >= NOW() - INTERVAL '1 hour'
    ) AS failed_last_hour,
    
    -- Oldest undelivered
    MIN(created_at) FILTER (
        WHERE status IN ('pending', 'failed')
    ) AS oldest_pending,
    
    -- Average delivery latency (last hour, successful only)
    AVG(last_response_ms) FILTER (
        WHERE status = 'delivered'
        AND delivered_at >= NOW() - INTERVAL '1 hour'
    ) AS avg_delivery_ms_last_hour

FROM webhook_queue;

The oldest_pending value is the most important metric in this query. If it’s older than your maximum retry window (sum of all backoff delays), something is structurally wrong — either the worker is stuck, the endpoint is blackholed, or the queue is growing faster than you can drain it.

I alert on three conditions: dead letter count increasing (endpoints are permanently failing and nobody’s investigating), pending queue age exceeding 30 minutes (delivery is falling behind), and per-endpoint P95 latency exceeding thresholds that indicate degraded delivery reliability . The third one is the early warning signal — latency increases before failures do. An endpoint that was responding in 200ms and starts responding in 3 seconds is about to start timing out.

The Dead Letter Queue Isn’t Just Storage

Most teams implement a dead letter queue as a table where failed webhooks go to die. They check it occasionally during incident response. This is a waste.

The dead letter queue is your most valuable debugging dataset. Every row represents a delivery that your system tried multiple times and gave up on. The pattern of dead letters tells you things the success metrics never will.

python

def analyze_dead_letters(cur, hours=24):
    """
    Analyze recent dead letter entries for patterns.
    Returns per-endpoint failure analysis.
    """
    cur.execute("""
        SELECT 
            endpoint_id,
            endpoint_url,
            COUNT(*) AS dead_count,
            
            -- Most common error
            MODE() WITHIN GROUP (ORDER BY last_error) AS primary_error,
            
            -- Most common status code
            MODE() WITHIN GROUP (ORDER BY last_status_code) 
                AS primary_status_code,
            
            -- Timing
            MIN(created_at) AS first_dead,
            MAX(created_at) AS last_dead,
            
            -- Average attempts before giving up
            AVG(attempt_count)::INTEGER AS avg_attempts
            
        FROM webhook_queue
        WHERE status = 'dead_letter'
          AND created_at >= NOW() - INTERVAL '%s hours'
        GROUP BY endpoint_id, endpoint_url
        ORDER BY dead_count DESC
        LIMIT 20
    """, (hours,))
    
    return cur.fetchall()

When I review dead letters, I’m looking for three patterns.

Cluster failures: 50 dead letters for the same endpoint in the same hour means the endpoint went down and didn’t recover within the retry window. Action: extend retry window or implement manual re-queue.

Status code patterns: A spike in 401/403 dead letters means the endpoint rotated credentials and nobody updated the webhook configuration. A spike in 429 (Too Many Requests) means you’re exceeding their rate limit and need to throttle.

Gradual accumulation: 2-3 dead letters per day for a single endpoint, spread evenly. This is the sneakiest pattern — the endpoint is mostly working but has intermittent failures that exhaust retries over time. The fix is usually increasing max_attempts for that specific endpoint or reducing the timeout.

Running the Worker

The main loop that ties everything together:

python

import signal
import sys

running = True

def shutdown_handler(signum, frame):
    global running
    running = False
    print(f"Received signal {signum}, shutting down gracefully...")

signal.signal(signal.SIGTERM, shutdown_handler)
signal.signal(signal.SIGINT, shutdown_handler)

def main():
    print("Webhook delivery worker starting...")
    
    while running:
        try:
            deliver_webhooks(batch_size=50)
        except Exception as e:
            print(f"Worker error: {e}")
            time.sleep(5)  # back off on errors
            continue
        
        # Update latency stats every 100 iterations
        # (cheap operation, doesn't need to run every loop)
        if int(time.time()) % 100 == 0:
            conn = get_connection()
            cur = conn.cursor(cursor_factory=RealDictCursor)
            try:
                cur.execute(
                    "SELECT DISTINCT endpoint_id FROM endpoint_health"
                )
                for row in cur.fetchall():
                    update_latency_percentiles(cur, row['endpoint_id'])
                conn.commit()
            finally:
                cur.close()
                conn.close()
        
        # Poll interval — 500ms keeps latency low without
        # hammering the database
        time.sleep(0.5)

    print("Worker shut down cleanly.")

if __name__ == '__main__':
    main()

The SIGTERM handler is essential for clean shutdowns in containerized environments. When Kubernetes sends SIGTERM, the worker finishes its current batch, commits the transaction, and exits. Without this, you get rows stuck in in_flight status with no worker processing them.

What This System Doesn’t Do (And When You Need More)

This implementation handles up to about 10,000 deliveries per minute on a single PostgreSQL instance with 2-3 worker processes. Beyond that, three changes are needed.

First, replace the PostgreSQL queue with Redis Streams or RabbitMQ. The SELECT FOR UPDATE SKIP LOCKED pattern creates write contention on the queue table at high throughput. A dedicated message broker eliminates this.

Second, add per-endpoint rate limiting. Some receiving endpoints have rate limits (100 requests per minute, 1,000 per hour). Without client-side rate limiting, you’ll blow through their quota and get 429’d. Implement a token bucket per endpoint.

Third, add request signing. HMAC-SHA256 signatures on the payload let the receiving endpoint verify that the webhook came from your system and wasn’t tampered with in transit. This is table stakes for any webhook system that sends financial data.

The system in this article is the foundation. It handles the hard problems — retry logic, circuit breaking, latency measurement, dead letter analysis — that every webhook delivery system needs regardless of scale. The specific components you bolt on (message broker, rate limiter, request signing) depend on your throughput and security requirements.

The part that matters most is the part most teams skip: measuring the delivery system itself. If you can’t answer “what is the P95 delivery latency to Endpoint X over the last 24 hours,” you’re operating blind. Build the instrumentation first. Everything else follows.

Caesar Fikson

I am an iGaming Data Analyst specializing in examining and interpreting data related to online gaming platforms and gambling activities as well as market trends. I analyze player behavior, game performance, and revenue trends to optimize gaming experiences and business strategies.

Recent Posts

Best iGaming Affiliate Tracking Software in 2026

The best iGaming affiliate tracking software in 2026 is not the platform with the prettiest…

2 days ago

Free Casino Scripts and Open-Source Game Engines: What’s Worth Deploying

The honest guide to free casino scripts and open-source casino game engines in 2026 —…

3 days ago

Best iGaming Affiliate Software: 2026 Vendor Comparison With Real Criteria

The 2026 iGaming affiliate software comparison operators actually need — evaluated on postback reliability, NGR…

5 days ago

iGaming Link Building: Safe Casino Backlink Strategies

iGaming link building is not normal SEO with casino keywords sprinkled on top. It is…

1 week ago

Casino Affiliate Programs That Pay Well: A Practitioner’s Shortlist

TL;DR Most casino affiliate program roundups list 50 programs ranked by commission percentage. Commission percentage…

1 week ago

iGaming Platform Provider: What the Shortlist Actually Looks Like in 2026

TL;DR The iGaming platform market in 2026 looks like this: enterprise providers have moved upmarket,…

2 weeks ago