What is the Retry Pattern — Exponential Backoff and Jitter [2026]

Q: What is exponential backoff?

Exponential backoff is a retry strategy where the delay between attempts grows exponentially: delay = base_delay * 2^attempt, with attempt counted from 0. For example, with base_delay = 1s the first retry waits 1s, the second 2s, the third 4s, and the fourth 8s. This gives the failing service time to recover and reduces load compared to immediate retries. Most implementations cap the maximum delay (e.g. 60s) so that waits do not grow without bound.

Q: What is jitter in retry logic?

Jitter adds randomness to retry delays to prevent synchronized retries. If 1,000 clients all experience the same failure at the same time and all retry on the same exponential backoff schedule, they will all retry simultaneously, causing a thundering herd against the recovering service. Jitter spreads retries over time. Writing backoff = min(cap, base_delay * 2^attempt), the common variants are: Full jitter: delay = random(0, backoff). Equal jitter: delay = backoff/2 + random(0, backoff/2). Decorrelated jitter (AWS recommendation): delay = min(cap, random(base_delay, prev_delay * 3)).

Q: What errors should NOT be retried?

Most client errors (4xx) should not be retried because they indicate a problem with the request itself, not a transient server issue. 400 Bad Request (malformed request), 401 Unauthorized (invalid credentials), 403 Forbidden (no permission), 404 Not Found (resource does not exist), and 422 Unprocessable Entity fall into this group: retrying will not fix the underlying problem. The errors worth retrying are transient ones: 502 and 503 responses, 429 Too Many Requests (honouring the Retry-After header when present), and network-level timeouts. A 500 may or may not be transient, so retry it sparingly.

Q: What is a retry budget?

A retry budget limits the total number of retries in a time window so that a cascade of retries cannot amplify load on a failing service. For example, a 10% retry budget means at most 10% of total requests can be retries at any given time. If you receive 1,000 requests/second and 500 are failing, a 10% budget allows only 100 retries/second rather than 500 (or 1,500 if each failure is retried three times), which caps the extra load on the downstream service at 10%. It is typically implemented as a global rate limiter on retry operations.
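A minimal sketch of such a budget, assuming a sliding time window and a single process. The class name RetryBudget and its parameters are illustrative and not taken from any particular library; the injectable clock exists only to make the behaviour easy to check.

```python
import time
from collections import deque


class RetryBudget:
    """Allow retries only while they stay under `ratio` of recent requests."""

    def __init__(self, ratio=0.1, window_seconds=10.0, clock=time.monotonic):
        self.ratio = ratio
        self.window = window_seconds
        self.clock = clock
        self.requests = deque()  # timestamps of first attempts
        self.retries = deque()   # timestamps of retries that were permitted

    def _trim(self, now):
        # Drop entries that have fallen out of the sliding window.
        for q in (self.requests, self.retries):
            while q and now - q[0] > self.window:
                q.popleft()

    def record_request(self):
        now = self.clock()
        self._trim(now)
        self.requests.append(now)

    def can_retry(self):
        """Return True and consume budget if one more retry is allowed."""
        now = self.clock()
        self._trim(now)
        if len(self.retries) + 1 > self.ratio * len(self.requests):
            return False
        self.retries.append(now)
        return True
```

A caller would invoke record_request() for every first attempt and check can_retry() before each retry, failing fast when it returns False. A multi-threaded service would need a lock around these methods.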

Q: When should you use retry vs circuit breaker?

Use retry for transient errors — brief network blips, temporary service unavailability, or rate limiting where the service will recover quickly. Use a circuit breaker when a service is consistently failing and retries would just amplify the load. The circuit breaker opens after a threshold of failures, immediately returning errors without making downstream calls, giving the failing service time to recover. In practice, you use both together: retries handle brief hiccups, and the circuit breaker handles sustained outages.
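The open / fail-fast / trial-call behaviour described above can be sketched as follows. This is a simplified, single-threaded illustration; CircuitBreaker, CircuitOpenError, and the default thresholds are assumptions made for the example, not a specific library's API.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the downstream call is skipped."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, call skipped")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

When the two patterns are combined, the retry loop usually wraps breaker.call(fn) and treats CircuitOpenError as non-retryable, so an open circuit ends the retries immediately.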

Q: What is the difference between retry and timeout?

A timeout defines how long to wait for a single request to complete before giving up. A retry defines how many times to attempt an operation that has failed or timed out. They work together: set a per-request timeout (e.g. 2s) to avoid waiting forever, then retry up to N times with backoff on timeout or 5xx errors. Always set a timeout before configuring retries: without one, a hung connection waits indefinitely, never reports a failure, and so never triggers a retry at all. The worst-case latency is roughly N × timeout plus the backoff delays, so size the two settings together.
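A small sketch of how the two settings combine. It assumes the wrapped callable accepts a timeout keyword argument and raises the built-in TimeoutError when the limit is hit; call_with_retries and its parameters are hypothetical names chosen for this example.

```python
import random
import time


def call_with_retries(fn, timeout=2.0, max_attempts=3, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call fn(timeout=...) and retry with full jitter when it times out."""
    for attempt in range(max_attempts):
        try:
            return fn(timeout=timeout)  # the per-attempt timeout bounds each call
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted: surface the last timeout
            # Full jitter: random delay up to the capped exponential backoff.
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
```

With these defaults the caller waits at most 3 × 2s for the attempts plus up to 1s and 2s of backoff, which is the figure to compare against any deadline the caller itself must meet.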

1. Why Naive Immediate Retry is Dangerous

The simplest retry — immediately try again on failure — is also the most dangerous. If a service is overloaded and 1,000 clients retry immediately after receiving a 503, the service receives 2,000 requests instead of 1,000, making the overload worse. Immediate retries can turn a brief hiccup into a cascading failure.

  NAIVE IMMEDIATE RETRY (dangerous):

  T=0:    Service overloaded, returns 503 to 1,000 clients
  T=0.01: All 1,000 clients retry immediately
          Service receives 2,000 requests → MORE overloaded
  T=0.02: All 1,000 clients retry again
          Service receives 3,000 requests → crash

  EXPONENTIAL BACKOFF WITH JITTER (safe):

  T=0:    Service returns 503 to 1,000 clients
  T=0-2s: Clients retry with random delay in [0, 2s]
          Only ~500 clients retry per second → service can recover
  Later:  Remaining failures retry with delay in [0, 8s]
          At most ~125 per second → service almost fully recovered ✓
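The effect in the timeline can be reproduced with a toy simulation. This sketch only counts how many clients land in each one-second bucket; retries_per_second and its parameters are made up for the illustration, and it models a single retry round rather than a real service.

```python
import random
from collections import Counter


def retries_per_second(num_clients, max_delay, jitter, seed=0):
    """Count how many clients retry in each one-second bucket."""
    rng = random.Random(seed)
    buckets = Counter()
    for _ in range(num_clients):
        # Without jitter every client waits the same fixed delay.
        delay = rng.uniform(0, max_delay) if jitter else max_delay
        buckets[int(delay)] += 1
    return buckets


no_jitter = retries_per_second(1000, max_delay=2.0, jitter=False)
with_jitter = retries_per_second(1000, max_delay=2.0, jitter=True)
print("peak without jitter:", max(no_jitter.values()))
print("peak with jitter:   ", max(with_jitter.values()))
```

Without jitter all 1,000 retries fall into the same second; with jitter they split roughly evenly across the two-second window.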

2. Exponential Backoff

Exponential backoff increases the delay between retry attempts exponentially: delay = base_delay * 2^attempt, where attempt is counted from 0. With base_delay = 1 second:

Retry 1 (attempt = 0): wait 1s (2^0 = 1)
Retry 2 (attempt = 1): wait 2s (2^1 = 2)
Retry 3 (attempt = 2): wait 4s (2^2 = 4)
Retry 4 (attempt = 3): wait 8s (2^3 = 8)
Retry 5 (attempt = 4): wait 16s (2^4 = 16)

Always cap the maximum delay to prevent indefinitely long waits: delay = min(base_delay * 2^attempt, max_delay). A max_delay of 30–60 seconds is typical.
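As a quick illustration of the capped formula, the following sketch prints the schedule for the first eight retries. The function name backoff_delay is illustrative, and the attempt number is assumed to start at 0.

```python
def backoff_delay(attempt, base_delay=1.0, max_delay=60.0):
    """Capped exponential delay for a zero-based attempt number."""
    return min(base_delay * 2 ** attempt, max_delay)


schedule = [backoff_delay(n) for n in range(8)]
print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

From the seventh retry onward the cap takes over, so the delay stays at 60 seconds instead of doubling to 64 and 128.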

3. Adding Jitter

Jitter randomises the backoff delay to prevent synchronized retries from multiple clients. The AWS Architecture Blog post "Exponential Backoff and Jitter" compares three strategies: full jitter, equal jitter, and decorrelated jitter.

Python: exponential backoff with jitter strategies

    import random
    import time

    BASE = 1.0        # 1 second base delay
    CAP = 60.0        # 60 second max delay
    MAX_ATTEMPTS = 5

    class TransientError(Exception):
        """Placeholder for whatever retryable error fn() raises."""

    def full_jitter(attempt: int) -> float:
        """AWS recommended: random(0, min(cap, base * 2^attempt))"""
        return random.uniform(0, min(CAP, BASE * (2 ** attempt)))

    def equal_jitter(attempt: int) -> float:
        """v/2 + random(0, v/2) with v = min(cap, base * 2^attempt); ensures a minimum delay"""
        v = min(CAP, BASE * (2 ** attempt))
        return v / 2 + random.uniform(0, v / 2)

    def decorrelated_jitter(prev_delay: float) -> float:
        """AWS recommended: min(cap, random(base, prev * 3))"""
        return min(CAP, random.uniform(BASE, prev_delay * 3))

    def retry_with_backoff(fn, max_attempts=MAX_ATTEMPTS):
        for attempt in range(max_attempts):
            try:
                return fn()
            except TransientError:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the last error
                delay = full_jitter(attempt)
                print(f"Attempt {attempt+1} failed, retrying in {delay:.2f}s")
                time.sleep(delay)
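The retry loop in this section uses full jitter only. Decorrelated jitter differs in that each delay is derived from the previous delay rather than from the attempt number, so it is convenient to express as a generator. The sketch below is one possible way to do that; decorrelated_delays and its parameters are illustrative names.

```python
import random


def decorrelated_delays(base=1.0, cap=60.0, rng=random):
    """Yield an endless sequence of decorrelated-jitter delays."""
    delay = base
    while True:
        # The next delay depends on the previous one, not on the attempt number.
        delay = min(cap, rng.uniform(base, delay * 3))
        yield delay


delays = decorrelated_delays()
first_five = [next(delays) for _ in range(5)]
print([round(d, 2) for d in first_five])
```

A retry loop would call next(delays) after each failure and create a fresh generator for each new operation, so that one operation's history does not leak into another's.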

4. What to Retry — Retryable vs Non-Retryable Errors

Error Type	Retryable?	Examples	Notes
Network timeout	Yes	Connection reset, read timeout	Requires idempotency
503 Service Unavailable	Yes	Service restarting, overloaded	Respect Retry-After header
502 Bad Gateway	Yes	Upstream died mid-request	Transient, usually recovers
429 Too Many Requests	Yes	Rate limited	Use Retry-After header delay
500 Internal Server Error	Sometimes	Server bug vs temporary overload	Retry with low attempt count
400 Bad Request	No	Malformed request	Retrying won't fix the request
401 Unauthorized	No	Invalid API key	Fix credentials first
403 Forbidden	No	Insufficient permissions	Not a transient error
404 Not Found	No	Resource doesn't exist	Won't appear on retry
422 Unprocessable Entity	No	Validation failure	Fix the request payload
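The table can be turned into a small decision helper. The sketch below is one possible encoding: the attempt limits are arbitrary example values, the function names are hypothetical, and retry_after_seconds handles both forms the Retry-After header may take (a number of seconds or an HTTP date).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

ALWAYS_RETRY = {429, 502, 503}
SOMETIMES_RETRY = {500}  # retried with a lower attempt limit


def should_retry(status, attempt, max_attempts=4, max_attempts_500=2):
    """Decide whether a zero-based `attempt` that got `status` may be retried."""
    if status in ALWAYS_RETRY:
        return attempt + 1 < max_attempts
    if status in SOMETIMES_RETRY:
        return attempt + 1 < max_attempts_500
    return False  # 400, 401, 403, 404, 422 and anything else unlisted


def retry_after_seconds(value, now=None):
    """Parse a Retry-After header given as delay-seconds or an HTTP date."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: fall back to normal backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would typically sleep for the larger of the jittered backoff and the parsed Retry-After value, and treat network timeouts as retryable only when the operation is idempotent.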

Idempotency is Required for Safe Retries

You can only safely retry an operation if it is idempotent — making the same request twice produces the same result. GET, HEAD, PUT, and DELETE are naturally idempotent. POST is not. For POST operations (creating resources, charging payments), use idempotency keys so the server can detect and deduplicate retried requests. Never retry non-idempotent POST operations without an idempotency key.
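To make the idempotency-key idea concrete, here is a toy in-memory server that remembers the response for each key and replays it on a repeat. PaymentService and its fields are invented for the illustration; a real service would persist keys, expire them, and guard against concurrent requests carrying the same key.

```python
import uuid


class PaymentService:
    """Toy in-memory server that deduplicates requests by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay, no second charge
        self.charges.append(amount)
        response = {"charge_id": len(self.charges), "amount": amount}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
key = str(uuid.uuid4())          # generated once per logical operation
first = service.charge(key, 25)
retry = service.charge(key, 25)  # the retry reuses the same key
print(first == retry, len(service.charges))
```

The important detail on the client side is that the key is created once, before the first attempt, and reused unchanged for every retry of that operation.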

5. Retry Pattern Implementation in PHP

PHP: production retry with exponential backoff and jitter

    // HttpException and NetworkException stand for the exception types
    // thrown by your HTTP client; substitute the real class names.
    function retryWithBackoff(
        callable $fn,
        int $maxAttempts = 4,
        float $baseDelay = 1.0,
        float $maxDelay = 30.0,
        array $retryOn = [503, 502, 429]
    ): mixed {
        $lastException = null;

        for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
            try {
                return $fn();
            } catch (HttpException $e) {
                if (!in_array($e->getStatusCode(), $retryOn)) {
                    throw $e; // non-retryable
                }
                $lastException = $e;
                if ($attempt === $maxAttempts - 1) break;

                // Full jitter: random(0, min(maxDelay, base * 2^attempt))
                $cap = min($maxDelay, $baseDelay * pow(2, $attempt));
                $delay = mt_rand(0, (int)($cap * 1000)) / 1000.0;

                // Respect Retry-After header from 429 responses
                // (only the numeric seconds form is handled here)
                if ($e->getStatusCode() === 429) {
                    $retryAfter = $e->getHeader('Retry-After');
                    if ($retryAfter) $delay = max($delay, (float)$retryAfter);
                }

                usleep((int)($delay * 1_000_000));
            } catch (NetworkException $e) {
                $lastException = $e;
                if ($attempt === $maxAttempts - 1) break;

                $cap = min($maxDelay, $baseDelay * pow(2, $attempt));
                $delay = mt_rand(0, (int)($cap * 1000)) / 1000.0;
                usleep((int)($delay * 1_000_000));
            }
        }

        throw $lastException;
    }

6. Retry vs Circuit Breaker vs Timeout

Pattern	Purpose	When to Use	Works Together?
Retry + Backoff	Handle transient failures	Brief, self-healing failures (network blip, brief overload)	Yes: first line of defence
Circuit Breaker	Stop calling a failing service	Sustained failures, service down for >30s	Yes: opens once failures cross a threshold
Timeout	Bound request duration	Always; set before configuring retries	Yes: triggers retry on timeout
Bulkhead	Limit concurrent requests	Prevent one slow service from consuming all threads	Yes: complements circuit breaker

7. Libraries for Retry Logic

Do not implement retry logic from scratch in production. Use battle-tested libraries:

Polly (.NET): Policy.Handle<HttpRequestException>().WaitAndRetry(3, r => TimeSpan.FromSeconds(Math.Pow(2, r))) — supports retry, circuit breaker, timeout, bulkhead, fallback, and hedging in a unified fluent API.
Resilience4j (Java): Lightweight fault tolerance library for Java. Supports Retry, CircuitBreaker, RateLimiter, TimeLimiter, Bulkhead, and Cache as composable decorators.
tenacity (Python): @retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(5)) — decorator-based retry with exponential backoff, jitter, and custom retry conditions.
axios-retry (Node.js): Plug-in for the Axios HTTP client with configurable retry conditions and exponential backoff.
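AWS SDK: The AWS SDKs have built-in retry logic with exponential backoff and jitter. Behaviour is tuned with settings such as maxAttempts and retryMode (for example 'standard' or 'adaptive'); exact names and defaults vary by SDK and version.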

Retry Amplification — The Hidden Danger

With 3 retries, every request that keeps failing becomes 4 requests to the downstream service. If 50% of requests fail and their retries fail too, the downstream service receives 1 + 0.5 × 3 = 2.5× the normal load; if every request fails, it receives 4×. Under sustained failure, retries amplify load and can prevent recovery. Always implement a retry budget (max N% of total requests can be retries) and use circuit breakers to stop retrying when a service is consistently failing. Combine with exponential backoff to give the service time to recover between attempts.
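The amplification factor depends on how failures behave, and a few lines of arithmetic make the difference visible. The two functions below are illustrative models, not measurements: the first assumes a failing request keeps failing on every retry, the second assumes each attempt fails independently with the same probability.

```python
def load_if_failures_persist(fail_fraction, max_retries):
    """Failed requests keep failing, so each one uses every retry."""
    return 1 + fail_fraction * max_retries


def load_if_failures_independent(fail_prob, max_retries):
    """Each attempt fails independently with probability fail_prob."""
    return sum(fail_prob ** k for k in range(max_retries + 1))


print(load_if_failures_persist(0.5, 3))      # 2.5
print(load_if_failures_persist(1.0, 3))      # 4.0
print(load_if_failures_independent(0.5, 3))  # 1.875
```

Both models give 4× during a total outage, which is the case that matters most when deciding how many retries to allow.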

