1. Why Naive Immediate Retry is Dangerous

The simplest retry strategy, trying again immediately on failure, is also the most dangerous. If a service is overloaded and 1,000 clients retry immediately after receiving a 503, those retries land on top of the traffic that caused the overload, so the service faces roughly 2,000 requests instead of 1,000 at the moment it can least handle them. Immediate retries can turn a brief hiccup into a cascading failure.

  NAIVE IMMEDIATE RETRY (dangerous):

  T=0:    Service overloaded, returns 503 to 1,000 clients
  T=0.01: All 1,000 clients retry immediately
          Service receives 2,000 requests → MORE overloaded
  T=0.02: All 1,000 clients retry again
          Service receives 3,000 requests → crash

  EXPONENTIAL BACKOFF WITH JITTER (safe):

  T=0:    Service returns 503 to 1,000 clients
  T=1-2s: Clients retry with random delay in [0, 2s]
          Only ~500 clients retry per second → service can recover
  T=4-8s: Remaining failures retry with delay in [0, 8s]
          ~125 per second → service almost fully recovered ✓
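
The difference between the two timelines can be checked with a small simulation. This is only a sketch under simplified assumptions: 1,000 clients, a single retry each, and load measured as retries per 100 ms bucket. The names `peak_load`, `CLIENTS`, and `BUCKET` are illustrative and not part of the original diagram.

```python
import random
from collections import Counter

CLIENTS = 1_000
BUCKET = 0.1  # width of each time bucket, in seconds


def peak_load(retry_times):
    """Return the largest number of retries that land in any one time bucket."""
    buckets = Counter(int(t / BUCKET) for t in retry_times)
    return max(buckets.values())


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Naive: every client retries 10 ms after the failure.
immediate = [0.01 for _ in range(CLIENTS)]

# Jittered: every client picks a random delay in [0, 2s].
jittered = [rng.uniform(0.0, 2.0) for _ in range(CLIENTS)]

print("peak retries per 100 ms, immediate:", peak_load(immediate))
print("peak retries per 100 ms, jittered: ", peak_load(jittered))
```

With immediate retries all 1,000 requests fall into the same bucket, while the jittered retries average about 50 per bucket (1,000 retries spread over twenty 100 ms buckets).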

2. Exponential Backoff

Exponential backoff doubles the delay after each failed attempt: delay = base_delay * 2^attempt, where attempt is the zero-based index of the attempt that just failed. With base_delay = 1 second:

  • After attempt 1 fails: wait 1s (2^0 = 1)
  • After attempt 2 fails: wait 2s (2^1 = 2)
  • After attempt 3 fails: wait 4s (2^2 = 4)
  • After attempt 4 fails: wait 8s (2^3 = 8)
  • After attempt 5 fails: wait 16s (2^4 = 16)

Always cap the maximum delay to prevent indefinitely long waits: delay = min(base_delay * 2^attempt, max_delay). A max_delay of 30–60 seconds is typical.
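
The capped formula can be written as a one-line helper. This is a minimal sketch; the function name `backoff_delay` and the default of 60 seconds for `max_delay` are illustrative choices, and `attempt` is assumed to be zero-based.

```python
def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Capped exponential delay for a zero-based attempt index."""
    return min(base_delay * (2 ** attempt), max_delay)


delays = [backoff_delay(a) for a in range(8)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

The delay doubles until it would exceed the cap (64s at attempt index 6) and then stays at max_delay.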

3. Adding Jitter

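Jitter randomises the backoff delay to prevent synchronized retries from multiple clients. The AWS Architecture Blog article "Exponential Backoff and Jitter" compares three jitter strategies: full jitter, equal jitter, and decorrelated jitter. Its simulations favour full jitter and decorrelated jitter, while equal jitter trades some of that spread for a guaranteed minimum delay.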

Python: exponential backoff with jitter strategies

    import random
    import time

    BASE = 1.0         # 1 second base delay
    CAP = 60.0         # 60 second max delay
    MAX_ATTEMPTS = 5


    class TransientError(Exception):
        """Placeholder for whatever retryable error your client raises."""


    def full_jitter(attempt: int) -> float:
        """AWS recommended: random(0, min(cap, base * 2^attempt))"""
        return random.uniform(0, min(CAP, BASE * (2 ** attempt)))


    def equal_jitter(attempt: int) -> float:
        """v/2 + random(0, v/2), where v = min(cap, base * 2^attempt); ensures a minimum delay"""
        v = min(CAP, BASE * (2 ** attempt))
        return v / 2 + random.uniform(0, v / 2)


    def decorrelated_jitter(prev_delay: float) -> float:
        """AWS recommended: min(cap, random(base, prev * 3))"""
        return min(CAP, random.uniform(BASE, prev_delay * 3))


    def retry_with_backoff(fn, max_attempts=MAX_ATTEMPTS):
        prev_delay = BASE
        for attempt in range(max_attempts):
            try:
                return fn()
            except TransientError:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the last error
                delay = full_jitter(attempt)
                print(f"Attempt {attempt+1} failed, retrying in {delay:.2f}s")
                time.sleep(delay)
                # Tracked so decorrelated_jitter(prev_delay) can be swapped in above.
                prev_delay = delay
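
Decorrelated jitter differs from the other two strategies in that each delay is derived from the previous delay rather than from the attempt number. The sketch below shows that feedback loop in isolation; the helper name `decorrelated_delays` and the injected `rng` parameter are illustrative assumptions, added so the sequence can be reproduced with a fixed seed.

```python
import random

BASE = 1.0   # seconds
CAP = 60.0   # seconds


def decorrelated_delays(attempts: int, rng: random.Random) -> list:
    """Generate successive decorrelated-jitter delays, each derived from the last."""
    delays = []
    prev = BASE
    for _ in range(attempts):
        # Next delay is drawn from [BASE, 3 * previous delay], then capped.
        prev = min(CAP, rng.uniform(BASE, prev * 3))
        delays.append(prev)
    return delays


print([round(d, 2) for d in decorrelated_delays(6, random.Random(7))])
```

Every delay stays between BASE and CAP, and the upper bound of each draw grows or shrinks with the previous delay instead of following a fixed exponential schedule.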

4. What to Retry — Retryable vs Non-Retryable Errors

Error Type                | Retryable? | Examples                         | Notes
--------------------------+------------+----------------------------------+------------------------------
Network timeout           | Yes        | Connection reset, read timeout   | Requires idempotency
503 Service Unavailable   | Yes        | Service restarting, overloaded   | Respect Retry-After header
502 Bad Gateway           | Yes        | Upstream died mid-request        | Transient, usually recovers
429 Too Many Requests     | Yes        | Rate limited                     | Use Retry-After header delay
500 Internal Server Error | Sometimes  | Server bug vs temporary overload | Retry with low attempt count
400 Bad Request           | No         | Malformed request                | Retrying won't fix the request
401 Unauthorized          | No         | Invalid API key                  | Fix credentials first
403 Forbidden             | No         | Insufficient permissions         | Not a transient error
404 Not Found             | No         | Resource doesn't exist           | Won't appear on retry
422 Unprocessable Entity  | No         | Validation failure               | Fix the request payload
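
The table can be turned into a small decision helper. The following is a sketch, not a complete policy: the function names, the `max_500_attempts` limit of 2 for 500 responses, and the choice to treat every unlisted status as non-retryable are all assumptions made for illustration. Retry-After may carry either a number of seconds or an HTTP date, so the parser handles both.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

ALWAYS_RETRY = {429, 502, 503}


def is_retryable(status: int, attempt: int, max_500_attempts: int = 2) -> bool:
    """Decide whether a response status is worth retrying (attempt is zero-based)."""
    if status in ALWAYS_RETRY:
        return True
    if status == 500:
        # Might be a bug rather than overload, so retry only a couple of times.
        return attempt < max_500_attempts
    return False  # 400, 401, 403, 404, 422 and anything unlisted


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After header given as delta-seconds or as an HTTP date."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: fall back to normal backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


print(is_retryable(503, attempt=0), is_retryable(404, attempt=0))  # True False
print(retry_after_seconds("120"))  # 120.0
```

A caller would typically sleep for the larger of the jittered backoff delay and the parsed Retry-After value.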

Idempotency is Required for Safe Retries

You can only safely retry an operation if it is idempotent: sending the same request twice leaves the server in the same state as sending it once. GET, HEAD, PUT, and DELETE are defined as idempotent by the HTTP specification. POST is not. For POST operations (creating resources, charging payments), use idempotency keys so the server can detect and deduplicate retried requests. Never retry non-idempotent POST operations without an idempotency key.
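
Server-side deduplication by idempotency key can be sketched as follows. `PaymentService` is a hypothetical in-memory stand-in used only for illustration; a real service would persist keys in durable storage and expire them after some retention window.

```python
import uuid


class PaymentService:
    """Toy in-memory server that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self._results = {}      # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())  # the client generates one key per logical operation

first = service.charge(key, 2_500)
retry = service.charge(key, 2_500)  # simulated retry after a lost response

print(first == retry, service.charges_made)  # True 1
```

The client must reuse the same key for every retry of one logical operation and generate a fresh key for each new operation.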

5. Retry Pattern Implementation in PHP

PHP: production retry with exponential backoff and jitter

    // HttpException and NetworkException are assumed to be application-defined
    // exception classes; substitute the ones your HTTP client throws.
    function retryWithBackoff(
        callable $fn,
        int $maxAttempts = 4,
        float $baseDelay = 1.0,
        float $maxDelay = 30.0,
        array $retryOn = [503, 502, 429]
    ): mixed {
        $lastException = null;

        for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
            try {
                return $fn();
            } catch (HttpException $e) {
                if (!in_array($e->getStatusCode(), $retryOn)) {
                    throw $e; // non-retryable
                }
                $lastException = $e;
                if ($attempt === $maxAttempts - 1) break;

                // Full jitter: random(0, min(maxDelay, base * 2^attempt))
                $cap = min($maxDelay, $baseDelay * pow(2, $attempt));
                $delay = mt_rand(0, (int)($cap * 1000)) / 1000.0;

                // Respect a numeric Retry-After header on 429 and 503 responses
                if (in_array($e->getStatusCode(), [429, 503])) {
                    $retryAfter = $e->getHeader('Retry-After');
                    if (is_numeric($retryAfter)) {
                        $delay = max($delay, (float)$retryAfter);
                    }
                }

                usleep((int)($delay * 1_000_000));
            } catch (NetworkException $e) {
                $lastException = $e;
                if ($attempt === $maxAttempts - 1) break;

                // Same full-jitter delay for connection-level failures
                $cap = min($maxDelay, $baseDelay * pow(2, $attempt));
                $delay = mt_rand(0, (int)($cap * 1000)) / 1000.0;
                usleep((int)($delay * 1_000_000));
            }
        }

        throw $lastException;
    }

6. Retry vs Circuit Breaker vs Timeout

Pattern         | Purpose                        | When to Use                                                 | Works Together?
----------------+--------------------------------+-------------------------------------------------------------+----------------------------------------
Retry + Backoff | Handle transient failures      | Brief, self-healing failures (network blip, brief overload) | Yes: first line of defence
Circuit Breaker | Stop calling a failing service | Sustained failures, service down for >30s                   | Yes: opens when retry budget exhausted
Timeout         | Bound request duration         | Always; set before configuring retries                      | Yes: triggers retry on timeout
Bulkhead        | Limit concurrent requests      | Prevent one slow service from consuming all threads         | Yes: complements circuit breaker
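
To make the circuit breaker row concrete, here is a minimal sketch of one. It is a simplified illustration, not a substitute for a library implementation: the class names, the consecutive-failure threshold of 3, and the 30 second reset timeout are assumptions, and the half-open state allows a single trial call with no concurrency control.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open, failing fast")
            # Reset timeout elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


def always_fails():
    raise ConnectionError("service down")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=30.0)
outcomes = []
for _ in range(5):
    try:
        breaker.call(always_fails)
    except CircuitOpenError:
        outcomes.append("fast-fail")
    except ConnectionError:
        outcomes.append("called")

print(outcomes)  # ['called', 'called', 'called', 'fast-fail', 'fast-fail']
```

After three consecutive failures the breaker stops calling the service, so a retry loop wrapped around `breaker.call` fails fast instead of adding load to a service that is already down.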

7. Libraries for Retry Logic

Do not implement retry logic from scratch in production. Use battle-tested libraries:

  • Polly (.NET): Policy.Handle<HttpRequestException>().WaitAndRetry(3, r => TimeSpan.FromSeconds(Math.Pow(2, r))) — supports retry, circuit breaker, timeout, bulkhead, fallback, and hedging in a unified fluent API.
  • Resilience4j (Java): Lightweight fault tolerance library for Java. Supports Retry, CircuitBreaker, RateLimiter, TimeLimiter, Bulkhead, and Cache as composable decorators.
  • tenacity (Python): @retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(5)) — decorator-based retry with exponential backoff, jitter, and custom retry conditions.
  • axios-retry (Node.js): Plug-in for the Axios HTTP client with configurable retry conditions and exponential backoff.
  • AWS SDK: The AWS SDKs ship with built-in retry logic that uses exponential backoff with jitter. Depending on the SDK, this is tuned through settings such as maxAttempts and retryMode (for example 'standard' or 'adaptive').

Retry Amplification — The Hidden Danger

With 3 retries, a request that keeps failing becomes 4 requests to the downstream service. If the service is failing every request, it receives 4× its normal load. Even if each attempt fails independently only 50% of the time, the expected load is 1 + 0.5 + 0.25 + 0.125 = 1.875× normal, nearly double. Under sustained failure, retries amplify load and can prevent recovery. Always implement a retry budget (max N% of total requests can be retries) and use circuit breakers to stop retrying when a service is consistently failing. Combine with exponential backoff to give the service time to recover between attempts.
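
The amplification factor and a retry budget can both be sketched in a few lines. The estimate assumes each attempt fails independently with the same probability, which real outages often violate. `RetryBudget` is a hypothetical helper, and its 10% ratio is an arbitrary illustrative value.

```python
def load_multiplier(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per request if each attempt fails independently."""
    return sum(failure_rate ** k for k in range(max_retries + 1))


class RetryBudget:
    """Allow retries only while they stay under a fraction of all requests."""

    def __init__(self, max_retry_ratio: float = 0.1) -> None:
        self.max_retry_ratio = max_retry_ratio
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        self.requests += 1

    def try_spend(self) -> bool:
        """Return True and count the retry if the budget still allows it."""
        if self.retries + 1 > self.max_retry_ratio * self.requests:
            return False
        self.retries += 1
        return True


print(load_multiplier(0.5, 3))  # 1.875
print(load_multiplier(1.0, 3))  # 4.0

budget = RetryBudget(max_retry_ratio=0.1)
for _ in range(100):
    budget.record_request()
granted = sum(budget.try_spend() for _ in range(50))
print(granted)  # 10
```

With the budget in place, 100 original requests can trigger at most 10 retries no matter how many of them fail, which bounds the extra load at 10% instead of 300%.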

Frequently Asked Questions — Retry Pattern