The Three States

🟢 Closed

Normal operation. Requests pass through. Failures are counted against a threshold.

🔴 Open

Threshold exceeded. For the length of a cooldown window, every request fails immediately without a network call being attempted.

🟡 Half-Open

Cooldown ended. A few test requests are let through to check if the dependency has recovered.

If the test requests in Half-Open succeed, the circuit closes and traffic resumes normally. If they fail, it reopens and the cooldown timer restarts.
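The transitions above can be sketched as a small state machine. This is a minimal, single-threaded illustration and not any particular library's implementation: the class name, the parameter names (`failure_threshold`, `cooldown_seconds`, `trial_successes`) and the consecutive-failure counting are all assumptions made for the example.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the dependency."""


class CircuitBreaker:
    # All names and defaults here are illustrative assumptions.
    def __init__(self, failure_threshold=5, cooldown_seconds=30.0,
                 trial_successes=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.trial_successes = trial_successes
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise CircuitOpenError("circuit is open; failing fast")
            # Cooldown ended: let test requests through.
            self.state = State.HALF_OPEN
            self.successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # Any failure while Half-Open reopens the circuit immediately.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()  # cooldown timer restarts
            self.failures = 0

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.trial_successes:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # consecutive-failure count resets on success
```

A caller would wrap each outbound call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and handle `CircuitOpenError` separately from ordinary failures. A production breaker would also need locking and a cap on how many trial requests run concurrently in Half-Open, which this sketch leaves out.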

Why This Matters: Stopping Cascading Failures

Imagine Service A calls Service B, and Service B starts timing out under load. Without a circuit breaker, every request from A to B waits the full timeout (say, 30 seconds) before failing. If A is handling thousands of requests per second, threads/connections pile up waiting on B — and now A itself becomes slow or unresponsive, even though A's own code is fine. This is a cascading failure: one struggling service drags down everything that depends on it.

Without a circuit breaker:
  Service A -> calls Service B (failing) -> waits 30s for timeout
  (repeated for every request; threads pile up, A becomes unresponsive too)

With a circuit breaker:
  Service A -> circuit OPEN for B -> fails in ~1ms, no network call made
  (A stays responsive and can return a fallback or cached response instead)
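To put rough numbers on the pile-up, Little's law says the average number of requests in flight equals the arrival rate multiplied by the time each request is held. The sketch below applies it to the 30-second timeout and ~1 ms fail-fast figures used above; the rate of 1,000 requests per second is an assumed value chosen only for illustration.

```python
def in_flight_requests(arrival_rate_per_s: float, held_for_s: float) -> float:
    """Little's law: average in-flight count = arrival rate x time held."""
    return arrival_rate_per_s * held_for_s


# Assumed load: 1,000 requests per second from Service A to Service B.
waiting_on_timeout = in_flight_requests(1000, 30.0)   # each call held for 30 s
failing_fast = in_flight_requests(1000, 0.001)        # each call held for ~1 ms

print(f"without breaker: ~{waiting_on_timeout:,.0f} requests in flight")
print(f"with breaker open: ~{failing_fast:,.0f} request in flight")
```

Under these assumptions Service A would need to hold on the order of 30,000 threads or connections while B is timing out, versus roughly one when the circuit is open.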

Circuit Breaker vs Retry

Aspect | Retry | Circuit Breaker
Assumes the failure is | Transient: likely to succeed on the next try | Sustained: the dependency is genuinely down
Effect on the failing service | Adds more load (more attempts) | Reduces load (stops attempts)
Best used for | Brief network blips, momentary slowness | Outages, dependency overload, deployment issues
Common pairing | Retry with exponential backoff, then circuit breaker if retries keep failing | Often wraps retry logic as the outer safety net

These two patterns are complementary, not competing — see Retry Pattern with Exponential Backoff for how retries are usually configured before a circuit breaker takes over.
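One way to combine the two, sketched below, is to put the retry loop inside the breaker so that an exhausted retry sequence counts as a single failure and an open circuit skips retries entirely. `SimpleBreaker`, `retry_with_backoff`, `call_dependency` and every parameter are hypothetical names for this example, and the breaker is deliberately minimal (consecutive failures, one trial call after the cooldown).

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call."""


def retry_with_backoff(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the breaker see the failure
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # add jitter


class SimpleBreaker:
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is not open

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                raise CircuitOpenError("failing fast; retries are skipped too")
            # Cooldown over: this call acts as the trial request.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or reopen after a failed trial
                self.failures = 0
            raise
        self.failures = 0
        self.opened_at = None
        return result


def call_dependency(breaker, func, **retry_options):
    # Breaker outside, retry inside.
    return breaker.call(lambda: retry_with_backoff(func, **retry_options))
```

With this ordering, a brief blip is absorbed by the retries and never reaches the breaker's failure count, while a sustained outage trips the breaker after a few exhausted retry sequences.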

What to Do When the Circuit Is Open

Failing fast is only half the story — the other half is what you do with that failure. Common strategies:

  • Return a cached response — slightly stale data is often better than no data.
  • Return a default/fallback value — e.g. show "Recommendations unavailable" instead of crashing the whole page.
  • Queue the request — for non-urgent writes, queue and retry later instead of failing the user-facing request.
  • Propagate a fast, clear error — so upstream callers can also fail fast instead of waiting.
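The first three strategies above might look like the following sketch. `CircuitOpenError` stands in for whatever error your breaker raises when it rejects a call, and `fetch_with_fallback`, `write_or_queue`, the dict cache and the deque queue are hypothetical choices made for illustration.

```python
from collections import deque


class CircuitOpenError(Exception):
    """Stand-in for the error a breaker raises when it rejects a call."""


def fetch_with_fallback(key, fetch, cache, default):
    """Try the live call; if rejected, fall back to cache, then to a default."""
    try:
        value = fetch(key)
    except CircuitOpenError:
        if key in cache:
            return cache[key], "cached"   # slightly stale, but usable
        return default, "default"         # degrade instead of crashing
    cache[key] = value                    # refresh the cache on success
    return value, "live"


def write_or_queue(item, send, pending):
    """For non-urgent writes: queue the item when the circuit is open."""
    try:
        send(item)
        return "sent"
    except CircuitOpenError:
        pending.append(item)              # a background worker would drain this later
        return "queued"


if __name__ == "__main__":
    def rejected(_):
        raise CircuitOpenError("circuit open")

    cache = {"user-1": ["book A", "book B"]}
    print(fetch_with_fallback("user-1", rejected, cache, "Recommendations unavailable"))
    print(fetch_with_fallback("user-2", rejected, cache, "Recommendations unavailable"))
    print(write_or_queue({"event": "click"}, rejected, deque()))
```

Returning the source of the value ("live", "cached" or "default") alongside it is a design choice in this sketch; it lets the caller decide whether to flag stale data to the user.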

⚠️ Tuning the Threshold Matters

Trip too aggressively (low failure threshold, short window) and you'll open the circuit on brief, harmless blips, rejecting traffic unnecessarily. Trip too conservatively and you delay protection during a real outage. Many implementations use a rolling window of recent requests (e.g. "open if more than 50% of the last 20 requests failed") rather than a simple consecutive-failure count, because a failure rate over a window is less sensitive to a short burst of errors.
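The "more than 50% of the last 20 requests" rule can be sketched with a fixed-size window of recent outcomes. The class and parameter names are hypothetical, and waiting for the window to fill before evaluating is an assumption of this example rather than something every library does.

```python
from collections import deque


class RollingWindowTrip:
    def __init__(self, window_size=20, failure_rate_threshold=0.5):
        # Each entry is True for a failed request, False for a success.
        # deque(maxlen=...) drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=window_size)
        self.failure_rate_threshold = failure_rate_threshold

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def should_open(self) -> bool:
        # Assumption: do not judge until the window is full, so that one
        # early failure out of one request does not trip the circuit.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        failure_rate = sum(self.outcomes) / len(self.outcomes)
        return failure_rate > self.failure_rate_threshold
```

With the defaults, 10 failures in the last 20 requests is exactly 50% and does not trip the circuit; the 11th failure does.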

Where This Is Implemented

Tool/Library | Ecosystem
resilience4j | Java (successor to Netflix Hystrix)
Polly | .NET
Istio / Envoy | Service mesh: circuit breaking via config, no app code changes
opossum | Node.js

💡 Mental Model

Think of it like the circuit breaker in your home's electrical panel: when something downstream is drawing too much current (failing too often), trip the breaker to protect the rest of the system, then carefully test if it's safe to restore power.

Frequently Asked Questions — Circuit Breaker Pattern