The Three States
🟢 Closed
Normal operation. Requests pass through. Failures are counted against a threshold.
🔴 Open
Threshold exceeded. For the length of a cooldown window, every request fails immediately and no network call is attempted.
🟡 Half-Open
Cooldown ended. A few test requests are let through to check if the dependency has recovered.
If the test requests in Half-Open succeed, the circuit closes and traffic resumes normally. If they fail, it reopens and the cooldown timer restarts.
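The transitions above can be sketched as a small state machine. This is a minimal, single-threaded illustration rather than a production implementation: the class name `CircuitBreaker`, the parameters `failure_threshold`, `cooldown_seconds` and `trial_successes`, and the injectable `clock` are all assumptions made for the example, and it counts consecutive failures purely for brevity.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without being attempted."""


class CircuitBreaker:
    # All names and defaults here are illustrative assumptions.
    def __init__(self, failure_threshold=3, cooldown_seconds=5.0,
                 trial_successes=2, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.trial_successes = trial_successes
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                # Still cooling down: fail fast, no call attempted.
                raise CircuitOpenError("circuit is open")
            # Cooldown ended: let test requests through.
            self.state = State.HALF_OPEN
            self.successes = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            self._trip()  # a failed test request reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.successes += 1
            if self.successes >= self.trial_successes:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # consecutive-failure count resets on success

    def _trip(self):
        self.state = State.OPEN
        self.opened_at = self.clock()  # restarts the cooldown timer
        self.failures = 0
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a hypothetical function, and handle `CircuitOpenError` separately from ordinary failures. A real implementation would also need locking and a cap on how many test requests run concurrently in Half-Open.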
Why This Matters: Stopping Cascading Failures
Imagine Service A calls Service B, and Service B starts timing out under load. Without a circuit breaker, every request from A to B waits the full timeout (say, 30 seconds) before failing. If A is handling thousands of requests per second, threads and connections pile up waiting on B, and A itself becomes slow or unresponsive even though A's own code is fine. This is a cascading failure: one struggling service drags down everything that depends on it. With a circuit breaker in front of B, A stops calling B once failures cross the threshold and fails those requests immediately, so its threads and connections stay free for work that does not depend on B.
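To put rough numbers on the pile-up, Little's Law says the average number of requests in flight is the arrival rate multiplied by the time each request spends waiting. The sketch below applies it to the scenario above. Only the 30-second timeout comes from the text; the rate of 2,000 requests per second and the 5 ms fast-fail time are assumed figures chosen for illustration.

```python
def in_flight_requests(arrival_rate_per_s: float, seconds_per_request: float) -> float:
    """Little's Law: average concurrency = arrival rate x time in system."""
    return arrival_rate_per_s * seconds_per_request


rate = 2_000  # requests per second (assumed for illustration)

# Every call to B hangs for the full 30-second timeout.
waiting_on_timeout = in_flight_requests(rate, 30.0)

# Circuit open: each call is rejected in about 5 ms (assumed).
failing_fast = in_flight_requests(rate, 0.005)

print(f"waiting on timeouts: {waiting_on_timeout:,.0f} requests in flight")  # 60,000
print(f"failing fast:        {failing_fast:,.0f} requests in flight")        # 10
```

Under these assumptions A would have to hold tens of thousands of threads or connections open just to wait on B, which is far beyond a typical pool size, while failing fast keeps the number in flight tiny.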
Circuit Breaker vs Retry
| Aspect | Retry | Circuit Breaker |
|---|---|---|
| Assumes the failure is | Transient — likely to succeed on the next try | Sustained — the dependency is genuinely down |
| Effect on the failing service | Adds more load (more attempts) | Reduces load (stops attempts) |
| Best used for | Brief network blips, momentary slowness | Outages, dependency overload, deployment issues |
| Common pairing | Retry with exponential backoff, then circuit breaker if retries keep failing | Often wraps retry logic as the outer safety net |
These two patterns are complementary, not competing — see Retry Pattern with Exponential Backoff for how retries are usually configured before a circuit breaker takes over.
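One way to picture the pairing is a retry loop with exponential backoff on the inside and a breaker wrapped around it. The sketch below is an assumption-laden illustration: `SimpleBreaker`, `retry_with_backoff` and their parameters are hypothetical helpers written for this example, not the API of any library listed in this guide.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class SimpleBreaker:
    """Opens after `threshold` consecutive failed operations (hypothetical helper)."""

    def __init__(self, threshold=2, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is not open

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("skipping call: circuit is open")
            self.opened_at = None  # cooldown over: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry_with_backoff(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Inner layer: retry transient failures, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted: let the breaker see the failure
            sleep(base_delay * 2 ** attempt)


def guarded_call(breaker, func, **retry_options):
    """Outer layer: the breaker counts one failure per exhausted retry sequence."""
    return breaker.call(lambda: retry_with_backoff(func, **retry_options))
```

With this layering, a brief blip is absorbed by the retries and never reaches the breaker, while repeated exhausted retries open the circuit and stop both the calls and the retries. The opposite nesting (breaker inside, retry outside) is also possible, but then the retry loop must stop as soon as it sees `CircuitOpenError`.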
What to Do When the Circuit Is Open
Failing fast is only half the story — the other half is what you do with that failure. Common strategies:
- Return a cached response — slightly stale data is often better than no data.
- Return a default/fallback value — e.g. show "Recommendations unavailable" instead of crashing the whole page.
- Queue the request — for non-urgent writes, queue and retry later instead of failing the user-facing request.
- Propagate a fast, clear error — so upstream callers can also fail fast instead of waiting.
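The first three strategies can be sketched in a few lines. Everything named here is hypothetical: `CircuitOpenError` stands in for whatever error your breaker raises when it rejects a call, and `get_recommendations`, `submit_write`, `fetch` and `send` are illustrative names, with a plain dict and deque standing in for a real cache and queue.

```python
from collections import deque


class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def get_recommendations(user_id, fetch, cache,
                        default=("Recommendations unavailable",)):
    """Read path: fresh data if possible, then cached, then a default."""
    try:
        fresh = fetch(user_id)
    except CircuitOpenError:
        if user_id in cache:
            return cache[user_id], "cached"  # slightly stale beats nothing
        return list(default), "default"      # degrade instead of crashing
    cache[user_id] = fresh                   # remember the last good answer
    return fresh, "fresh"


def submit_write(payload, send, pending: deque):
    """Write path: queue non-urgent writes while the circuit is open."""
    try:
        send(payload)
        return "sent"
    except CircuitOpenError:
        pending.append(payload)              # a background worker retries later
        return "queued"
```

Returning the source of the data alongside the value ("fresh", "cached" or "default") is a design choice made for this sketch; it lets the caller label stale results in the UI or in metrics.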
⚠️ Tuning the Threshold Matters
Trip too aggressively (low failure threshold, short window) and you'll open the circuit on brief, harmless blips, rejecting traffic unnecessarily. Trip too conservatively (high threshold, long window) and you delay protection during a real outage. Many implementations use a failure rate over a rolling window of recent requests (e.g. "open if more than 50% of the last 20 requests failed") rather than a simple consecutive-failure count, because a single success in the middle of an outage resets a consecutive count but barely moves a failure rate.
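The "more than 50% of the last 20 requests" rule can be sketched with a fixed-size deque. `RollingWindow` and its parameters are names invented for this example, and waiting for the window to fill before tripping is an assumption of the sketch, not something the rule itself requires.

```python
from collections import deque


class RollingWindow:
    """Tracks the outcomes of the most recent `size` requests (illustrative)."""

    def __init__(self, size=20, failure_rate_threshold=0.5):
        self.outcomes = deque(maxlen=size)  # oldest outcome drops off automatically
        self.threshold = failure_rate_threshold

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def should_open(self) -> bool:
        # Assumption: only judge once the window is full, so that one
        # early failure out of one request does not read as a 100% rate.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.failure_rate() > self.threshold
```

With the defaults, 10 failures out of 20 gives a rate of exactly 0.5 and the circuit stays closed; an eleventh failure in the window pushes it over the threshold.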
Where This Is Implemented
| Tool/Library | Ecosystem |
|---|---|
| resilience4j | Java (successor to Netflix Hystrix) |
| Polly | .NET |
| Istio / Envoy | Service mesh and its sidecar proxy: circuit breaking via config, no app code changes |
| opossum | Node.js |
💡 Mental Model
Think of it exactly like the circuit breaker in your home's electrical panel: when something downstream is drawing too much current (failing too often), trip the breaker to protect the rest of the system, then carefully test if it's safe to restore power.
Frequently Asked Questions — Circuit Breaker Pattern
What is the circuit breaker pattern?
The circuit breaker pattern stops a service from repeatedly calling another service that is already failing, instead of letting requests pile up and time out one by one. After a threshold of failures, the circuit "opens" and immediately rejects calls without attempting them, which gives the failing dependency time to recover and protects the calling service from being dragged down too.
What are the three states of a circuit breaker?
Closed: requests flow through normally, and failures are counted. Open: after too many failures, the circuit trips and requests fail immediately without being attempted, for a configured cooldown period. Half-Open: after the cooldown, a limited number of test requests are allowed through; if they succeed, the circuit closes again, and if they fail, it reopens.
How is a circuit breaker different from a retry?
Retry handles transient, short-lived failures by trying the same request again, usually with backoff. It is appropriate when the failure is likely momentary. A circuit breaker handles sustained failures by stopping requests altogether once a failure threshold is crossed. It is appropriate when the dependency is clearly down and continuing to send (and retry) requests would only add load to an already struggling service. The two are often combined: retry for brief blips, circuit breaker for sustained outages.
Why is it called a circuit breaker?
The name is a direct analogy to the circuit breakers in a home's electrical panel: when current draw exceeds a safe threshold, the breaker trips and cuts the circuit to prevent a fire, rather than letting the dangerous condition continue. Software circuit breakers do the same thing for failing service calls: trip and cut off traffic before the failure spreads further.
What happens to requests while the circuit is open?
They fail fast, typically returning an error or a fallback response immediately, without waiting for a network timeout. This is one of the biggest benefits: instead of every caller waiting 30 seconds for a timeout on a dead dependency (consuming threads, connections, and time), each call fails in milliseconds, freeing up resources and giving users (or upstream services) a quicker, more predictable failure.
Which libraries and tools implement the circuit breaker pattern?
Netflix's Hystrix popularised the pattern for Java microservices (it is now in maintenance mode, succeeded by resilience4j). Polly is a widely used choice in .NET, and opossum fills the same role in Node.js. Service meshes such as Istio, the Envoy proxy that Istio is built on, and many API gateways also offer circuit breaking as a built-in, configuration-driven feature without needing application code changes.