The Circuit Breaker Pattern
One-Liner
A design pattern that detects failures in calls to a dependency and stops sending requests to it while it is failing, rather than overwhelming it with requests that are likely to fail.
What It Is
A proxy for operations that are liable to fail, such as remote calls. It monitors for failures and, once a threshold is reached, “opens” the circuit, causing subsequent calls to fail immediately without attempting the operation. After a timeout, it lets a limited number of trial calls through. If they succeed, it closes the circuit again; if they fail, it opens the circuit again.
Why It Exists
To prevent a single failing service from causing a cascade of failures throughout the system. Callers fail fast instead of tying up threads and connections waiting on timeouts, and the failing service gets time to recover because it is no longer flooded with requests.
How It Works
- Closed: Requests are passed through to the downstream service. The circuit breaker monitors for failures.
- Open: Once failures reach a configured threshold (for example, a number of consecutive failures or a failure rate over a recent window), the circuit opens. All subsequent requests fail immediately with an error, without being sent to the downstream service.
- Half-Open: After a timeout, the circuit breaker enters a half-open state and allows a single trial request (or a small number of them) through. If the trial succeeds, the circuit closes. If it fails, the circuit opens again and the timeout restarts.
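The three states above can be sketched as a small Python class. This is a minimal, single-threaded sketch rather than a production implementation: it assumes a consecutive-failure threshold, a fixed recovery timeout, and a single half-open trial call, and the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `recovery_timeout` are hypothetical.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the service."""


class CircuitBreaker:
    """Hypothetical single-threaded breaker; not safe to share across threads."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures that open the circuit
        self.recovery_timeout = recovery_timeout    # seconds to stay open before a trial call
        self.clock = clock                          # injectable so tests can fake time
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit is open")  # fail fast
            self.state = State.HALF_OPEN  # timeout elapsed: let this call through as a trial
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise  # the caller still sees the original error
        self._on_success()
        return result

    def _on_success(self) -> None:
        # A success in CLOSED resets the consecutive-failure count;
        # a successful trial in HALF_OPEN closes the circuit.
        self.state = State.CLOSED
        self.failures = 0

    def _on_failure(self) -> None:
        if self.state is State.HALF_OPEN:
            self._trip()  # trial failed: reopen and restart the timeout
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.failures = 0
```

The clock is injectable so the timeout can be exercised without sleeping. A breaker shared across threads would also need a lock and a cap on concurrent half-open trials, which this sketch leaves out.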
Tradeoffs
Pros
- Prevents cascading failures.
- Allows failing services to recover.
- Improves application resilience.
Cons
- Adds complexity to the application.
- Requires careful tuning of thresholds and timeouts.
Failure Modes
- Incorrectly tuned thresholds: Set too low, transient errors open the circuit unnecessarily; set too high, callers keep sending traffic to a service that is already failing.
- “Stuck” open circuit: A bug in the circuit breaker logic causes it to remain open even after the downstream service has recovered.
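One way to make the threshold less fragile is to trip on a failure rate over a sliding window of recent calls, with a minimum number of calls before the rate is trusted. The sketch below assumes that approach; `FailureRateWindow` and its parameters are hypothetical names, and the default values are illustrative, not recommendations.

```python
from collections import deque


class FailureRateWindow:
    """Hypothetical rate-based trip condition over the last `window_size` calls."""

    def __init__(self, window_size: int = 20, min_calls: int = 10,
                 failure_rate_threshold: float = 0.5) -> None:
        self.outcomes = deque(maxlen=window_size)  # True means the call failed
        self.min_calls = min_calls
        self.failure_rate_threshold = failure_rate_threshold

    def record(self, failed: bool) -> None:
        # Oldest outcome is dropped automatically once the window is full.
        self.outcomes.append(failed)

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_trip(self) -> bool:
        # Too few samples: a handful of early errors should not open the circuit.
        if len(self.outcomes) < self.min_calls:
            return False
        return self.failure_rate() >= self.failure_rate_threshold
```

`min_calls` guards against opening too quickly on a few early errors, while `window_size` and `failure_rate_threshold` control how quickly a sustained outage is detected.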
Interview Traps
- Only describing the “open” and “closed” states, without mentioning the “half-open” state, which is critical for recovery.
Real-World Usage
- Used extensively in microservices architectures to improve resilience. Many service mesh frameworks (like Istio) provide circuit breaking as a standard feature.
Anti-Patterns
- Wrapping every single remote call in a circuit breaker, including cheap or non-critical ones, which adds complexity without a matching gain in resilience.
- Not providing a reasonable fallback when the circuit is open, such as returning cached data or a default response.
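A fallback for the open-circuit case might look like the sketch below, which serves the last successfully fetched value from a cache. It assumes the breaker signals a rejected call by raising an exception; `CircuitOpenError`, `get_with_fallback`, and the in-memory dict cache are hypothetical stand-ins.

```python
from typing import Callable, Dict


class CircuitOpenError(Exception):
    """Stand-in for whatever error a breaker raises when it rejects a call."""


def get_with_fallback(key: str,
                      protected_fetch: Callable[[str], str],
                      cache: Dict[str, str],
                      default: str = "unavailable") -> str:
    try:
        value = protected_fetch(key)  # assumed to go through the circuit breaker
    except CircuitOpenError:
        # Circuit is open: serve the last known value instead of an error.
        return cache.get(key, default)
    cache[key] = value  # refresh the cache on every successful call
    return value
```

Only rejections from an open circuit trigger the fallback here; other errors from the downstream call still propagate to the caller.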
Related Concepts
- Cascading Failures
- Retries with Exponential Backoff
- Bulkhead Pattern