In distributed systems, a failed service can bring down the entire platform. The Circuit Breaker pattern is inspired by electrical circuit breakers: it cuts the circuit when something is failing, preventing the problem from spreading.
The three states
🟢 Closed
All calls pass through normally. The circuit monitors for failures.
🔴 Open
Calls are blocked immediately. The circuit breaker returns an error or a fallback without attempting the call.
🟡 Half-Open
The circuit allows a few test calls through. If they succeed, it closes. If they fail, it opens again.
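The transitions between the three states can be sketched as a small state machine. This is a minimal illustration, not a production implementation: the names CircuitBreaker, CircuitOpenError, failure_threshold and wait_seconds are hypothetical, the breaker counts consecutive failures rather than a failure rate, it allows a single test call in half-open, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative breaker: trips after N consecutive failures."""

    def __init__(self, failure_threshold=3, wait_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.wait_seconds = wait_seconds
        self.clock = clock  # injectable so the wait time can be simulated
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.wait_seconds:
                # Still open: fail fast without touching the service.
                raise CircuitOpenError("circuit is open")
            # Wait time elapsed: let one test call through.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success (including a half-open test call) closes the circuit.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed test call reopens at once; otherwise wait for the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would wrap each remote call, for example breaker.call(fetch_profile, user_id), and catch CircuitOpenError to serve a fallback. Here fetch_profile stands in for whatever function performs the remote call.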
Why implement it?
- Prevents cascading failures: a slow service doesn't drag down the rest.
- Frees resources: threads don't stay blocked waiting for timeouts.
- Faster recovery: the failed service can recover without being overwhelmed.
- Better user experience: quick response with fallback instead of hanging.
Key configurations
- Failure threshold: how many errors before opening (e.g., 50% of the last 10 calls).
- Wait time: how long it stays open before trying again (e.g., 30 seconds).
- Test calls: how many calls to allow in half-open state.
- Fallback: what to return when the circuit is open.
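As a rough sketch, these settings could be grouped into a single configuration object, with the failure threshold evaluated over a sliding window of recent calls. The names BreakerConfig and FailureWindow and all field names are assumptions made for illustration, not the API of any particular library. Only the threshold logic is exercised here; the remaining fields would be read by whatever state machine uses the configuration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class BreakerConfig:
    failure_rate_threshold: float = 0.5  # open at >= 50% failures...
    window_size: int = 10                # ...measured over the last 10 calls
    wait_seconds: float = 30.0           # time spent open before testing again
    half_open_calls: int = 3             # test calls allowed in half-open
    fallback: Optional[Callable[[], Any]] = None  # what to return while open


class FailureWindow:
    """Tracks the outcomes of the most recent calls."""

    def __init__(self, config: BreakerConfig):
        self.config = config
        # deque with maxlen drops the oldest outcome automatically.
        self.outcomes = deque(maxlen=config.window_size)  # True means failure

    def record(self, failed: bool) -> None:
        self.outcomes.append(failed)

    def should_open(self) -> bool:
        # Wait for a full window so one early error cannot trip the circuit.
        if len(self.outcomes) < self.config.window_size:
            return False
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate >= self.config.failure_rate_threshold
```

Requiring a full window before evaluating the rate is a design choice in this sketch: it trades slower detection right after startup for fewer spurious trips.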
Popular implementations
There are robust implementations in most languages: Resilience4j and Hystrix (now in maintenance mode) for Java, Polly for .NET, and circuit-breaking features built into service meshes like Istio and Linkerd.
A Circuit Breaker is useless without observability. You need to see when circuits open, how long they stay open, and how often they trip.
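One way to capture these three signals is a listener that the breaker notifies on every state change. The sketch below is an assumption-laden illustration: BreakerMetrics, on_transition and the state strings "closed", "open" and "half_open" are hypothetical names, and a real system would likely export the counters to a metrics backend rather than keep them in memory.

```python
import logging
import time
from typing import Optional

log = logging.getLogger("circuit_breaker")


class BreakerMetrics:
    """Counts trips and accumulates time spent away from the closed state."""

    def __init__(self, name: str, clock=time.monotonic):
        self.name = name
        self.clock = clock  # injectable so durations can be simulated
        self.trips = 0
        self.open_seconds = 0.0
        self._opened_at: Optional[float] = None

    def on_transition(self, old_state: str, new_state: str) -> None:
        now = self.clock()
        if new_state == "open":
            self.trips += 1
            # Keep the first trip time if a half-open test call failed,
            # so the outage is measured from the original trip.
            if self._opened_at is None:
                self._opened_at = now
        elif new_state == "closed" and self._opened_at is not None:
            self.open_seconds += now - self._opened_at
            self._opened_at = None
        log.warning(
            "breaker=%s %s -> %s trips=%d",
            self.name, old_state, new_state, self.trips,
        )
```

In this sketch the time spent in half-open counts toward open_seconds, since callers are still mostly being rejected during that phase.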