When a dependency starts timing out, naive retries can amplify the outage by piling more work onto a service that is already struggling. A circuit breaker gives the system a chance to breathe: after enough failures, it opens and returns a fast error instead of calling the dependency; after a cooldown, it half-opens and lets a probe request through to check for recovery. I keep the implementation simple and local to the integration layer, and I log breaker state changes because that’s the first breadcrumb you want when an incident starts. This doesn’t replace good SLAs, but it keeps a failing dependency from dragging your own service down with it. The key is picking timeouts and thresholds from real traffic patterns, not guesswork.
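To make the closed, open, and half-open states concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the implementation described above: the names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical, the default values are placeholders rather than recommendations, failures are counted consecutively, and the class is not thread-safe.

```python
import logging
import time

logger = logging.getLogger("circuit_breaker")


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without touching the dependency."""


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        # Placeholder defaults (assumptions); tune them from real traffic.
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = self.CLOSED
        self.failures = 0
        self.opened_at = None

    def _transition(self, new_state):
        # Log every state change: the first breadcrumb during an incident.
        if new_state != self.state:
            logger.warning("breaker state change: %s -> %s", self.state, new_state)
            self.state = new_state

    def call(self, func, *args, **kwargs):
        if self.state == self.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still cooling down: fail fast instead of calling the dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Cooldown elapsed: let this call through as a probe.
            self._transition(self.HALF_OPEN)
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed probe, or too many consecutive failures, (re)opens the circuit.
        if self.state == self.HALF_OPEN or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
            self._transition(self.OPEN)

    def _on_success(self):
        # Any success resets the failure count and closes the circuit.
        self.failures = 0
        self._transition(self.CLOSED)
```

Usage would look like `breaker.call(fetch_profile, user_id)`, where `fetch_profile` stands in for whatever function wraps the dependency and enforces its own timeout. Callers catch `CircuitOpenError` to serve a fallback or return an error quickly. A shared, multi-threaded breaker would need a lock around the state changes and a way to limit half-open probes to one at a time.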