Reliability

Idempotent event consumer with processed-events table

At-least-once delivery is the default for most queues and streams, so consumers must be idempotent. My go-to pattern is a processed_events table keyed by event_id with a unique constraint. When a message arrives, the consumer tries to insert event_id first; if the insert hits the unique constraint, the event has already been handled, so the consumer acks the message and skips the work.
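
A minimal Go sketch of the pattern, assuming Postgres-style ON CONFLICT and a processed_events(event_id) table; the consume wrapper and handle callback are illustrative names, not a fixed API:

```go
package consumer

import (
	"context"
	"database/sql"
)

// Assumed table:
//   CREATE TABLE processed_events (event_id TEXT PRIMARY KEY, processed_at TIMESTAMPTZ DEFAULT now());
func consume(ctx context.Context, db *sql.DB, eventID string, handle func(context.Context, *sql.Tx) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Claim the event id; the unique constraint makes the insert idempotent.
	res, err := tx.ExecContext(ctx,
		`INSERT INTO processed_events (event_id) VALUES ($1) ON CONFLICT (event_id) DO NOTHING`, eventID)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // already processed: ack the message and skip the work
	}

	// Do the work in the same transaction so the marker and side effects commit together.
	if err := handle(ctx, tx); err != nil {
		return err
	}
	return tx.Commit()
}
```

Keeping the insert and the work in one transaction means a crash mid-handler rolls the marker back, so the redelivered message is processed cleanly.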

Graceful gRPC shutdown with fallback to Stop

For gRPC services, GracefulStop is ideal because it allows in-flight RPCs to finish, but it can hang if handlers ignore cancellation or if clients never close streams. I wrap shutdown in a deadline: call GracefulStop in a goroutine, and if it doesn’t return before the deadline, fall back to Stop to force-close the remaining connections.
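
A sketch of the deadline wrapper (the timeout value is whatever drain budget your deploy tooling allows):

```go
package server

import (
	"time"

	"google.golang.org/grpc"
)

// shutdown lets in-flight RPCs finish, then force-stops if they overrun the deadline.
func shutdown(srv *grpc.Server, timeout time.Duration) {
	done := make(chan struct{})
	go func() {
		srv.GracefulStop() // blocks until all RPCs finish and listeners close
		close(done)
	}()

	select {
	case <-done:
		// clean shutdown within the deadline
	case <-time.After(timeout):
		srv.Stop() // force-close remaining connections and streams
	}
}
```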

Avoid bufio.Scanner token limits by switching to Reader

bufio.Scanner is great until it isn’t: by default it refuses tokens larger than 64KB, which makes it a bad fit for long log lines or large JSON records. The failure mode is subtle: the scan just stops and Err() returns a token-too-long error. For production log processing I switch to bufio.Reader, or raise the limit with Scanner.Buffer when a sane maximum is known.
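
A sketch of the Reader-based loop; readLines is an illustrative helper, and the trade-off is that each line is still held in memory while it is handled:

```go
package logproc

import (
	"bufio"
	"io"
	"strings"
)

// readLines streams lines of arbitrary length, unlike Scanner's 64KB default cap.
func readLines(r io.Reader, handle func(string) error) error {
	br := bufio.NewReader(r)
	for {
		line, err := br.ReadString('\n')
		if len(line) > 0 {
			if herr := handle(strings.TrimRight(line, "\n")); herr != nil {
				return herr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```

If a hard upper bound is acceptable, keeping Scanner and calling scanner.Buffer with a larger maximum is the smaller change.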

Deadlock-Aware Retry Wrapper

Deadlocks happen under load. When the operation is safe to retry, rescue the deadlock exception and retry with jitter. Keep retries bounded and log when it happens.
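
The note is written with Ruby's rescue in mind; here is the same bounded-retry-with-jitter idea sketched in Go, where isDeadlock is a stand-in for checking the driver's real error code (MySQL 1213, Postgres 40P01):

```go
package db

import (
	"math/rand"
	"strings"
	"time"
)

// isDeadlock is a placeholder; real code should inspect the driver's error code.
func isDeadlock(err error) bool {
	return err != nil && strings.Contains(strings.ToLower(err.Error()), "deadlock")
}

// withDeadlockRetry runs op, retrying a bounded number of times with jittered backoff.
func withDeadlockRetry(attempts int, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil || !isDeadlock(err) {
			return err
		}
		// log the retry here, then back off with jitter before trying again
		time.Sleep(time.Duration(50+rand.Intn(150)) * time.Millisecond)
	}
	return err
}
```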

Request deduplication with idempotency keys

Network failures and client retries can cause duplicate request processing, leading to duplicate charges, double-created resources, or inconsistent state. Idempotency keys solve this by tracking processed requests and returning cached responses for duplicates instead of doing the work twice.
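
A minimal Go sketch, assuming an idempotency_keys table keyed by the client-supplied key and storing the serialized response; handleIdempotent and do are illustrative names:

```go
package api

import (
	"context"
	"database/sql"
	"errors"
)

// Assumed table:
//   CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response BYTEA NOT NULL);
func handleIdempotent(ctx context.Context, db *sql.DB, key string, do func(context.Context) ([]byte, error)) ([]byte, error) {
	var cached []byte
	err := db.QueryRowContext(ctx, `SELECT response FROM idempotency_keys WHERE key = $1`, key).Scan(&cached)
	if err == nil {
		return cached, nil // duplicate request: replay the stored response
	}
	if !errors.Is(err, sql.ErrNoRows) {
		return nil, err
	}

	resp, err := do(ctx)
	if err != nil {
		return nil, err
	}
	// ON CONFLICT guards the race where two copies of the same request run concurrently.
	_, err = db.ExecContext(ctx,
		`INSERT INTO idempotency_keys (key, response) VALUES ($1, $2) ON CONFLICT (key) DO NOTHING`, key, resp)
	return resp, err
}
```

A stricter variant claims the key up front and fills in the response afterwards, which also fences off concurrent duplicates while the first request is still running.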

Database-Backed “Run Once” Migrations for Maintenance Tasks

Sometimes you need a one-time maintenance operation outside normal schema changes. Use a small table to track “run once” tasks so reruns are safe and the operation is visible.
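
A Go sketch of the tracking table; run_once_tasks and runOnce are illustrative names, and when the task touches the same database it is worth running the marker insert and the task in one transaction so a failed task does not leave the marker behind:

```go
package maintenance

import (
	"context"
	"database/sql"
)

// Assumed table:
//   CREATE TABLE run_once_tasks (name TEXT PRIMARY KEY, ran_at TIMESTAMPTZ DEFAULT now());
func runOnce(ctx context.Context, db *sql.DB, name string, task func(context.Context) error) error {
	res, err := db.ExecContext(ctx,
		`INSERT INTO run_once_tasks (name) VALUES ($1) ON CONFLICT (name) DO NOTHING`, name)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // already ran; rerunning the deploy is a no-op
	}
	return task(ctx)
}
```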

Transactional Email “Send Once” with Delivered Marker

Emails should be idempotent. Store a delivered marker (or unique key) so retries don’t spam users. This pattern is especially useful for receipts and password reset flows.
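
A Go sketch of the delivered-marker idea, assuming an email_deliveries table keyed by a dedup key such as "receipt:order-123"; claiming the key inside a transaction means a failed send rolls the marker back so a later retry can still deliver:

```go
package mailer

import (
	"context"
	"database/sql"
)

// Assumed table:
//   CREATE TABLE email_deliveries (dedup_key TEXT PRIMARY KEY, sent_at TIMESTAMPTZ DEFAULT now());
func sendOnce(ctx context.Context, db *sql.DB, dedupKey string, send func(context.Context) error) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// Claim the dedup key; a retried or concurrent attempt sees 0 rows affected.
	res, err := tx.ExecContext(ctx,
		`INSERT INTO email_deliveries (dedup_key) VALUES ($1) ON CONFLICT (dedup_key) DO NOTHING`, dedupKey)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // already delivered
	}
	if err := send(ctx); err != nil {
		return err // rollback releases the marker so a later retry can send
	}
	return tx.Commit()
}
```

The residual risk is the usual one with external side effects: if the send succeeds but the commit fails, a retry could send twice.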

HTTP Timeouts + Retries Wrapper (Faraday)

I wrapped external HTTP calls once I realized most “flaky APIs” were actually my fault: no timeouts, unclear retries, and logs that didn’t tell a story. In the client wrapper, I centralize a Faraday connection with explicit open_timeout and timeout, add bounded retries for idempotent requests, and log each attempt so failures tell a story.
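
The wrapper itself is Faraday, but to keep the examples here in one language, this Go sketch shows the same shape: an explicit connect timeout and overall deadline, bounded retries for idempotent calls, and one log line per attempt (the URL, timeouts, and attempt counts are placeholders):

```go
package httpclient

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"time"
)

// newClient centralizes timeouts so no external call can hang forever.
func newClient() *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second, // overall deadline, analogous to Faraday's timeout
		Transport: &http.Transport{
			DialContext: (&net.Dialer{Timeout: 2 * time.Second}).DialContext, // connect, analogous to open_timeout
		},
	}
}

// getWithRetry retries idempotent GETs a bounded number of times and logs every attempt.
func getWithRetry(client *http.Client, url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 1; i <= attempts; i++ {
		resp, err := client.Get(url)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil
		}
		if err == nil {
			err = fmt.Errorf("server error: %s", resp.Status)
			resp.Body.Close()
		}
		lastErr = err
		log.Printf("GET %s attempt %d/%d failed: %v", url, i, attempts, err)
		time.Sleep(time.Duration(i) * 200 * time.Millisecond) // simple linear backoff
	}
	return nil, lastErr
}
```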

Graceful Degradation: Feature-Based Rescue

Not every failure should be a 500. If a non-critical dependency fails (e.g., recommendations), rescue narrowly, emit a metric/log, and serve a baseline response.
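
A Go sketch of the shape, where fetch and baseline are stand-ins for the real dependency call and the fallback; the narrow rescue becomes checking the error from that one call:

```go
package handlers

import (
	"context"
	"log"
)

// recommendationsFor degrades instead of failing: if the non-critical dependency
// errors, it logs (and would emit a metric) and serves the baseline list.
func recommendationsFor(
	ctx context.Context,
	userID string,
	fetch func(context.Context, string) ([]string, error),
	baseline func() []string,
) []string {
	recs, err := fetch(ctx, userID)
	if err != nil {
		log.Printf("recommendations degraded for user %s: %v", userID, err)
		return baseline()
	}
	return recs
}
```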

Circuit breaker around flaky dependencies

Retries alone can make an outage worse: if a dependency is hard failing, retries just add load. A circuit breaker adds a simple state machine: closed (normal), open (fail fast), and half-open (probe). I like gobreaker because it’s small and predictable.
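
A minimal gobreaker sketch; the breaker name, trip threshold, and open-state timeout are placeholders:

```go
package deps

import (
	"net/http"
	"time"

	"github.com/sony/gobreaker"
)

var cb = gobreaker.NewCircuitBreaker(gobreaker.Settings{
	Name:    "billing-api",
	Timeout: 30 * time.Second, // how long to stay open before a half-open probe
	ReadyToTrip: func(c gobreaker.Counts) bool {
		return c.ConsecutiveFailures >= 5 // open after 5 consecutive failures
	},
})

func getBalance(url string) (*http.Response, error) {
	// Transport errors count as failures; mapping 5xx responses to errors is a common extension.
	resp, err := cb.Execute(func() (interface{}, error) {
		return http.Get(url)
	})
	if err != nil {
		return nil, err // includes gobreaker.ErrOpenState when failing fast
	}
	return resp.(*http.Response), nil
}
```

While the breaker is open, Execute returns immediately instead of calling the dependency, which is the fail-fast behavior that keeps retries from piling onto an outage.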

Context-aware cache refresh with atomic.Pointer (Go 1.19+)

I often need a fast read path for small datasets (like a list of active plans or an allowlist) that updates periodically. Instead of locking on every read, I store a pointer to an immutable snapshot in atomic.Pointer. Reads are lock-free and safe; refreshes build a new snapshot in the background and swap the pointer atomically.
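
A sketch of the snapshot cache; Snapshot, Cache, and the load callback are illustrative:

```go
package plans

import (
	"context"
	"sync/atomic"
	"time"
)

// Snapshot is immutable once published; readers never see a partial update.
type Snapshot struct {
	ActivePlans map[string]bool
}

type Cache struct {
	current atomic.Pointer[Snapshot]
}

// Get is lock-free: it just loads the current snapshot pointer (nil before the first refresh).
func (c *Cache) Get() *Snapshot { return c.current.Load() }

// Refresh periodically builds a new snapshot and swaps the pointer atomically.
// load is assumed to fetch the data, e.g. from the database.
func (c *Cache) Refresh(ctx context.Context, every time.Duration, load func(context.Context) (*Snapshot, error)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		if snap, err := load(ctx); err == nil {
			c.current.Store(snap) // readers pick up the new snapshot on their next Load
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}
```

Because the snapshot is immutable, readers either see the old pointer or the new one, never a half-built map.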

Safer Background Job Arguments (Serialize IDs only)

Jobs should accept simple primitives (IDs, strings), not full objects. It avoids serialization surprises and makes jobs resilient across deploys. This also reduces job payload size.
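
The advice comes from Ruby job frameworks, but it translates directly; a Go sketch with a hypothetical send-receipt job shows the payload carrying only the ID and the worker re-fetching current state:

```go
package jobs

import (
	"context"
	"database/sql"
	"encoding/json"
)

// The payload carries only the ID, never the whole record.
type SendReceiptArgs struct {
	OrderID int64 `json:"order_id"`
}

func enqueueSendReceipt(orderID int64) ([]byte, error) {
	return json.Marshal(SendReceiptArgs{OrderID: orderID}) // hand this to whatever queue you use
}

// The worker reloads the record, so stale fields serialized before a deploy
// can't leak into processing afterwards.
func runSendReceipt(ctx context.Context, db *sql.DB, payload []byte) error {
	var args SendReceiptArgs
	if err := json.Unmarshal(payload, &args); err != nil {
		return err
	}
	var email string
	if err := db.QueryRowContext(ctx,
		`SELECT customer_email FROM orders WHERE id = $1`, args.OrderID).Scan(&email); err != nil {
		return err
	}
	// ... build and send the receipt to email ...
	return nil
}
```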