Reliability

Exponential backoff with jitter for retries

Immediate retries are a great way to stampede a struggling dependency. I use exponential backoff with jitter so retries spread out naturally and don’t synchronize across instances, and I cap the maximum delay so worst-case latency doesn’t become unbounded.
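
A minimal sketch of the idea, assuming an async operation and illustrative attempt/delay numbers (tune both per dependency):

```ts
// Retry an async operation with exponential backoff, full jitter, and a capped delay.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  { maxAttempts = 5, baseMs = 100, capMs = 5_000 } = {}
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
      // Full jitter: sleep a random amount between 0 and the capped exponential bound,
      // so concurrent clients don't retry in lockstep.
      const bound = Math.min(capMs, baseMs * 2 ** (attempt - 1));
      await new Promise((resolve) => setTimeout(resolve, Math.random() * bound));
    }
  }
}

// Usage: await retryWithBackoff(() => fetchInventory(sku))  // fetchInventory is hypothetical
```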

React Error Boundary + error reporting hook

Frontend errors are inevitable, so I’d rather control the blast radius. I wrap unstable sections with an ErrorBoundary that shows a friendly fallback and captures context for reporting. I also avoid spamming the error tracker by including a fingerprint, so repeated occurrences of the same failure are grouped instead of reported individually.
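
A sketch of the boundary, with a stand-in `reportError` for whatever tracker is actually in use:

```tsx
import React from "react";

// Placeholder for the real error tracker client (Sentry, Bugsnag, etc.).
function reportError(error: Error, context: { componentStack?: string; fingerprint: string[] }) {
  console.error("reported:", error, context);
}

type Props = { fallback: React.ReactNode; children: React.ReactNode };
type State = { hasError: boolean };

class ErrorBoundary extends React.Component<Props, State> {
  state: State = { hasError: false };

  static getDerivedStateFromError(): State {
    return { hasError: true };
  }

  componentDidCatch(error: Error, info: React.ErrorInfo) {
    // The fingerprint groups repeat occurrences so one noisy component
    // doesn't flood the tracker with near-duplicate events.
    reportError(error, {
      componentStack: info.componentStack ?? undefined,
      fingerprint: ["error-boundary", error.name, error.message],
    });
  }

  render() {
    return this.state.hasError ? this.props.fallback : this.props.children;
  }
}

// Usage: <ErrorBoundary fallback={<p>Something went wrong.</p>}><Dashboard /></ErrorBoundary>
```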

Background Job Dead Letter Queue (DLQ) Table

Retries can become infinite noise. For certain failure classes, record the payload + error in a DLQ table and alert/triage. This keeps queues healthy and makes failures actionable.
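
For example, something along these lines, with a hypothetical `dead_letter_jobs` table:

```ts
import { Pool } from "pg";

const pool = new Pool();

// Called after the final retry fails: park the job instead of retrying forever.
// Assumes a table like:
//   dead_letter_jobs (id, job_class text, payload jsonb, error text, failed_at timestamptz)
async function moveToDeadLetterQueue(jobClass: string, payload: unknown, error: Error) {
  await pool.query(
    `INSERT INTO dead_letter_jobs (job_class, payload, error, failed_at)
     VALUES ($1, $2, $3, now())`,
    [jobClass, JSON.stringify(payload), `${error.name}: ${error.message}`]
  );
}
```

Alerting and triage then run off this table (a dashboard, a daily digest) rather than off retry noise.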

Cron scheduling with node-cron (with guard)

Periodic jobs are common (cleanup, digests, rollups), and it’s easy to accidentally run them on every instance in production. In local dev, in-process node-cron is fine. In production, I either run a dedicated worker or add a guard so only one instance runs the schedule.
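
A sketch of the guarded variant, using a hypothetical `CRON_LEADER` env var set on exactly one instance (or only on the dedicated worker):

```ts
import cron from "node-cron";

// Only the instance explicitly designated as the cron runner registers schedules;
// every other instance skips this block entirely.
if (process.env.CRON_LEADER === "true") {
  cron.schedule("0 3 * * *", async () => {
    await runNightlyCleanup();
  });
}

// Hypothetical periodic task.
async function runNightlyCleanup() {
  /* delete expired sessions, roll up stats, etc. */
}
```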

Health Check Endpoint with Dependency Probes

A real health check tests the dependencies you care about (DB, Redis). Keep it fast and don’t make it do expensive queries. Use it for load balancers and alerts.
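
Roughly, with Express, pg, and ioredis, and a short per-probe timeout so a hung dependency can’t hang the check itself:

```ts
import express from "express";
import { Pool } from "pg";
import Redis from "ioredis";

const app = express();
const pool = new Pool();
const redis = new Redis();

// Race each cheap probe (SELECT 1 / PING) against a short timeout.
const withTimeout = <T>(p: Promise<T>, ms = 500) =>
  Promise.race([
    p,
    new Promise<never>((_, reject) => setTimeout(() => reject(new Error("timeout")), ms)),
  ]);

app.get("/healthz", async (_req, res) => {
  const results = await Promise.allSettled([
    withTimeout(pool.query("SELECT 1")),
    withTimeout(redis.ping()),
  ]);
  const [db, cache] = results.map((r) => r.status === "fulfilled");
  res.status(db && cache ? 200 : 503).json({ db, cache });
});

app.listen(3000);
```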

Safer Feature Flagging: Cache + DB Fallback

A robust feature flag read path should be fast, but also resilient to cache outages. Cache the computed result briefly and fall back to DB if needed; keep the interface dead simple.
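
One possible shape, assuming a Redis cache in front of a `feature_flags` table (key and column names are placeholders):

```ts
import Redis from "ioredis";
import { Pool } from "pg";

const redis = new Redis();
const pool = new Pool();

// Read path: try the cache; on a miss *or* a cache outage, read from the DB and
// best-effort repopulate the cache with a short TTL.
async function isEnabled(flag: string): Promise<boolean> {
  try {
    const cached = await redis.get(`flag:${flag}`);
    if (cached !== null) return cached === "1";
  } catch {
    // Redis is down: fall through to the DB instead of failing the request.
  }

  const { rows } = await pool.query("SELECT enabled FROM feature_flags WHERE name = $1", [flag]);
  const enabled: boolean = rows[0]?.enabled ?? false;

  try {
    await redis.set(`flag:${flag}`, enabled ? "1" : "0", "EX", 30);
  } catch {
    // Cache write failures are non-fatal.
  }
  return enabled;
}
```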

Transactional outbox in Node (DB write + event)

The moment you split ‘write to DB’ and ‘publish to a queue’ into two independent operations, you create a place to lose data. Publish first and a DB failure means consumers act on something that never happened. Write first and a publish failure means the record exists but the event never reaches consumers. The outbox pattern closes the gap: write the event to an outbox table in the same transaction as the domain change, then relay it to the queue in a separate step.
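
A sketch of both halves (table and column names are placeholders): the write path commits the row and the event together, and a relay drains the outbox afterwards.

```ts
import { Pool } from "pg";

const pool = new Pool();

// Write path: the domain row and the event commit or roll back together.
async function createOrder(payload: { customerId: string; total: number }) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      "INSERT INTO orders (customer_id, total) VALUES ($1, $2) RETURNING id",
      [payload.customerId, payload.total]
    );
    await client.query(
      "INSERT INTO outbox (topic, payload) VALUES ($1, $2)",
      ["order.created", JSON.stringify({ orderId: rows[0].id, ...payload })]
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// Relay: poll unpublished rows, publish, mark published. Delivery is at-least-once,
// so consumers should be idempotent.
async function relayOutbox(publish: (topic: string, payload: string) => Promise<void>) {
  const { rows } = await pool.query(
    "SELECT id, topic, payload FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100"
  );
  for (const row of rows) {
    await publish(row.topic, row.payload);
    await pool.query("UPDATE outbox SET published_at = now() WHERE id = $1", [row.id]);
  }
}
```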

Idempotent Job with Advisory Lock

I reached for idempotency the moment retries started duplicating side effects. In the advisory lock helper, I generate a deterministic lock ID and use pg_try_advisory_lock to ensure only one worker owns the critical section; the ensure block always calls pg_advisory_unlock, so the lock is released even when the job raises.
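
The same idea sketched in TypeScript (a try/finally standing in for the ensure block), with the lock key derived deterministically from the job name and arguments:

```ts
import { createHash } from "node:crypto";
import { Pool } from "pg";

const pool = new Pool();

// Derive a 64-bit lock key from the job name + arguments so the same logical job
// always contends for the same advisory lock.
function lockKey(name: string, args: unknown): bigint {
  const digest = createHash("sha256").update(`${name}:${JSON.stringify(args)}`).digest();
  return digest.readBigInt64BE(0);
}

// Run fn only if this worker wins the lock; always release it, even on failure.
async function withAdvisoryLock<T>(name: string, args: unknown, fn: () => Promise<T>): Promise<T | null> {
  const key = lockKey(name, args).toString();
  const client = await pool.connect(); // advisory locks are session-scoped, so keep one client
  try {
    const { rows } = await client.query("SELECT pg_try_advisory_lock($1) AS locked", [key]);
    if (!rows[0].locked) return null; // another worker owns the critical section
    try {
      return await fn();
    } finally {
      await client.query("SELECT pg_advisory_unlock($1)", [key]);
    }
  } finally {
    client.release();
  }
}
```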

Safer Background Reindex: slice batches + checkpoints

Full reindexes can be long and fragile. Add checkpoints (last processed id), process in batches, and make it resumable. That turns a scary operation into a routine one.
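
A resumable loop might look like this, with a hypothetical `documents` table and checkpoint helpers:

```ts
import { Pool } from "pg";

const pool = new Pool();

// Walk the table in id order, one batch at a time, persisting the last processed id
// so a crash or deploy just resumes from the checkpoint instead of starting over.
async function reindexAll(batchSize = 1_000) {
  let lastId = await loadCheckpoint(); // 0 on first run

  while (true) {
    const { rows } = await pool.query(
      "SELECT id FROM documents WHERE id > $1 ORDER BY id LIMIT $2",
      [lastId, batchSize]
    );
    if (rows.length === 0) break;

    for (const row of rows) {
      await reindexDocument(row.id);
    }

    lastId = rows[rows.length - 1].id;
    await saveCheckpoint(lastId);
  }
}

// Stubs for the sketch: in practice these read/write a checkpoints table and push to the search index.
async function loadCheckpoint(): Promise<number> { return 0; }
async function saveCheckpoint(id: number): Promise<void> {}
async function reindexDocument(id: number): Promise<void> {}
```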

“Read Your Writes” Consistency: Pin to Primary After POST

With replicas, a user can create a record then immediately not see it due to replication lag. A simple mitigation is to pin subsequent reads to primary for a short window after writes.
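
One way to do it at the web layer: set a short-lived cookie on writes and route reads to the primary while it’s fresh. The pool names and the 5-second window below are assumptions.

```ts
import express from "express";
import cookieParser from "cookie-parser";
import { Pool } from "pg";

const app = express();
app.use(cookieParser());

// Hypothetical pools: one pointed at the primary, one at a read replica.
const primaryPool = new Pool({ host: process.env.PRIMARY_HOST });
const replicaPool = new Pool({ host: process.env.REPLICA_HOST });

const PIN_MS = 5_000; // how long to pin reads to primary after a write

// After any mutating request, drop a short-lived cookie on the response.
app.use((req, res, next) => {
  if (!["GET", "HEAD", "OPTIONS"].includes(req.method)) {
    res.cookie("pin_primary_until", String(Date.now() + PIN_MS), { maxAge: PIN_MS, httpOnly: true });
  }
  next();
});

// While the cookie is fresh, reads go to the primary instead of a replica.
function readPool(req: express.Request): Pool {
  const until = Number(req.cookies?.pin_primary_until ?? 0);
  return Date.now() < until ? primaryPool : replicaPool;
}

app.get("/orders/:id", async (req, res) => {
  const { rows } = await readPool(req).query("SELECT * FROM orders WHERE id = $1", [req.params.id]);
  res.json(rows[0] ?? null);
});
```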

Deadlock-Aware Retry Wrapper

Deadlocks happen under load. When the operation is safe to retry, rescue the deadlock exception and retry with jitter. Keep retries bounded and log when it happens.
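
A bounded wrapper, keyed off Postgres’s `40P01` deadlock error code:

```ts
// Retry only deadlocks, only when the wrapped operation is safe to rerun,
// and only a bounded number of times.
const DEADLOCK = "40P01";

async function withDeadlockRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const code = (err as { code?: string })?.code;
      if (code !== DEADLOCK || attempt >= maxAttempts) throw err;
      console.warn(`deadlock detected, retrying (attempt ${attempt} of ${maxAttempts})`);
      // Jitter so the competing transactions don't immediately collide again.
      await new Promise((resolve) => setTimeout(resolve, Math.random() * 100 * attempt));
    }
  }
}

// Usage: await withDeadlockRetry(() => transferFunds(fromId, toId))  // transferFunds is hypothetical
```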

Database-Backed “Run Once” Migrations for Maintenance Tasks

Sometimes you need a one-time maintenance operation outside normal schema changes. Use a small table to track “run once” tasks so reruns are safe and the operation is visible.
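
A small runner over a hypothetical `maintenance_tasks` table, where the claim row only sticks if the task completes:

```ts
import { Pool } from "pg";

const pool = new Pool();

// Assumes: CREATE TABLE maintenance_tasks (name text PRIMARY KEY, ran_at timestamptz NOT NULL DEFAULT now())
async function runOnce(name: string, task: () => Promise<void>) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // The insert doubles as the claim; a duplicate name means the task already ran.
    const { rowCount } = await client.query(
      "INSERT INTO maintenance_tasks (name) VALUES ($1) ON CONFLICT (name) DO NOTHING",
      [name]
    );
    if (rowCount === 0) {
      await client.query("ROLLBACK");
      return; // already ran, rerun is a no-op
    }
    await task();
    await client.query("COMMIT"); // the claim is only recorded once the task succeeded
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// Usage: await runOnce("2024_06_backfill_missing_slugs", backfillMissingSlugs)  // both names hypothetical
```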