Transactional outbox in Node (DB write + event)

The moment you split ‘write to DB’ and ‘publish to a queue’ into two independent operations, you create a place to lose data. Publish first and a DB failure means consumers act on something that never happened. Write first and a publish failure means

Cursor-based pagination with stable ordering

Offset pagination falls apart as soon as rows are inserted or deleted between page fetches—users see duplicates or missing items. Cursor pagination fixes that with stable ordering and ‘seek’ queries. I use a compound cursor that includes both the prim

Rate limiting by IP + user (Express)

A single abusive client can ruin your latency budget for everyone else, so I rate limit early rather than trying to ‘detect abuse’ after the outage starts. I combine an IP bucket with a user bucket: IP protects unauthenticated endpoints, user protects

Redis cache-aside for expensive reads

Most ‘caching’ bugs are really invalidation bugs, so I stick to a simple cache-aside pattern with conservative TTLs and treat cache misses as normal. The big failure mode to avoid is a stampede: if many requests miss at once, you can crush your DB. Fo

Webhook signature verification (timing-safe compare)

For webhooks, I assume the internet is hostile by default. I don’t trust that a request ‘looks like’ it came from Stripe/GitHub/etc; I verify the signature over the raw request body and use crypto.timingSafeEqual to avoid leaking information via timin

Idempotency keys for “create” endpoints

Retries are inevitable: mobile clients, flaky networks, and load balancers will resend POST requests. Without idempotency you end up double-charging or double-creating records. I store an Idempotency-Key with a sha256 hash of the request body and the

Postgres connection pooling with pg + max lifetime

After getting burned by long-lived connections that slowly accumulate bad state (or get killed by the network) and then explode during peak traffic, I got strict about pg pooling. I keep the pool size small per instance and scale horizontally instead

Graceful shutdown for Node HTTP servers

Deploys got a lot calmer once I treated SIGTERM as a first-class signal instead of an afterthought. In Kubernetes (and most PaaS platforms), you’re expected to stop accepting new requests quickly while finishing in-flight work. The worst failure mode

Runtime validation for request bodies (Zod)

TypeScript only protects you at compile time; your API still receives untrusted JSON from the internet. I lean on Zod as the source of truth for parsing + validation so runtime and types stay aligned. The big win is that I don’t try to validate ‘every

Request ID + structured logging (Express + pino)

Debugging distributed requests is miserable without a stable request id, so I generate it once at the edge, echo it back via x-request-id, and attach it to every log line with a child logger. I keep this intentionally boring because it delivers 80% of