Health checks, sticky sessions, and what happens when the LB itself fails
A load balancer's value depends on three operational facts: it must route only to healthy backends, pinning users to a backend has a cost, and the LB is itself a component that can die.
Health checks — active vs passive
The LB needs a live view of which backends can serve traffic.
- Active health checks: the LB proactively probes each backend on a schedule (e.g. GET /health every few seconds). After N consecutive failures it marks the backend unhealthy and stops routing to it; after M consecutive successes it returns the backend to rotation. Pros: detects a sick backend even with no traffic, and a backend stays out until it recovers. Cons: adds probe load, and a shallow probe (just "process is up") can miss real problems.
- Passive health checks (outlier detection): the LB observes real traffic and ejects a backend that starts returning errors or timing out (e.g. too many 5xx responses or connection failures in a window). Pros: zero extra traffic, and it reacts to genuine user-facing failures quickly. Cons: needs live traffic to notice, and the first few unlucky users hit the failing backend before it's ejected.
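The N-failures / M-successes rule for active checks can be sketched as a small state machine. This is a minimal illustration, not any particular LB's implementation: the class name, the method name, and the default thresholds (3 and 2) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks one backend's state from active probe results (illustrative only)."""
    fail_threshold: int = 3   # N: consecutive failures before marking unhealthy (assumed value)
    pass_threshold: int = 2   # M: consecutive successes before returning to rotation (assumed value)
    healthy: bool = True
    _streak: int = 0          # current run of probe results that disagree with `healthy`

    def record_probe(self, ok: bool) -> bool:
        """Feed in one probe result and return the backend's resulting state."""
        if ok == self.healthy:
            # The result agrees with the current state, so any opposing streak is broken.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.pass_threshold if ok else self.fail_threshold
        if self._streak >= threshold:
            # Enough consecutive opposing results: flip the state.
            self.healthy = ok
            self._streak = 0
        return self.healthy


backend = BackendHealth()
probe_results = [False, False, True, False, False, False, True, True]
states = [backend.record_probe(ok) for ok in probe_results]
```

Requiring consecutive results in both directions keeps a single dropped probe from ejecting a backend, and a single lucky probe from readmitting one that is still sick.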
Production systems typically use both: active probes for baseline liveness, passive detection to catch real failures fast. A good health endpoint is a deep check (DB reachable, dependencies OK), but beware making it too deep: if every backend's check depends on the same shared dependency, a brief blip in that dependency can mark the whole fleet unhealthy at once.
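A rough sketch of the passive side, and of combining it with the active signal, might look like the following. The class, the `routable` helper, and all thresholds are hypothetical names and values chosen for illustration; real LBs expose their own knobs for this.

```python
import time
from collections import deque


class OutlierDetector:
    """Ejects a backend when too many recent real requests failed (passive check)."""

    def __init__(self, window_s=10.0, max_failures=5, eject_s=30.0):
        self.window_s = window_s          # sliding window length in seconds (assumed)
        self.max_failures = max_failures  # failures tolerated inside the window (assumed)
        self.eject_s = eject_s            # how long an ejection lasts (assumed)
        self._failures = deque()          # timestamps of recent failures
        self._ejected_until = float("-inf")

    def record(self, ok, now=None):
        """Record the outcome of one real request (5xx, timeout, etc. count as not ok)."""
        now = time.monotonic() if now is None else now
        if ok:
            return
        self._failures.append(now)
        # Drop failures that have aged out of the window.
        while self._failures and self._failures[0] <= now - self.window_s:
            self._failures.popleft()
        if len(self._failures) > self.max_failures:
            self._ejected_until = now + self.eject_s
            self._failures.clear()

    def ejected(self, now=None):
        now = time.monotonic() if now is None else now
        return now < self._ejected_until


def routable(active_healthy, detector, now=None):
    """A backend receives traffic only if both signals consider it fine."""
    return active_healthy and not detector.ejected(now)
```

Note that the requests recorded before the threshold is crossed are exactly the "unlucky users" from the list above: passive detection cannot eject a backend without first observing real failures.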
Sticky sessions — and their cost
A sticky session pins a client to a specific backend (via a cookie at layer 7, or an IP hash at layer 4), usually because that backend holds in-memory session state. The costs:
- Uneven load: the LB can no longer freely balance; one backend can get hot while others idle.
- Painful failover: if the pinned backend dies, the session (and its in-memory state) is lost; the user is bounced to a fresh backend with no context.
- Hard to scale/deploy: you can't drain and replace a backend cleanly without disrupting the sessions pinned to it.
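To make the first two costs concrete, here is a small simulation of IP-hash stickiness. It assumes a plain hash-modulo scheme; the backend names and client IPs are invented for the example, and real LBs may use different hashing strategies.

```python
import hashlib
from collections import Counter


def pick_backend(client_ip: str, backends: list[str]) -> str:
    """Pin a client to a backend by hashing its IP (simple modulo scheme)."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


backends = ["app-1", "app-2", "app-3"]            # hypothetical backend names
clients = [f"10.0.0.{i}" for i in range(1, 201)]  # hypothetical client IPs

# The same IP always maps to the same backend, so load follows the hash, not demand.
before = {ip: pick_backend(ip, backends) for ip in clients}
load = Counter(before.values())  # clients per backend; nothing forces this to be even

# Simulate app-2 dying: every client pinned to it must land on another backend,
# where its in-memory session state does not exist.
survivors = [b for b in backends if b != "app-2"]
after = {ip: pick_backend(ip, survivors) for ip in clients}
moved = [ip for ip in clients if before[ip] != after[ip]]
```

With plain modulo hashing, `moved` can also include clients whose original backend is still healthy, because changing the backend count changes the mapping for everyone. Either way, each moved client arrives at a backend that has none of its session context.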
When the load balancer itself fails
A single LB is a single point of failure — if it dies, the entire fleet behind it is unreachable no matter how healthy the backends are. Mitigations:
- Redundant LBs. Run at least two in active-passive (a standby takes over) or active-active (both serve, sharing load). A virtual/floating IP (VRRP/keepalived) moves to the surviving LB on failover, so clients keep using the same address.
- DNS / anycast in front. DNS can hand out multiple LB addresses (with health-checked failover), and anycast advertises one IP from many locations so traffic reroutes to a live LB automatically — also the basis of multi-region resilience.
- Cloud-managed LBs (ALB/NLB, GCLB) hide this — they're already horizontally scaled and redundant across availability zones, which is why "use the cloud LB" is a perfectly senior answer.
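The active-passive idea from the first mitigation can be modeled in a few lines. This is a deliberately simplified sketch and not the VRRP protocol itself (which relies on periodic advertisements and timers); the `LoadBalancer` type, the `vip_owner` function, and the priority values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancer:
    name: str
    priority: int       # higher wins, loosely mirroring the idea of a VRRP priority
    alive: bool = True


def vip_owner(lbs):
    """Return the live LB that should hold the floating IP, or None if all are down."""
    live = [lb for lb in lbs if lb.alive]
    return max(live, key=lambda lb: lb.priority, default=None)


primary = LoadBalancer("lb-a", priority=200)   # hypothetical active node
standby = LoadBalancer("lb-b", priority=100)   # hypothetical standby node
pair = [primary, standby]

owner_before = vip_owner(pair).name   # the higher-priority node holds the VIP
primary.alive = False                 # simulate the active LB dying
owner_after = vip_owner(pair).name    # the standby takes over the same address
```

Clients never see the change of owner because they keep connecting to the same virtual IP; only the machine answering for that address differs. If both nodes are down, `vip_owner` returns `None`, which is the single-point-of-failure case again, just with two machines.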