1. Functional requirements
- Drivers stream live location; go online/offline.
- Rider requests a ride from A → B.
- Match rider to a nearby available driver; driver accepts/declines.
- Real-time trip tracking (both parties see live location).
- Surge pricing by area.
- Trip lifecycle to completion + payment.
2. Non-functional requirements
- Scale: 100M riders, 5M drivers (~1–2M concurrently online).
- Latency: nearby-driver query + match offer p99 < 1–2 s; location ingest must keep up in real time.
- Consistency: a driver is never assigned to two trips (strong consistency on assignment).
- Availability: 99.99%; degraded matching is better than no matching.
- Durability: trips and payments are financial records — never lost.
3. Capacity estimation
- Location updates: ~1.5M online drivers × 1 update / 4 s ≈ ~375K writes/s (the dominant write load).
- Ride requests: assume ~15M rides/day ≈ ~175 requests/s (×~5 peak ≈ 900/s) — tiny vs location writes.
- Each match fans out a proximity query over a few cells → ~tens of candidate drivers read per request.
- Location store: 1.5M drivers × ~100 B ≈ ~150 MB live state — easily fits in Redis; it's the write rate, not size, that matters.
- Trip history: 15M/day × 365 × 3 yr ≈ ~16B trips × ~1 KB ≈ ~16 TB in a durable, sharded store.
4. High-level architecture
5. API design
POST /api/v1/drivers/location
Body: { "driverId": "d_7", "lat": 41.31, "lng": 69.24, "ts": 1733000000 }
202 (fire-and-forget, high rate)
POST /api/v1/rides
Body: { "riderId": "r_9", "pickup": {...}, "dropoff": {...} }
201: { "rideId": "ride_55", "status": "MATCHING", "etaSec": 240, "fare": {...} }
POST /api/v1/rides/{id}/accept { "driverId": "d_7" } -- atomic claim
GET /api/v1/rides/{id} -- poll/stream status
POST /api/v1/rides/{id}/complete -- triggers payment
POST /rides is made idempotent with a client request key so a retried request doesn't create two trips.
6. Data model
CREATE TABLE trip (
trip_id BIGINT PRIMARY KEY,
rider_id BIGINT NOT NULL,
driver_id BIGINT,
status TEXT NOT NULL, -- MATCHING|ACCEPTED|ENROUTE|IN_PROGRESS|COMPLETED|CANCELLED
pickup_geo TEXT, dropoff_geo TEXT,
surge_mult NUMERIC(4,2),
fare_cents BIGINT,
created_at TIMESTAMPTZ NOT NULL
); -- sharded by trip_id (time-ordered snowflake)
CREATE TABLE payment (
payment_id BIGINT PRIMARY KEY,
trip_id BIGINT NOT NULL,
amount_cents BIGINT, currency TEXT,
status TEXT, -- AUTHORIZED|CAPTURED|FAILED|REFUNDED
idem_key TEXT UNIQUE -- exactly-once charge
);
-- Live driver location is NOT in SQL: it lives in the Redis geo index,
-- keyed by cell id (geohash/H3), value = set of driverId -> (lat,lng,updatedAt).
7. Detailed component design — geospatial matching
The crux of the problem.
- Indexing. Partition the surface of the earth into cells. Geohash (prefix-based rectangles) is simplest; Google S2 (quadtree on a sphere) and Uber H3 (hexagons) give more uniform neighbors and cleaner ring queries. Each online driver's location is written into the cell it falls in.
- Nearby query. For a rider in cell
X, queryXplus its neighbor cells (a k-ring) and collect candidate drivers — this is why you index by cell instead of scanning all 5M drivers. Choose cell resolution so a typical cell holds a useful handful of drivers; query more rings if a cell is sparse. Boundary effects are handled by always including neighbors. - Match + atomic claim. Rank candidates by ETA (road-network ETA, not straight-line), offer to the best driver, and on accept perform an atomic compare-and-set on the driver's status (Redis
SETNX/ a conditional DB update) so two concurrent requests can never both claim the same driver. If declined/timed out, offer the next candidate. - Trip state machine. The Trip Service drives
MATCHING → ACCEPTED → ENROUTE → IN_PROGRESS → COMPLETED, persisting each transition; onCOMPLETEDit invokes the Payment Service with an idempotency key to authorize/capture exactly once.
8. Scaling considerations
- Location ingest is the heaviest write path: fire-and-forget into Redis (overwrite latest per driver) and tee a durable copy to Kafka for history/analytics — never write each GPS ping to SQL.
- Geo index sharding by cell/region keeps a city's drivers co-located; matching queries stay local.
- Surge is computed per cell on a short interval from demand (requests) vs supply (available drivers) and cached; reads are cheap.
- Hot cities (dense cells) get finer resolution and more shards; quiet regions coarser.
- Real-time tracking to rider/driver via WebSocket/push, fed from the same location stream.
9. Trade-offs and alternatives
- Geohash vs S2 vs H3. Geohash is trivial but rectangular cells have uneven neighbor distances and prefix boundary quirks; S2/H3 give uniform, ring-friendly neighborhoods at more complexity. Pick by how much you need uniform proximity — for an interview, name the trade-off.
- Redis Geo vs PostGIS. Redis Geo (or a custom cell index) is fast and fits the in-memory hot path; PostGIS is richer and durable but too slow for 375K writes/s on the matching path. Use Redis hot, PostGIS/log for analytics.
- Strong vs eventual consistency on assignment. Driver assignment must be strongly consistent (atomic claim); location data can be eventually consistent and lossy. Don't apply one model to both.
- Nearest vs global-optimal matching — greedy nearest is simple and low-latency; batched optimization improves global efficiency but adds delay. Greedy is the interview default.
10. Common follow-up questions
- "Two riders match the same driver" → atomic compare-and-set on driver status; loser re-matches.
- "Driver moves between cells" → location update re-buckets them; index is overwrite-latest.
- "Surge spikes" → per-cell demand/supply ratio, recomputed every few seconds, applied at quote time.
- "Payment fails after the trip" → idempotent retry on the auth/capture; trip stays COMPLETED, payment retried out of band.
- "ETA accuracy" → road-network routing service, not haversine distance.