Design Netflix / YouTube — full system-design solution.

Video encoding pipeline, adaptive bitrate streaming (HLS/DASH), CDN delivery, recommendations, and view counting at scale.

Cracked Java

1. Functional requirements

Upload a video master (POST /api/v1/videos) and encode it into multiple renditions.
Stream a video with adaptive bitrate (HLS/DASH) across variable networks.
Browse a catalog and get recommendations.
Count views per video.
Resume playback from last position.

2. Non-functional requirements

Scale: 1B users; ~500K hours uploaded/day; massively read-heavy playback.
Latency: start-up (time-to-first-frame) < 2 s; rebuffering rate < 1%.
Availability: 99.99% on the playback path (CDN-served).
Durability: uploaded masters and encoded renditions must not be lost (object storage).
Consistency: catalog/view counts eventually consistent.

3. Capacity estimation

Upload: 500K hours/day ÷ 86,400 s/day ≈ 5.8 hours of video (about 21,000 seconds of footage) ingested per wall-clock second; with parallel chunked transcoding this fans out to thousands of concurrent encode jobs.
Playback: 1B users × ~1 hour/day ≈ 1B watch-hours/day. At ~5 Mbps average: 1B × 3,600 s × 5 Mb/s ≈ 18 exabits/day ≈ 2.25 EB/day of egress (about 208 Tbps averaged over the day), almost all of it served from the CDN.
Storage: each master is encoded into ~6 renditions; 500K hr/day × ~6 renditions × ~1 GB/hr ≈ 3 PB/day before replication → object storage with lifecycle tiering.
View events: 1B+ play events/day ≈ 12K events/s on average (about 35K/s at a 3× peak) → stream-aggregated, approximate.
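The estimates above are plain arithmetic, so they are easy to re-derive. The sketch below recomputes them from the stated inputs; the variable names are illustrative, and the 1 GB per encoded hour figure is the same rough assumption used in the storage line.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the requirements and estimates above.
upload_hours_per_day = 500_000
users = 1_000_000_000
watch_hours_per_user = 1
avg_bitrate_bps = 5_000_000      # ~5 Mbps average playback bitrate
renditions = 6
gb_per_rendition_hour = 1        # rough assumption: ~1 GB per encoded hour
play_events_per_day = 1_000_000_000

ingest_hours_per_sec = upload_hours_per_day / SECONDS_PER_DAY
egress_bits_per_day = users * watch_hours_per_user * 3600 * avg_bitrate_bps
egress_bytes_per_day = egress_bits_per_day / 8
avg_egress_bps = egress_bits_per_day / SECONDS_PER_DAY
storage_gb_per_day = upload_hours_per_day * renditions * gb_per_rendition_hour
events_per_sec = play_events_per_day / SECONDS_PER_DAY

print(f"ingest : {ingest_hours_per_sec:.1f} video-hours per second")
print(f"egress : {egress_bytes_per_day / 1e18:.2f} EB/day, "
      f"{avg_egress_bps / 1e12:.0f} Tbps average")
print(f"storage: {storage_gb_per_day / 1e6:.1f} PB/day")
print(f"events : {events_per_sec:,.0f} per second average")
```

Changing a single input, such as the average bitrate, shows how sensitive the egress figure is compared with the others.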

4. High-level architecture

Video streaming: upload triggers an async transcoding pipeline; playback is served from the CDN via HLS/DASH manifests.

5. API design

POST /api/v1/videos                       # returns a pre-signed upload target
  Body: { "title": "...", "visibility": "public" }
  201:  { "videoId": "v_77a2", "uploadUrl": "https://s3/..." }

GET  /api/v1/videos/{videoId}/manifest    # ABR manifest (HLS .m3u8 / DASH .mpd)
  200:  Content-Type: application/vnd.apple.mpegurl   # HLS; DASH uses application/dash+xml
        # lists renditions 240p..2160p; each entry resolves to CDN-hosted segment URLs

GET  /api/v1/videos/{videoId}             # metadata + encode status
POST /api/v1/videos/{videoId}/views       { "positionSec": 120 }   # fire-and-forget
GET  /api/v1/recommendations?userId=...

Segments themselves are served directly by the CDN, not by these APIs. The manifest just points the player at CDN URLs; ABR logic lives in the player.
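To make the manifest endpoint concrete, here is a minimal sketch of how a service might render an HLS multivariant (master) playlist from a rendition ladder. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags and their `BANDWIDTH`, `RESOLUTION`, and `CODECS` attributes are standard HLS; the `Rendition` class, the CDN host, the URL layout, and the bitrates are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str            # e.g. "720p"
    width: int
    height: int
    bandwidth_bps: int   # peak bitrate advertised to the player
    codecs: str          # RFC 6381 codec string


# Hypothetical three-step ladder; a real ladder would be tuned per title.
LADDER = [
    Rendition("1080p", 1920, 1080, 5_000_000, "avc1.640028,mp4a.40.2"),
    Rendition("480p", 854, 480, 1_000_000, "avc1.4d401e,mp4a.40.2"),
    Rendition("720p", 1280, 720, 2_500_000, "avc1.64001f,mp4a.40.2"),
]


def build_master_playlist(cdn_base: str, video_id: str, ladder: list) -> str:
    """Render one variant entry per rendition, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for r in sorted(ladder, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs}"'
        )
        # Assumed URL layout: one media playlist per rendition on the CDN.
        lines.append(f"{cdn_base}/{video_id}/{r.name}/index.m3u8")
    return "\n".join(lines) + "\n"


playlist = build_master_playlist("https://cdn.example.com", "v_77a2", LADDER)
print(playlist)
```

Each variant URL points at a per-rendition media playlist, which in turn lists the segment URLs the player fetches from the CDN.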

6. Data model

CREATE TABLE video (
  video_id    BIGINT PRIMARY KEY,        -- snowflake
  uploader_id BIGINT NOT NULL,
  title       TEXT NOT NULL,
  status      TEXT NOT NULL,             -- UPLOADED|TRANSCODING|READY|FAILED
  master_key  TEXT NOT NULL,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE rendition (
  video_id    BIGINT NOT NULL,
  quality     TEXT NOT NULL,             -- 240p|480p|720p|1080p|2160p
  codec       TEXT NOT NULL,             -- h264|av1
  manifest_key TEXT NOT NULL,            -- key of the HLS/DASH playlist
  PRIMARY KEY (video_id, quality, codec)
);
-- View counts and watch progress live in a fast store (Cassandra/Redis),
-- aggregated from the event stream, not as synchronous UPDATEs.

7. Detailed component design

Encoding pipeline. On upload, the master lands in object storage and an event is published to Kafka. A splitter breaks the master into chunks (e.g. GOP-aligned segments), and a fleet of transcode workers processes the chunks in parallel into the full rendition ladder (240p→2160p, multiple codecs). A packager then segments each rendition (~2–6 s segments) and writes the HLS/DASH manifests. Chunked parallel transcoding is what brings a 4-hour 4K encode down from hours to minutes.
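The fan-out in the encoding pipeline can be sketched as a chunk plan plus a pool of workers. This is a simplified model, not a real encoder: `transcode_chunk` is a placeholder that only returns an output key, chunk boundaries are fixed-length rather than truly GOP-aligned, a local thread pool stands in for the worker fleet, and the key layout is invented.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

LADDER = ["240p", "480p", "720p", "1080p", "2160p"]


def plan_chunks(duration_s, chunk_s):
    """Split [0, duration_s) into (start, end) ranges of at most chunk_s seconds."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def transcode_chunk(task):
    """Placeholder for a real encoder call; returns where the output would land."""
    (index, (_start, _end)), quality = task
    return quality, index, f"renditions/{quality}/chunk_{index:05d}.mp4"


def transcode(duration_s, chunk_s=60.0, workers=8):
    """Fan every (chunk, rendition) pair out to the pool, then reassemble in order."""
    chunks = list(enumerate(plan_chunks(duration_s, chunk_s)))
    tasks = list(product(chunks, LADDER))
    outputs = {quality: [None] * len(chunks) for quality in LADDER}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for quality, index, key in pool.map(transcode_chunk, tasks):
            outputs[quality][index] = key
    return outputs


result = transcode(4 * 3600)  # a 4-hour master
print(len(result["2160p"]), "chunks per rendition;",
      sum(len(v) for v in result.values()), "independent encode tasks")
```

Because every (chunk, rendition) task is independent, wall-clock time is bounded by the slowest chunk rather than by the length of the master, provided enough workers are available.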
Adaptive bitrate (ABR). The manifest exposes every rendition. The player measures throughput and buffer health and requests each next segment at the bitrate it can sustain, switching renditions at segment boundaries without stalling playback. The server stays simple here; the intelligence is in the player and the manifest.
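A player-side selection rule can be sketched in a few lines. This is a deliberately simple heuristic, not the algorithm of any particular player: the ladder bitrates, the 0.8 safety factor, and the 5-second low-buffer threshold are all assumed values.

```python
# Hypothetical ladder in bits per second, lowest to highest.
LADDER_BPS = [400_000, 1_000_000, 2_500_000, 5_000_000, 16_000_000]


def pick_bitrate(throughput_bps, buffer_s, ladder=LADDER_BPS,
                 safety=0.8, low_buffer_s=5.0):
    """Choose the bitrate for the next segment.

    - If the buffer is nearly empty, drop to the lowest rung to avoid a stall.
    - Otherwise take the highest rung that fits within a safety margin of the
      measured throughput, falling back to the lowest rung if none fits.
    """
    if buffer_s < low_buffer_s:
        return ladder[0]
    budget = throughput_bps * safety
    fitting = [b for b in ladder if b <= budget]
    return fitting[-1] if fitting else ladder[0]


print(pick_bitrate(8_000_000, buffer_s=20))   # healthy buffer, 8 Mbps measured
print(pick_bitrate(8_000_000, buffer_s=2))    # buffer almost drained
```

The decision is re-evaluated before every segment request, which is why a switch only ever takes effect at a segment boundary.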
CDN delivery. Segments are pushed/pulled to a multi-tier CDN with origin shielding; popular titles are pre-warmed. Playback never touches application servers — only the small manifest/metadata calls do.
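The effect of origin shielding can be shown with a toy model of the cache hierarchy. The classes below are illustrative only: they have no eviction, TTLs, or concurrency control, and the segment key is made up.

```python
class Origin:
    """Stand-in for object storage; counts how many reads actually reach it."""

    def __init__(self):
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return f"bytes-of-{key}"


class CacheTier:
    """A cache that asks its parent tier on a miss and remembers the answer."""

    def __init__(self, parent):
        self.parent = parent
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.parent.get(key)
        return self.store[key]


origin = Origin()
shield = CacheTier(origin)                      # single shield in front of origin
edges = [CacheTier(shield) for _ in range(3)]   # three edge locations

# 300 requests for the same segment, spread across the three edges.
for edge in edges:
    for _ in range(100):
        edge.get("v_77a2/720p/seg_00001.m4s")

print("origin reads:", origin.reads)
```

Each edge misses once, but only the first of those misses passes the shield, so the origin sees a single read for the segment regardless of how many edges request it.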
View counting. Play events go to Kafka and are aggregated in a stream processor into approximate, eventually consistent counters; a synchronous database increment per play would not hold up at billions of events per day.
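The aggregation step can be illustrated without a broker. In this sketch an in-memory list stands in for the Kafka topic, and the 60-second tumbling window and function names are assumptions; a production stream processor would also handle late events and deduplication, which are omitted here.

```python
from collections import Counter

WINDOW_S = 60  # assumed tumbling-window size


def aggregate(events, window_s=WINDOW_S):
    """Count plays per (video_id, window_start) from (video_id, epoch_s) events."""
    counts = Counter()
    for video_id, ts in events:
        window_start = ts - ts % window_s
        counts[(video_id, window_start)] += 1
    return counts


def flush(totals, window_counts):
    """Apply one batched increment per window key instead of one write per play."""
    for (video_id, _window_start), n in window_counts.items():
        totals[video_id] += n
    return totals


events = [("v1", 0), ("v1", 59), ("v1", 60), ("v2", 61), ("v1", 119)]
windowed = aggregate(events)
totals = flush(Counter(), windowed)
print(dict(windowed))
print(dict(totals))
```

The counter store then receives one write per video per window, so write volume scales with the number of active videos rather than with the number of plays.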
Recommendations. Offline-trained models produce candidate lists served from a low-latency store; ranking happens at request time.
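The split between offline candidate generation and request-time ranking might look like the sketch below. Everything here is hypothetical: the `CANDIDATES` dictionary stands in for the low-latency store, the scores are invented, and the ranking rule (filter already-watched titles, add a freshness boost) is just one simple example of request-time logic.

```python
# Offline output: per-user candidate lists of (video_id, model_score).
CANDIDATES = {
    "u1": [("v1", 0.9), ("v2", 0.7), ("v3", 0.6), ("v4", 0.2)],
}


def recommend(user_id, watched, freshness_boost, k=3):
    """Re-rank precomputed candidates using signals only known at request time."""
    scored = []
    for video_id, score in CANDIDATES.get(user_id, []):
        if video_id in watched:
            continue  # drop titles the user has already seen
        scored.append((score + freshness_boost.get(video_id, 0.0), video_id))
    scored.sort(reverse=True)
    return [video_id for _score, video_id in scored[:k]]


print(recommend("u1", watched={"v1"}, freshness_boost={"v4": 0.6}))
```

The expensive model runs offline; the request path only does a key lookup and a sort over a short list, which keeps latency low.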

8. Scaling considerations

CDN is the system — ~99% of bytes never reach origin; tiering + origin shield protect storage.
Transcode fleet autoscales off Kafka queue depth; chunking gives embarrassingly parallel encode.
Storage tiering — hot renditions on fast storage, cold/long-tail titles on cheaper tiers; renditions in codecs that are no longer requested can be deleted lazily in the background.
Async pipeline — encoding, packaging, view counting, indexing all off the request path.
Pre-warming — predicted-popular releases are pushed to edge before launch.
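Autoscaling the transcode fleet off queue depth reduces to a small sizing formula: enough workers to drain the current backlog within a target time, clamped to fleet limits. The function name, per-worker throughput, drain target, and the minimum and maximum fleet sizes below are assumed values for illustration.

```python
import math


def desired_workers(queue_depth, chunks_per_worker_per_min, target_drain_min,
                    min_workers=10, max_workers=5000):
    """Workers needed to drain the backlog within target_drain_min minutes."""
    capacity_per_worker = chunks_per_worker_per_min * target_drain_min
    needed = math.ceil(queue_depth / capacity_per_worker)
    return max(min_workers, min(max_workers, needed))


# 12,000 queued chunks, 2 chunks per worker per minute, drain within 10 minutes.
print(desired_workers(12_000, chunks_per_worker_per_min=2, target_drain_min=10))
```

The lower clamp keeps a warm pool for new uploads, and the upper clamp caps spend during an upload spike at the cost of a longer drain time.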

9. Trade-offs and alternatives

HLS vs DASH. HLS has the broadest device support and is the native format on Apple devices; DASH is an open, codec-agnostic standard. Most systems ship both from the same segments (e.g. fragmented MP4/CMAF) with two manifests.
Build vs managed encoding. A bespoke transcode fleet is cost-optimal at FAANG scale; a managed service such as AWS Elemental MediaConvert is faster to ship and suits smaller or regional services.
Rendition ladder depth. More renditions = smoother ABR but more storage and encode cost; tune to audience devices.
Exact vs approximate views. Exact, synchronously updated counts are expensive at this volume; approximate stream-aggregated counts are standard.
Codec choice (H.264 vs AV1). AV1 can save roughly 30% of bandwidth at comparable quality but costs far more compute to encode, so the trade is egress cost against encode cost.
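The codec trade-off can be framed as a per-title break-even: an extra AV1 encode pays off once the egress it saves exceeds the extra compute it costs. The prices below are placeholders, not real quotes, and the 30% saving is the rough figure from the trade-off above.

```python
def breakeven_views(extra_encode_cost, gb_per_view, egress_cost_per_gb,
                    bandwidth_saving=0.30):
    """Views needed before egress savings cover the extra encode cost."""
    saved_per_view = gb_per_view * egress_cost_per_gb * bandwidth_saving
    return extra_encode_cost / saved_per_view


def worth_encoding_av1(expected_views, **costs):
    """True when the expected audience is past the break-even point."""
    return expected_views >= breakeven_views(**costs)


# Placeholder costs: $50 extra compute per title, 1 GB per view, $0.01 per GB.
costs = dict(extra_encode_cost=50.0, gb_per_view=1.0, egress_cost_per_gb=0.01)
print(round(breakeven_views(**costs)), "views to break even")
print(worth_encoding_av1(1_000_000, **costs))   # popular title
print(worth_encoding_av1(500, **costs))         # long-tail title
```

Under these assumed numbers, popular titles justify the extra encode and long-tail titles do not, which is one reason to apply expensive codecs selectively rather than to the whole catalog.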

10. Common follow-up questions

How do you transcode a 4-hour 4K master fast? (Chunk + parallel transcode, then stitch manifests.)
How does the player avoid rebuffering on a flaky network? (ABR steps down per segment using buffer + throughput.)
How do you handle a viral video / thundering herd? (CDN tiering, origin shield, pre-warm.)
How does live streaming differ from VOD? (Lower-latency packaging such as LL-HLS, shorter segments, no full pre-encode.)
How do you protect content? (DRM and signed URLs.)
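Signed URLs, one of the content-protection options named above, can be sketched with an HMAC over the path and an expiry time. This is a generic illustration, not the signing scheme of any specific CDN: the secret, the `expires` and `sig` query parameter names, and the message format are all assumptions.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the API and the CDN edge


def _signature(path, expires_at, secret):
    message = f"{path}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path, expires_at, secret=SECRET):
    """Append an expiry timestamp and an HMAC-SHA256 signature to the path."""
    query = urlencode({"expires": expires_at,
                       "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url, now, secret=SECRET):
    """Accept only unexpired URLs whose signature matches the path and expiry."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False
    expected = _signature(parsed.path, expires_at, secret)
    return hmac.compare_digest(sig, expected)


url = sign_url("/v_77a2/720p/seg_00001.m4s", expires_at=1_000)
print(url)
print(verify_url(url, now=999), verify_url(url, now=1_001))
```

Because the path is part of the signed message, a URL issued for one rendition cannot be edited to fetch another, and the expiry bounds how long a leaked link remains usable.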