1. Functional requirements
- Upload a photo with a caption (
POST /media), store it durably, generate thumbnails. - View a home feed of posts from accounts you follow (
GET /feed). - Like and comment on posts; see counts.
- Stories — ephemeral media that expires after 24h.
- Hashtag search — find recent posts by tag.
2. Non-functional requirements
- Scale: 500M DAU; ~100M new photos/day; read-heavy (~100:1 read/write on feed).
- Latency: feed and image load p99 < 200 ms (CDN-served).
- Availability: 99.9%+; reads must survive write-path degradation.
- Durability: an uploaded photo must never be lost (11 9s object storage).
- Consistency: feed and counts may be eventually consistent (seconds).
3. Capacity estimation
- Writes: 100M photos/day ≈ ~1,160 uploads/s (×~3 peak ≈ 3.5K/s).
- Feed reads: 500M DAU × ~20 feed opens/day ≈ 10B/day ≈ ~115K reads/s (×3 peak ≈ 350K/s) → cache aggressively.
- Storage (originals): 100M/day × ~1.5 MB ≈ ~150 TB/day raw; with 3–4 thumbnail renditions add ~30%. Over 5 yr ≈ ~250 PB → object storage, not a DB.
- Metadata: 100M rows/day × ~300 B ≈ ~30 GB/day ≈ ~55 TB / 5 yr → sharded.
- Feed cache: 500M users × ~500 post IDs × ~16 B ≈ ~4 TB in Redis (sharded).
4. High-level architecture
5. API design
POST /api/v1/media # multipart or pre-signed S3 upload
Body: { "caption": "...", "objectKey": "u123/abc.jpg", "tags": ["sunset"] }
201: { "postId": "p_88f3", "renditions": ["thumb","med","orig"] }
GET /api/v1/feed?cursor=<ts_postId>&limit=30
200: { "items": [ { "postId":"p_88f3", "author":"u9", "url":"https://cdn/...", "likes":1240 } ], "next": "..." }
POST /api/v1/media/{postId}/likes # idempotent per (user,post)
POST /api/v1/media/{postId}/comments { "text": "..." }
POST /api/v1/stories { "objectKey": "...", "ttl": 86400 }
GET /api/v1/search/tags/{tag}?cursor=...
Large uploads use a pre-signed S3 URL so bytes go client→S3 directly, never through your app servers.
6. Data model
CREATE TABLE post (
post_id BIGINT PRIMARY KEY, -- snowflake (time-sortable)
author_id BIGINT NOT NULL,
object_key TEXT NOT NULL, -- S3 key for the original
caption TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Shard by author_id.
CREATE TABLE follow (
follower_id BIGINT NOT NULL,
followee_id BIGINT NOT NULL,
PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE story (
story_id BIGINT PRIMARY KEY,
author_id BIGINT NOT NULL,
object_key TEXT NOT NULL,
expires_at TIMESTAMPTZ NOT NULL -- TTL index, 24h
);
-- Likes/comments counts live in a counter store (Redis/Cassandra), not as
-- synchronous UPDATEs on post. Hashtag postings live in a search index.
7. Detailed component design
- Upload + encoding pipeline. Client uploads the original to S3 via a pre-signed URL; S3 emits an event to Kafka; encode workers generate thumbnail/medium renditions and write them back to S3. The CDN pulls from S3 on first miss. Nothing blocks the user request on encoding.
- Feed generation (hybrid fan-out). On a normal user's post, the fan-out service pushes the
postIdinto each follower's Redis sorted set (keyfeed:{userId}, score = timestamp) — fan-out-on-write. For celebrities (followers above a threshold) we skip push and instead pull their recent posts at read time and merge them with the user's precomputed feed. This caps write amplification while keeping reads fast. - Counts. Likes/views are sharded counters updated asynchronously (Kafka → aggregator), surfaced as eventually-consistent, sometimes approximate numbers — far cheaper than row locks at 350K/s.
- Hashtag search. Posts are indexed into Elasticsearch (or an inverted index) on tag; recency-ranked, served outside the hot feed path.
- Stories. Same blob pipeline; rows carry a
expires_atTTL so they auto-vanish after 24h.
8. Scaling considerations
- CDN is the main read lever — images and even feed thumbnails are served from edge, offloading ~95%+ of byte traffic.
- Feed cache — Redis sorted sets, sharded by user; jittered TTLs and a capped feed length (~500–1000 IDs) bound memory.
- Metadata DB — sharded by
author_id; reads of a post by ID are point lookups. - Async everything — encoding, fan-out, counts, and indexing are all off the request path via Kafka.
- Hot keys — celebrity posts bypass push to avoid fan-out storms; their content is heavily CDN-cached.
9. Trade-offs and alternatives
- Fan-out-on-write vs on-read. Push gives O(1) reads but huge write amplification for celebrities; pull is cheap to write but slow to read. Hybrid is the accepted answer — state the follower threshold as the tuning knob.
- Approximate vs exact counts. Exact like counts need synchronous coordination and don't scale; approximate eventually-consistent counters are the standard trade-off.
- Pre-signed upload vs proxy upload. Pre-signed keeps bytes off your servers (cheaper, faster) but ties you to the object store's auth model.
- Postgres+Redis+S3 (EU/regional) vs Cassandra/DynamoDB + bespoke feed infra (FAANG) — ops simplicity vs raw scale.
10. Common follow-up questions
- A celebrity with 100M followers posts — how do you avoid a fan-out storm? (Pull-on-read for celebrities.)
- How do you rank the feed (chronological vs ML ranking)? Ranking layer reads candidate IDs from cache, scores them.
- How do stories expire cheaply? TTL index + lazy filtering on read.
- Image optimization — serve WebP/AVIF and device-appropriate renditions from the CDN.
- Preventing abuse / rate limiting uploads (see the rate-limiting topic).