Object vs block vs file storage, and tiered hot/warm/cold… — Cracked Java

Object vs block vs file storage, and tiered hot/warm/cold data.

Block, file, and object storage are the three fundamental ways to expose bytes to an application, and they are not interchangeable: each has a different access model, latency profile, and cost. Layered on top is the orthogonal idea of storage tiers, which match how often data is touched to how much you pay to keep it fast.

The three storage models

MODEL    ACCESS UNIT                  INTERFACE         LATENCY   TYPICAL USE
------   --------------------------   ---------------   -------   ------------------------------
Block    fixed-size blocks            raw device        lowest    DB volumes, VM disks (EBS)
File     files + directories          POSIX / NFS       low       shared mounts, home dirs (EFS)
Object   immutable blobs + metadata   HTTP API by key   higher    media, backups, data lake (S3)
Block, file, and object storage — three access models, different latency and scale
  • Block storage (AWS EBS, a raw SAN/NVMe volume) exposes raw, fixed-size blocks with no notion of files. A filesystem or a database lays its own structure on top. Lowest latency, highest IOPS — it's the substrate underneath databases and OS filesystems. Typically attached to one instance.
  • File storage (NFS, AWS EFS, a NAS) presents a POSIX hierarchy of files and directories, mountable and shared by many hosts. Convenient for shared application data and lift-and-shift workloads; the directory tree and locking semantics limit how far it scales.
  • Object storage (AWS S3, GCS, Azure Blob) stores immutable blobs with rich metadata, addressed by a flat key over an HTTP API. It offers effectively unlimited capacity, very high durability (S3 is designed for 11 nines, i.e. 99.999999999%), low cost per GB, and access from anywhere over HTTP. The trade-offs are higher latency, no in-place edits (you replace the whole object), and no POSIX semantics. The default home for media, backups, logs, and data lakes.
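
To make the difference in access models concrete, here is a minimal in-memory sketch in Java. It is a toy, not a real SDK: `BlockDevice`, `ObjectStore`, and their methods are hypothetical names invented for illustration. The block device overwrites one fixed-size block in place, while the object store can only replace a whole blob under its key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory models; every name here is hypothetical, not a vendor API. */
final class StorageModelsSketch {

    /** Block device: fixed-size blocks addressed by number, overwritten in place. */
    static final class BlockDevice {
        private final int blockSize;
        private final byte[] disk;

        BlockDevice(int blockCount, int blockSize) {
            this.blockSize = blockSize;
            this.disk = new byte[blockCount * blockSize];
        }

        void writeBlock(int blockNo, byte[] data) {
            if (data.length != blockSize) {
                throw new IllegalArgumentException("data must be exactly one block");
            }
            // Only the addressed block changes; the rest of the device is untouched.
            System.arraycopy(data, 0, disk, blockNo * blockSize, blockSize);
        }

        byte[] readBlock(int blockNo) {
            return Arrays.copyOfRange(disk, blockNo * blockSize, (blockNo + 1) * blockSize);
        }
    }

    /** Object store: flat key -> whole blob; there is no partial update. */
    static final class ObjectStore {
        private final Map<String, byte[]> objects = new HashMap<>();

        void put(String key, byte[] body) {
            objects.put(key, body.clone()); // replaces the entire object
        }

        byte[] get(String key) {
            byte[] body = objects.get(key);
            return body == null ? null : body.clone();
        }
    }

    public static void main(String[] args) {
        BlockDevice dev = new BlockDevice(4, 4);
        dev.writeBlock(2, new byte[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(dev.readBlock(2)));

        ObjectStore store = new ObjectStore();
        String key = "logs/app.log"; // keys are flat strings; "/" is just a character
        store.put(key, "v1".getBytes(StandardCharsets.UTF_8));
        // "Editing" means: read, modify locally, then upload the whole object again.
        String edited = new String(store.get(key), StandardCharsets.UTF_8) + "+v2";
        store.put(key, edited.getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get(key), StandardCharsets.UTF_8));
    }
}
```

File storage sits between the two: it adds names, directories, and byte-range writes on top of blocks, which is exactly the structure the object model gives up in exchange for scale.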

Choosing between them

  • Need a disk for a database or VM, lowest latency → block.
  • Need a shared POSIX filesystem across hosts → file.
  • Need to store lots of large, write-once-read-many blobs cheaply and durably (images, video, backups, exports) → object.
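
The three rules above can be written down as a tiny lookup. This is only a sketch of the decision list, assuming a workload has one dominant need; the `Need` and `StorageModel` enums are hypothetical names, and real systems often combine models (for example, a database on block storage with backups in object storage).

```java
/** Encodes the three rules of thumb; all names are illustrative. */
final class StorageChooserSketch {

    enum Need { LOW_LATENCY_DISK, SHARED_POSIX_MOUNT, WRITE_ONCE_BLOBS }

    enum StorageModel { BLOCK, FILE, OBJECT }

    static StorageModel choose(Need need) {
        switch (need) {
            case LOW_LATENCY_DISK:
                return StorageModel.BLOCK;  // database or VM disk
            case SHARED_POSIX_MOUNT:
                return StorageModel.FILE;   // shared filesystem across hosts
            case WRITE_ONCE_BLOBS:
                return StorageModel.OBJECT; // images, video, backups, exports
            default:
                throw new IllegalArgumentException("unknown need: " + need);
        }
    }

    public static void main(String[] args) {
        for (Need need : Need.values()) {
            System.out.println(need + " -> " + choose(need));
        }
    }
}
```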

Tiered storage: hot / warm / cold

Within object storage especially, not all data deserves the same speed or price. Match the tier to access frequency:

  • Hot — accessed frequently, needs low latency. RAM/SSD, or S3 Standard. Most expensive per GB, cheapest per access.
  • Warm — accessed occasionally. Infrequent-access tiers (S3 Standard-IA): cheaper storage, a small per-retrieval fee.
  • Cold / archival — rarely accessed, kept for compliance or "just in case." Glacier / Deep Archive / tape: very cheap storage, but retrieval takes minutes to hours and costs more per fetch.
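
The trade-off between price per GB and price per access can be sketched as a simple monthly cost model: storage cost plus retrieval cost. The prices below are made-up placeholders chosen only to show the shape of the trade-off; they are not real vendor pricing, and `TierPrice` and `cheapest` are hypothetical names.

```java
import java.util.List;

/** Toy cost model: monthly cost = stored GB * storage price + retrieved GB * retrieval fee. */
final class TierCostSketch {

    static final class TierPrice {
        final String name;
        final double storagePerGbMonth;
        final double retrievalPerGb;

        TierPrice(String name, double storagePerGbMonth, double retrievalPerGb) {
            this.name = name;
            this.storagePerGbMonth = storagePerGbMonth;
            this.retrievalPerGb = retrievalPerGb;
        }

        double monthlyCost(double storedGb, double retrievedGbPerMonth) {
            return storedGb * storagePerGbMonth + retrievedGbPerMonth * retrievalPerGb;
        }
    }

    // PLACEHOLDER prices for illustration only, not taken from any provider.
    static final List<TierPrice> EXAMPLE_TIERS = List.of(
            new TierPrice("hot", 0.020, 0.00),
            new TierPrice("warm", 0.010, 0.01),
            new TierPrice("cold", 0.002, 0.10));

    /** Returns the tier with the lowest monthly cost for the given usage. */
    static TierPrice cheapest(List<TierPrice> tiers, double storedGb, double retrievedGbPerMonth) {
        TierPrice best = null;
        for (TierPrice t : tiers) {
            if (best == null
                    || t.monthlyCost(storedGb, retrievedGbPerMonth)
                       < best.monthlyCost(storedGb, retrievedGbPerMonth)) {
                best = t;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no tiers given");
        }
        return best;
    }

    public static void main(String[] args) {
        double storedGb = 1000;
        for (double retrievedGb : new double[] {0, 100, 5000}) {
            TierPrice t = cheapest(EXAMPLE_TIERS, storedGb, retrievedGb);
            System.out.println(retrievedGb + " GB read per month -> " + t.name);
        }
    }
}
```

With these placeholder numbers, data that is never read is cheapest in the cold tier, lightly read data in the warm tier, and heavily read data in the hot tier. The model deliberately ignores retrieval latency, which is often the deciding factor for cold storage.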

Lifecycle policies move objects down the tiers automatically (e.g., logs → IA after 30 days → Glacier after 90 → delete after 7 years), trading retrieval latency for large, automatic cost savings.
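
The example policy in parentheses can be modeled as a function from an object's age to its tier. This is a sketch of the rule itself, assuming age is measured from creation date; it is not how any provider's lifecycle API is configured, and `LifecycleSketch`, `Tier`, and `tierFor` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Models the example policy: IA after 30 days, Glacier after 90, delete after 7 years. */
final class LifecycleSketch {

    enum Tier { STANDARD, INFREQUENT_ACCESS, GLACIER, DELETED }

    static Tier tierFor(LocalDate created, LocalDate today) {
        // Deletion is checked first, using calendar years rather than a day count.
        if (!today.isBefore(created.plusYears(7))) {
            return Tier.DELETED;
        }
        long ageDays = ChronoUnit.DAYS.between(created, today);
        if (ageDays >= 90) {
            return Tier.GLACIER;
        }
        if (ageDays >= 30) {
            return Tier.INFREQUENT_ACCESS;
        }
        return Tier.STANDARD;
    }

    public static void main(String[] args) {
        LocalDate created = LocalDate.of(2024, 1, 1);
        LocalDate[] checkpoints = {
            created.plusDays(10), created.plusDays(45),
            created.plusDays(200), created.plusYears(8)
        };
        for (LocalDate day : checkpoints) {
            System.out.println(day + " -> " + tierFor(created, day));
        }
    }
}
```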
