Surrogate keys (UUID/bigserial) vs natural keys — pros an… — Cracked Java
MidTheory

Surrogate keys (UUID/bigserial) vs natural keys — pros and cons.

A surrogate key is a system-generated, meaningless identifier (bigserial, uuid); a natural key is a column that already identifies the row in the real world (email, ISBN, country code). The debate is about stability and coupling: natural keys carry meaning but meaning changes, and a primary key that changes is expensive because every foreign key references it.

Natural keys — pros and cons

Pros: no extra column, the key is human-meaningful, and joins/lookups read naturally (WHERE country_code = 'US'). They also enforce a real-world uniqueness rule for free.

Cons, and why they usually win the argument against:

  • They change. Emails change, ISBNs get reassigned, company names rebrand. Updating a PK cascades to every referencing table — slow, lock-heavy, and risky.
  • They can be wide. A multi-column natural key ((country, region, city)) bloats every FK and every index that copies it.
  • They leak. Using SSN or email as a PK scatters PII across foreign keys and logs.
  • You can't always trust uniqueness you didn't generate ("surely no two people share a phone number" — until they do).

Surrogate keys — pros and cons

Pros: stable forever (never updated, so no cascading PK changes), narrow and uniform (bigint is 8 bytes, fast to join and index), and decoupled from business meaning so a real-world rule change never touches the key.

Cons: an extra column with no inherent meaning; you still need a natural UNIQUE constraint to prevent duplicate real-world rows (a surrogate key happily lets you insert the same customer twice); and an opaque id tells you nothing when you eyeball the data.

The standard answer: surrogate PK + natural UNIQUE

The two are not mutually exclusive — the mature design uses both:

CREATE TABLE users (
  id    bigserial PRIMARY KEY,        -- surrogate: stable, narrow, what FKs point at
  email text NOT NULL UNIQUE,         -- natural key: enforces real-world uniqueness
  name  text NOT NULL
);

The surrogate is what foreign keys reference (stable, cheap to join); the natural UNIQUE constraint still guarantees you can't have two users with the same email. You get integrity and stability.

bigserial vs UUID for the surrogate

bigserial (an 8-byte sequence-backed integer) is compact and index-friendly but predictable and centrally allocated. UUIDs are 16 bytes, globally unique, and can be generated client-side without a round-trip — valuable for distributed systems, sharding, and not exposing row counts. The catch is index locality: random UUID v4 inserts fragment a B-tree, which is exactly why UUID v7 exists (covered in the next question).

Mark your status