What is a GIN index used for? (Full-text search, JSONB, a… — Cracked Java
MidTheory

What is a GIN index used for? (Full-text search, JSONB, arrays.)

GIN — Generalized Inverted Index — is the index for "does this value contain that?" queries over composite values: jsonb, arrays, and full-text. Where a B-tree maps one key to one row, GIN maps each element inside a value (each array member, each jsonb key/value, each lexeme) to the list of rows containing it — an inverted index, exactly like a search engine.

What it indexes

A single column value is decomposed into many keys, and each key points to the rows that hold it. So a query for "rows containing element X" becomes one lookup, not a scan.

-- arrays: "tag contains 'sql'?"
CREATE INDEX ON posts USING gin (tags);
SELECT * FROM posts WHERE tags @> ARRAY['sql'];

-- jsonb: containment and key/path existence
CREATE INDEX ON events USING gin (payload jsonb_path_ops);
SELECT * FROM events WHERE payload @> '{"type":"signup"}';

-- full-text search
CREATE INDEX ON docs USING gin (to_tsvector('english', body));
SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');

The operators it accelerates

GIN backs the containment / membership operators, not equality on the whole value:

  • arrays: @> (contains), <@ (contained by), && (overlaps).
  • jsonb: @>, ?, ?|, ?& (key existence), and @@/@? jsonpath.
  • full-text: @@ matching tsvector against tsquery.

The trade-off

GIN is built for read-heavy containment workloads. Its weakness is write cost: inserting one row can touch many index keys.

  • Mitigated by the fastupdate mechanism, which buffers new entries in a pending list and flushes them in bulk (during VACUUM or when the list fills).
  • This makes individual writes cheap but means a query may have to scan that pending list too.

If you need nearest-neighbor or range semantics rather than containment, that's GiST territory, not GIN.

Mark your status