What are tsvector and tsquery?

Covers tsvector and tsquery, building a GIN-backed full-text search (FTS) index, the query-builder functions, ranking, language configurations, and when to reach for a dedicated engine instead.

tsvector is the document side of full-text search and tsquery is the query side; the @@ operator matches one against the other. Both are real PostgreSQL data types, not just function return values, and understanding what they store is the foundation for everything else.
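
As a quick check that both are first-class types, pg_typeof can report the type of each function's result. This is a minimal sketch; the literal 'fox' is only a placeholder input.

```sql
-- Both functions return values of dedicated types, which can also be
-- used as column types or as cast targets.
SELECT pg_typeof(to_tsvector('english', 'fox')) AS doc_type,
       pg_typeof(to_tsquery('english', 'fox'))  AS query_type;
-- doc_type: tsvector, query_type: tsquery
```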

tsvector — the preprocessed document

A tsvector is a sorted list of distinct lexemes, each with the integer positions at which it occurs. Building one with to_tsvector runs the text through a configuration that lowercases, strips stop words, and stems each token to its root lexeme.

SELECT to_tsvector('english', 'A fox was running and runs fast');
-- 'fast':7 'fox':2 'run':4,6

Notice what happened: "a", "was", and "and" are stop words and vanished; "running" and "runs" both stemmed to run and merged into a single entry that keeps both positions. The vector is sorted by lexeme rather than kept in original word order, and the stored positions are what make phrase search and cover-density ranking possible.
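
To see how much of that work comes from the configuration, the same sentence can be run through the built-in simple configuration, which lowercases tokens but neither removes stop words nor stems. A small sketch for comparison:

```sql
-- Same input, 'simple' configuration: every word survives unchanged
-- apart from lowercasing, so "running" and "runs" stay separate lexemes.
SELECT to_tsvector('simple', 'A fox was running and runs fast');
-- 'a':1 'and':5 'fast':7 'fox':2 'running':4 'runs':6 'was':3
```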

tsquery — the parsed search expression

A tsquery holds lexemes combined with boolean operators: & (AND), | (OR), ! (NOT), and <-> (followed-by, for phrases). to_tsquery parses and normalizes the input with the same configuration so its lexemes line up with the document's.

SELECT to_tsquery('english', 'running & !cat');   -- 'run' & !'cat'
SELECT to_tsquery('english', 'quick <-> brown');  -- 'quick' <-> 'brown' (adjacent)
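
For contrast, casting a string directly to tsquery parses the operators but skips normalization, so the lexemes are stored as written. A minimal sketch of the difference:

```sql
-- Cast: operators are parsed, words are taken verbatim (no stemming).
SELECT 'running & !cat'::tsquery;                 -- 'running' & !'cat'

-- to_tsquery: same operators, words normalized by the configuration.
SELECT to_tsquery('english', 'running & !cat');   -- 'run' & !'cat'
```

The cast form would look for the literal lexeme running, which to_tsvector('english', ...) never produces, so to_tsquery with the document's configuration is usually the safer choice.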

@@ — the match operator

SELECT to_tsvector('english', 'The fox runs fast')
     @@ to_tsquery('english', 'run & fox');   -- true

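Matching is on lexemes, which is exactly why "runs" in the document matches "run" in the query, and would match "running" just as well: all three normalize to run. LIKE compares raw characters and knows nothing about word forms, and this is the whole point of FTS over LIKE.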
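
The difference is easy to see side by side. The following sketch reuses the sentence from the example above and adds a NOT and a phrase query to show that the other operators go through @@ in the same way:

```sql
-- Substring matching sees only raw characters.
SELECT 'The fox runs fast' LIKE '%running%';           -- false

-- Lexeme matching: "running" and "runs" both normalize to 'run'.
SELECT to_tsvector('english', 'The fox runs fast')
     @@ to_tsquery('english', 'running & fox');        -- true

-- NOT: the document has 'fox' and no 'cat'.
SELECT to_tsvector('english', 'The fox runs fast')
     @@ to_tsquery('english', 'fox & !cat');           -- true

-- Phrase: 'fox' at position 2 is immediately followed by 'run' at 3.
SELECT to_tsvector('english', 'The fox runs fast')
     @@ to_tsquery('english', 'fox <-> run');          -- true
```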