Both heuristics decide the same thing: when merging two sets, which root becomes the parent of the other. The goal is identical — keep trees shallow so find stays fast — they just measure "bigger" differently.
Union by size
Track the number of elements in each tree; attach the smaller set under the larger. Intuition: fewer nodes get their depth increased.
boolean union(int a, int b) {
int ra = find(a), rb = find(b);
if (ra == rb) return false;
if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; }
parent[rb] = ra;
size[ra] += size[rb]; // maintain the count
return true;
}
Union by rank
Track rank — an upper bound on tree height, not an exact count; attach the lower-rank tree under the higher-rank one. Rank increments by 1 only when two equal-rank trees merge.
boolean union(int a, int b) {
int ra = find(a), rb = find(b);
if (ra == rb) return false;
if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
parent[rb] = ra;
if (rank[ra] == rank[rb]) rank[ra]++; // tie -> height grew by 1
return true;
}
How they differ
- Size stays exact and is independently useful — you often need "how big is this component?" anyway, so it comes for free.
- Rank is only an upper bound on height once path compression starts flattening trees: compression lowers actual height but never decreases stored rank. That is fine — rank only needs to bound the merge decision, not stay precise.
Both deliver the same O(log n) standalone tree height, and both combine with path compression to reach the O(α(n)) amortized bound. In interviews, either is fully acceptable.
| Operation | Best | Average | Worst | Note |
|---|---|---|---|---|
| Union (by rank or size) | O(α(n)) | O(α(n)) | O(α(n)) | amortized, with path compression |
| Find | O(α(n)) | O(α(n)) | O(α(n)) | amortized, with path compression |
| Tree height (heuristic alone) | — | — | O(log n) | no compression |