Problem. What is Timsort, and why is it fast on real-world data?
The idea
Timsort (Tim Peters, 2002, originally for Python; adopted by Java for Arrays.sort(Object[]) and Collections.sort) is an adaptive, stable MergeSort tuned for the data programs actually sort: mostly-ordered arrays, concatenations of sorted blocks, and short inputs. Its worst case is O(n log n), but on partially sorted data it approaches O(n).
How it works
- Find natural runs. Scan the array for maximal already-sorted segments (ascending, or strictly descending — which it reverses in place). Real data is full of these.
- Extend short runs with binary insertion sort. If a run is shorter than a computed
minrun(32–64), it's extended by binary-insertion-sorting the next few elements. Insertion sort is near-optimal on tiny, nearly-sorted arrays. - Merge runs by a stack discipline. Run lengths are pushed on a stack that maintains invariants (each run roughly the sum of the two above it) so merges stay balanced, keeping the overall cost O(n log n).
- Galloping mode. When merging, if one run keeps "winning," Timsort switches to exponential (binary) search to jump ahead, skipping element-by-element comparison. This is what makes merging structured data dramatically faster.
Why real-world-fast
Theoretical sorts assume random input; real input is rarely random. Timsort is adaptive — it detects existing order and does proportionally less work, hitting O(n) on already-sorted input. It's stable, satisfying the object-sort contract. And minrun + binary insertion sort keep small subproblems cache-friendly.