Comparing sorting algorithms isn't terribly original, but I didn't think comparing AVL trees to RB trees was either, so I thought I'd do it anyway and see what the real results were.

Quick refresher. Most computer scientists know big O notation, but we tend to forget big Omega and big Theta. Big O is the upper bound, big Omega is the analogous lower bound and big Theta is used when the two are the same. Got that? I often hear people state a "best case" as "big O of", but I want to promulgate correct usage.

In testing, quicksort was of course the fastest, mergesort next and heapsort last. Though I wrote selection and bubble as they have their own uses, I didn't even consider \Omega(N^2) algorithms for speed testing. Just as a reminder before we look at the results here's the boundaries and properties for each:

- Quick Sort
- O(N^2)
- \Omega(Nlog(N))
- In place, not stable

- Merge Sort
- \Theta(Nlog(N))
- Not in place, stable

- Heap Sort
- \Theta(Nlog(N))
- In place, not stable

But, lets talk some real numbers. I did two tests, one with 10000000 element sort run 100 times, and one with 100 element sort run 10000000 times. I'll call the first "large" sorts, and the second "small". I did a little more testing to confirm that the results are relatively stable within those intuitive categories.

- Quick Sort:
- Merge Sort
- Large test: 16% slower
- Small test: 9% slower

- Heap sort
- Large test: 122% slower
- Small test: 19% slower

Okay, so these were basically the results we all expected right? There are a couple of interesting details though.

First, because it just jumps out at you, what the heck is with heapsort? It certainly does more operations than the other two, but that wouldn't account for the difference between small and large. My guess is that as the heap spreads out basically every lookup in the array is a cache-miss, this is what bheap was attempting to improve for a normal heap algorithm, but the constant factors came out even worse.

Now, lets talk about the two algorithms who's speed don't immediatly knock them out of the running. Their are two commonly cited reasons for using Quick Sort over Merge Sort. The first is that it's in place... I did some further testing and on my machine (a modern linux distro), and with a clean heap, doing the allocation for merge sort only adds another 1% overhead for both small and large cases. Admittedly since we alloc and free the same size over and over again we're using malloc like a slab allocator, but then that's also the point... allocation speed can be worked around. The second reason is that quicksort has slightly better constant factors. Here I've shown that slightly means ~9-16%. If moves were expensive this might go up a little, but if moves are that expensive you probably shouldn't be directly sorting the data anyway.

Now consider that if you use quicksort your sort will sometimes take N^2 time. That's the sort of thing causes stutters every few seconds or minutes in a videogame, a network stack, etc. 10%-15% is below what's often considered "user noticeable" speed
difference (that line usually being drawn around 20%), but they will
almost certainly notice the stutter when it takes 100% longer one time.

Conclusion:

Following the philosophy I keep pushing, Merge Sort is probably a better default sort algorithm to use than Quick Sort. Using modern mallocs like tcmalloc allocation time becomes less relevent even with a "dirty" heap. In highly optimized applications dynamic allocation itself is often avoided (since it can cause occasional delays as well), in such cases worst-case is almost always the most critical factor, and additionally it's worth the effort to set the ram aside so being "in-place" isn't that critical.

Eventually I'd really like to microbenchmark some of the algorithms I've been testing so as to actually measure the near-worst-case operation. For now all I have is practical experience and theoretical bounds with which to demonstrate it to others.

Further work:

I'm currently playing with hashtables as well, continuing the tree comparison testing. Of course the hashtable is much faster than my best tree, but I want to pursue some solutions to the hashtable worst-case problems and see how those fair as well.