To be fair, quicksort is particularly suitable for actual computation hardware, which has features like numerically addressed, stateful random access memory. Imperative languages happen to feature in-place modification of arrays not because it's a useful language feature in the abstract but because it's a straightforward abstraction of the actual machine being programmed.
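To make "in-place modification as a hardware abstraction" concrete, here's a minimal quicksort sketch in C (Lomuto partition, pivot at the end; the function names are mine, not from any particular implementation). Every write lands back in the same numerically addressed array, with no allocation:

```c
#include <stddef.h>

/* Swap through pointers: a store to a numeric address,
   which is exactly the hardware feature in question. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition over v[0..hi], pivot = v[hi];
   returns the pivot's final index. */
static size_t partition(int *v, size_t hi) {
    int pivot = v[hi];
    size_t i = 0;
    for (size_t j = 0; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);
    return i;
}

/* Sort v[0..n-1] in place -- no auxiliary array anywhere. */
void quicksort(int *v, size_t n) {
    if (n < 2) return;
    size_t p = partition(v, n - 1);
    quicksort(v, p);                  /* left of the pivot  */
    quicksort(v + p + 1, n - p - 1);  /* right of the pivot */
}
```

Note the contrast with the classic functional formulation, which filters into two fresh lists per level: the in-place version touches memory the way the machine wants it touched.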
That distinction is really important, and something I think many language nerds completely fail to understand. Sure, often all that's "really" important is the abstraction of the "Human Design Space" idea being implemented, and that's a regime where high level language abstractions have lots of value.
But sometimes you want to actually direct the hardware. And there, as everywhere, it helps to use a language with the proper abstractions. C has them, Haskell doesn't. There's absolutely no shame in that!
But somehow people get fooled into thinking that Haskell (or whatever language is involved in the argument, though it seems like Haskell people fall into this hole more than most) doesn't need those abstractions to be "as fast as C". And that's just insane.
To be fair to mergesort, it's particularly suitable for actual computation hardware too: it's mostly linear reads and writes, which almost all I/O systems heavily favor. GNU sort uses mergesort (in C).
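The "mostly linear reads and writes" claim is visible in the merge step itself. A minimal sketch (arrays standing in for sorted runs on disk; names are mine): three cursors that only ever advance, no random access at all.

```c
#include <stddef.h>

/* Merge two sorted runs a[0..na-1] and b[0..nb-1] into out.
   The access pattern is three monotonically advancing cursors --
   pure streaming, which disks and I/O systems heavily favor. */
void merge_runs(const int *a, size_t na,
                const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* drain leftover run */
    while (j < nb) out[k++] = b[j++];
}
```

Swap the arrays for buffered file streams and you have the skeleton of an external merge pass.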
GNU sort is designed to sort on disk though. If you actually watch it working on a giant file, you'll see it generating lots and lots of files in /tmp containing the merge lists.
This is because quicksort obviously requires fast random access to the whole array, and (at least when it was written) /usr/bin/sort would be expected to quickly sort data that can't fit in memory.
So... if anything I think this is evidence to my point: in the performance regime, you have to match the algorithm choice to the hardware. Specifically here: sort is making up for the lack of memory by exploiting the comparatively (relative to virtual memory) fast streaming behavior of storage media.
Notably, stateful filesystem I/O is also an area where Haskell tends to fall down. I'm not at all sure a Haskell merge sort of the form used in GNU sort would do well at all, though I'm willing to be proven wrong.
> GNU sort is designed to sort on disk though. If you actually watch it working on a giant file, you'll see it generating lots and lots of files in /tmp containing the merge lists.
The number of files created is just an implementation detail; you could easily do everything in one file if you wanted to, or at most 2N files if you're not allowed to seek or overwrite files at all, and you want to do N-way merges.
Maybe GNU sort is just trying to avoid creating files larger than 2 GB (which is a problem on legacy file systems).
> Specifically here: sort is making up for the lack of memory by exploiting the comparatively (relative to virtual memory) fast streaming behavior of storage media.
Yes, also the fact that GNU sort can sort multi-terabyte files on 32-bit platforms, which otherwise wouldn't fit in memory. (Though systems with smaller process memory space than disk storage space are an oddity that will soon be outdated, I suppose.)
> I'm not at all sure a Haskell merge sort of the form used in GNU sort would do well at all
Well, even the on-disk sort starts by sorting chunks in-memory, which we have already established Haskell is bad at. However, the merge sorting part of the algorithm may well be I/O bound in practice, in which case Haskell should be able to do it just fine.
The only good reason I can think of why C could be faster is that if you can merge more efficiently, you can choose a higher value of N for your N-way merge, which means fewer passes over the data.
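The payoff of a larger N is easy to quantify: each merge pass divides the number of sorted runs by at most N, so you need roughly log_N(runs) passes over the whole data set. A tiny sketch (function name is mine):

```c
#include <stddef.h>

/* Number of sequential passes an N-way external merge needs to
   combine `runs` sorted runs into one. Each pass reduces the
   run count by a factor of at most n (ceiling division). */
unsigned merge_passes(size_t runs, size_t n) {
    unsigned passes = 0;
    while (runs > 1) {
        runs = (runs + n - 1) / n;
        passes++;
    }
    return passes;
}
```

For example, 1000 initial runs take 10 passes with 2-way merges but only 3 with 16-way merges, and each pass is a full read and write of the data, so a more efficient in-memory merge really can buy you a lot of I/O.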
> stateful filesystem I/O is also an area where Haskell tends to fall down
What do you mean by this? AFAIK Haskell can read/write files just fine through the I/O monad.