To be fair, quicksort is particularly suitable for actual computation hardware, which has features like numerically addressed, stateful random access memory. Imperative languages happen to feature in-place modification of arrays not because it's a useful language feature in the abstract but because it's a straightforward abstraction of the actual machine being programmed.
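To make "in-place modification as a hardware abstraction" concrete, here's a minimal quicksort sketch in C (Lomuto partition, pivot at the end; the function names are mine, not from any particular implementation). Every write lands back in the same numerically addressed array, with no allocation:

```c
#include <stddef.h>

/* Swap through pointers: a store to a numeric address,
   which is exactly the hardware feature in question. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition over v[0..hi], pivot = v[hi];
   returns the pivot's final index. */
static size_t partition(int *v, size_t hi) {
    int pivot = v[hi];
    size_t i = 0;
    for (size_t j = 0; j < hi; j++)
        if (v[j] < pivot) swap(&v[i++], &v[j]);
    swap(&v[i], &v[hi]);
    return i;
}

/* Sort v[0..n-1] in place -- no auxiliary array anywhere. */
void quicksort(int *v, size_t n) {
    if (n < 2) return;
    size_t p = partition(v, n - 1);
    quicksort(v, p);                  /* left of the pivot  */
    quicksort(v + p + 1, n - p - 1);  /* right of the pivot */
}
```

Note the contrast with the classic functional formulation, which filters into two fresh lists per level: the in-place version touches memory the way the machine wants it touched.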
That distinction is really important, and something I think many language nerds completely fail to understand. Sure, often all that's "really" important is the abstraction of the "Human Design Space" idea being implemented, and that's a regime where high level language abstractions have lots of value.
But sometimes you want to actually direct the hardware. And there, as everywhere, it helps to use a language with the proper abstractions. C has them, Haskell doesn't. There's absolutely no shame in that!
But somehow people get fooled into thinking that Haskell (or whatever language is involved in the argument, though it seems like Haskell people fall into this hole more than most) doesn't need those abstractions to be "as fast as C". And that's just insane.
To be fair to mergesort, it's particularly suitable for actual computation hardware too: it's mostly linear reads and writes, which almost all I/O systems heavily favor. GNU sort uses mergesort (in C).
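The "mostly linear reads and writes" claim is visible in the merge step itself. A minimal sketch (arrays standing in for sorted runs on disk; names are mine): three cursors that only ever advance, no random access at all.

```c
#include <stddef.h>

/* Merge two sorted runs a[0..na-1] and b[0..nb-1] into out.
   The access pattern is three monotonically advancing cursors --
   pure streaming, which disks and I/O systems heavily favor. */
void merge_runs(const int *a, size_t na,
                const int *b, size_t nb, int *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];   /* drain leftover run */
    while (j < nb) out[k++] = b[j++];
}
```

Swap the arrays for buffered file streams and you have the skeleton of an external merge pass.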
GNU sort is designed to sort on disk though. If you actually watch it working on a giant file, you'll see it generating lots and lots of files in /tmp containing the merge lists.
This is because quicksort obviously requires fast random access to the whole array, and (at least when it was written) /usr/bin/sort would be expected to quickly sort data that can't fit in memory.
So... if anything I think this is evidence to my point: in the performance regime, you have to match the algorithm choice to the hardware. Specifically here: sort is making up for the lack of memory by exploiting the comparatively (relative to virtual memory) fast streaming behavior of storage media.
Notably, stateful filesystem I/O is also an area where Haskell tends to fall down. I'm not at all sure a Haskell merge sort of the form used in GNU sort would do well at all, though I'm willing to be proven wrong.
> GNU sort is designed to sort on disk though. If you actually watch it working on a giant file, you'll see it generating lots and lots of files in /tmp containing the merge lists.
The number of files created is just an implementation detail; you could easily do everything in one file if you wanted to, or at most 2N files if you're not allowed to seek or overwrite files at all, and you want to do N-way merges.
Maybe GNU sort is just trying to avoid creating files larger than 2 GB (which is a problem on legacy file systems).
> Specifically here: sort is making up for the lack of memory by exploiting the comparatively (relative to virtual memory) fast streaming behavior of storage media.
Yes, also the fact that GNU sort can sort multi-terabyte files on 32-bit platforms, which otherwise wouldn't fit in memory. (Though systems with smaller process memory space than disk storage space are an oddity that will soon be outdated, I suppose.)
> I'm not at all sure a Haskell merge sort of the form used in GNU sort would do well at all
Well, even the on-disk sort starts by sorting chunks in-memory, which we have already established Haskell is bad at. However, the merge sorting part of the algorithm may well be I/O bound in practice, in which case Haskell should be able to do it just fine.
The only good reason I can think of why C could be faster is that if you can merge more efficiently, you can choose a higher value of N for your N-way merge, which means fewer passes over the data.
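The payoff of a larger N is easy to quantify: each merge pass divides the number of sorted runs by at most N, so you need roughly log_N(runs) passes over the whole data set. A tiny sketch (function name is mine):

```c
#include <stddef.h>

/* Number of sequential passes an N-way external merge needs to
   combine `runs` sorted runs into one. Each pass reduces the
   run count by a factor of at most n (ceiling division). */
unsigned merge_passes(size_t runs, size_t n) {
    unsigned passes = 0;
    while (runs > 1) {
        runs = (runs + n - 1) / n;
        passes++;
    }
    return passes;
}
```

For example, 1000 initial runs take 10 passes with 2-way merges but only 3 with 16-way merges, and each pass is a full read and write of the data, so a more efficient in-memory merge really can buy you a lot of I/O.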
> stateful filesystem I/O is also an area where Haskell tends to fall down
What do you mean by this? AFAIK Haskell can read/write files just fine through the I/O monad.