There are two common hash join algorithms that databases use: symmetric hash join and asymmetric hash join.

The latter is actually simpler. Take the first (smaller) table and read it into a hash table (assuming, for simplicity's sake, that it fits in memory). Then stream all rows from the second table, looking up each row's join key in the hash table built from the first table. If you get a match, emit an output row.

For symmetric hash join, you stream both input tables simultaneously: for each input row, you first check the other input's hash table for a match (emitting an output row for each match), and then add the input row to its own input's hash table. (When you've exhausted one of the inputs, you can actually delete the other input's hash table from memory, since it won't produce any more matches.)

Again I've oversimplified by assuming the inputs fit in memory, but the algorithms for spilling to disk are pretty simple: basically, hash both inputs on the join key into disk partitions that do fit in memory, and build hash tables from each partition.
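The asymmetric case is short enough to sketch in Python. This is a minimal in-memory version; the row dicts and column names are made up for illustration:

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic (asymmetric) hash join: build a hash table over the
    smaller input, then stream the larger input, probing for matches."""
    # Build phase: index the smaller table by its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: a single pass over the larger table.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

# Illustrative rows (table and column names are invented):
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "item": "x"}, {"user_id": 1, "item": "y"}]
result = list(hash_join(users, orders, "id", "user_id"))
```

Note that only the build side has to fit in memory; the probe side is streamed row by row.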
Parallel databases also use hash join for performing distributed joins: each node hashes its inputs on the join key to distribute them onto the other nodes, so that all rows with the same join key end up on the same node. Then you can apply one of the hash join algorithms above.
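The repartitioning step can be sketched like this (a toy single-process version; `partition_by_key` is my own name for it, and a real system would ship each partition over the network to its node rather than return lists):

```python
def partition_by_key(rows, key, num_nodes):
    """Hash-partition rows on the join key. Because both inputs are
    partitioned with the same hash function, rows with equal keys
    always land in the same partition (i.e., on the same node), so
    each node can then run a purely local hash join."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions
```

The key property is that the union of the per-partition joins equals the global join, since matching rows can never be split across partitions.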
Since the case where the smaller table fits in RAM is easy and obvious, it's the other case that I'm interested in.
It sounds like you might be saying that when two large hash-based tables need to be joined, you're basically starting from scratch, in that you're not taking advantage of the existing hash data structure. (At least that's how I interpret "build hash tables", in contrast to somehow using what already exists on disk.)
This sounds pretty slow to me compared to a merge join of ordered lists. There seems to be a lot of I/O (including writes, temp files, and multiple passes), whereas a merge join is just reads, and more or less sequential ones.
So this would be a reason for databases to lean toward ordered storage. But only if disk-based hash joins are as slow as I think they are, which is the part I'm not sure about.
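For reference, the merge join I have in mind looks roughly like this — a sketch assuming both inputs are already sorted on the join key and held in lists; on disk, the same logic runs over two sequential scans:

```python
def merge_join(left, right, key):
    """Merge join of two inputs already sorted on the join key: one
    sequential pass over each side, no hash tables or temp files."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then cross it
            # with the matching run on the left.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end
```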
It might not be clear that symmetric hash join is a streaming, non-blocking operator (i.e., it can start producing results immediately without waiting for a hash table to be built). Unlike merge join, though, it requires all input rows to be kept in memory (at least until the other table is exhausted).
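To make the streaming behavior concrete, here's a sketch of symmetric hash join as a Python generator (in-memory only, alternating one row from each side per step; the function name and row format are my own illustration):

```python
def symmetric_hash_join(left_rows, right_rows, left_key, right_key):
    """Symmetric hash join: consume both inputs in alternation. Each
    arriving row first probes the other side's hash table (emitting
    matches immediately), then is inserted into its own side's table,
    so output begins before either input is fully read."""
    left_tab, right_tab = {}, {}
    lit, rit = iter(left_rows), iter(right_rows)
    while lit or rit:
        if lit:
            row = next(lit, None)
            if row is None:
                lit = None
                right_tab = None  # no future left row will probe it: free it
            else:
                for match in right_tab.get(row[left_key], []):
                    yield {**row, **match}
                if rit:  # only worth storing while right rows remain
                    left_tab.setdefault(row[left_key], []).append(row)
        if rit:
            row = next(rit, None)
            if row is None:
                rit = None
                left_tab = None  # no future right row will probe it: free it
            else:
                for match in left_tab.get(row[right_key], []):
                    yield {**row, **match}
                if lit:
                    right_tab.setdefault(row[right_key], []).append(row)
```

The two `= None` assignments are the memory-reclamation trick mentioned above: once one input is exhausted, the other side's hash table can never be probed again.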
I suspect query optimizers haven’t taken hash indexes into account for joins because they’re so rare in practice, but it’s probably worth considering.