There are two common hash join algorithms that databases use: symmetric hash join and asymmetric hash join.

The latter is actually simpler. Take the first (smaller) table and read it into a hash table (assuming, for simplicity's sake, that it fits in memory). Then stream all rows from the second table, looking up each row's join key in the hash table built from the first table. If you get a match, emit an output row.

For symmetric hash join, you stream both input tables simultaneously: for each input row, you first check the other input's hash table for a match (emitting an output row for each match), and then add the input row to its own input's hash table. (When you've exhausted one of the inputs, you can actually delete the other input's hash table from memory, since it won't produce any more matches.)

Again I've oversimplified by assuming the inputs fit in memory, but the algorithms for spilling to disk are pretty simple: basically, hash both inputs on the join key into disk partitions that do fit in memory, and build hash tables from each partition.
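The asymmetric case is short enough to sketch in Python. This is a minimal in-memory version; the row dicts and column names are made up for illustration:

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic (asymmetric) hash join: build a hash table over the
    smaller input, then stream the larger input, probing for matches."""
    # Build phase: index the smaller table by its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Probe phase: a single pass over the larger table.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

# Illustrative rows (table and column names are invented):
users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
orders = [{"user_id": 1, "item": "x"}, {"user_id": 1, "item": "y"}]
result = list(hash_join(users, orders, "id", "user_id"))
```

Note that only the build side has to fit in memory; the probe side is streamed row by row.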
Parallel databases also use hash join for performing distributed joins: each node hashes its inputs on the join key to distribute them onto the other nodes, so that all rows with the same join key end up on the same node. Then you can apply one of the hash join algorithms above.
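The repartitioning step can be sketched like this (a toy single-process version; `partition_by_key` is my own name for it, and a real system would ship each partition over the network to its node rather than return lists):

```python
def partition_by_key(rows, key, num_nodes):
    """Hash-partition rows on the join key. Because both inputs are
    partitioned with the same hash function, rows with equal keys
    always land in the same partition (i.e., on the same node), so
    each node can then run a purely local hash join."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions
```

The key property is that the union of the per-partition joins equals the global join, since matching rows can never be split across partitions.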
Since the case where the smaller table fits in RAM is easy and obvious, it's the other case that I'm interested in.
It sounds like you might be saying that when two large hash-based tables need to be joined, you're basically starting from scratch, in that you're not taking advantage of the existing hash data structure. (At least that's how I interpret "build hash tables", in contrast to somehow using what already exists on disk.)
This sounds pretty slow to me compared to a merge join of ordered lists. There seems to be a lot of I/O (including writes, temp files, and multiple passes), whereas a merge join is just reads, and more or less sequential ones.
So this would be a reason for databases to lean toward ordered storage. But only if disk-based hash joins are as slow as I think they are, which is the part I'm not sure about.
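For reference, the merge join I have in mind looks roughly like this — a sketch assuming both inputs are already sorted on the join key and held in lists; on disk, the same logic runs over two sequential scans:

```python
def merge_join(left, right, key):
    """Merge join of two inputs already sorted on the join key: one
    sequential pass over each side, no hash tables or temp files."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, then cross it
            # with the matching run on the left.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    yield {**left[i], **r}
                i += 1
            j = j_end
```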
It might not be clear that symmetric hash join is a streaming, non-blocking operator (i.e., it can start producing results immediately without waiting for a hash table to be built). Unlike merge join, though, it requires all input rows to be kept in memory (at least until the other table is exhausted).
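To make the streaming behavior concrete, here's a sketch of symmetric hash join as a Python generator (in-memory only, alternating one row from each side per step; the function name and row format are my own illustration):

```python
def symmetric_hash_join(left_rows, right_rows, left_key, right_key):
    """Symmetric hash join: consume both inputs in alternation. Each
    arriving row first probes the other side's hash table (emitting
    matches immediately), then is inserted into its own side's table,
    so output begins before either input is fully read."""
    left_tab, right_tab = {}, {}
    lit, rit = iter(left_rows), iter(right_rows)
    while lit or rit:
        if lit:
            row = next(lit, None)
            if row is None:
                lit = None
                right_tab = None  # no future left row will probe it: free it
            else:
                for match in right_tab.get(row[left_key], []):
                    yield {**row, **match}
                if rit:  # only worth storing while right rows remain
                    left_tab.setdefault(row[left_key], []).append(row)
        if rit:
            row = next(rit, None)
            if row is None:
                rit = None
                left_tab = None  # no future right row will probe it: free it
            else:
                for match in left_tab.get(row[right_key], []):
                    yield {**row, **match}
                if lit:
                    right_tab.setdefault(row[right_key], []).append(row)
```

The two `= None` assignments are the memory-reclamation trick mentioned above: once one input is exhausted, the other side's hash table can never be probed again.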
I suspect query optimizers haven’t taken hash indexes into account for joins because they’re so rare in practice, but it’s probably worth considering.