
Is the article essentially saying that [1] is faster than [2]?

I have used [1] many times for that reason, although [2] is probably more intuitive for what you want to do.

[1]

  SELECT vegetable_id, SUM(price) AS price, SUM(weight) AS weight
  FROM
  (
    SELECT vegetable_id, price, NULL AS weight
    FROM prices
    UNION ALL
    SELECT vegetable_id, NULL AS price, weight
    FROM weights
  ) AS combined
  GROUP BY vegetable_id
[2]

  SELECT prices.vegetable_id, price, weight
  FROM prices
    JOIN weights
      ON prices.vegetable_id = weights.vegetable_id


The relative performance of these two queries will vary with data volume. Swap in a massive sales table for the weights table, and joining it to the much smaller prices table can be much faster than a group by. Stated differently, a join can be faster than a group by. This is even more true when the small table fits into memory so a hash join can be used, while the data in the group by can't fit into memory.
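A rough sqlite3 sketch of that skewed workload, with made-up table and column names (a large sales table, a small prices table). Timings vary by machine and engine, so the sketch just times both shapes and checks they return the same rows:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE prices (vegetable_id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE sales  (vegetable_id INTEGER, quantity REAL);
""")
# small prices table, much larger sales table
con.executemany("INSERT INTO prices VALUES (?, ?)",
                [(i, float(i)) for i in range(100)])
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(i % 100, 1.0) for i in range(200_000)])

def run(sql):
    t0 = time.perf_counter()
    rows = con.execute(sql).fetchall()
    return rows, time.perf_counter() - t0

# shape [1]: GROUP BY over a UNION ALL of both tables
union_rows, union_t = run("""
    SELECT vegetable_id, SUM(price) AS price, SUM(quantity) AS quantity
    FROM (
        SELECT vegetable_id, price, NULL AS quantity FROM prices
        UNION ALL
        SELECT vegetable_id, NULL AS price, quantity FROM sales
    ) AS combined
    GROUP BY vegetable_id
    ORDER BY vegetable_id
""")

# shape [2]: aggregate the big table first, then join the small one
join_rows, join_t = run("""
    SELECT s.vegetable_id, p.price, s.quantity
    FROM (SELECT vegetable_id, SUM(quantity) AS quantity
          FROM sales GROUP BY vegetable_id) AS s
    JOIN prices AS p ON p.vegetable_id = s.vegetable_id
    ORDER BY s.vegetable_id
""")

assert union_rows == join_rows
print(f"union+group by: {union_t:.3f}s, aggregate+join: {join_t:.3f}s")
```

Which shape wins depends on the planner; the point is only that both compute the same answer, so the choice is a performance question.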


Yes, I'd say I would intuitively do that only if both tables were sufficiently large.

I assume somewhere there is a similar assumption in TFA.


These queries have different results. [2] retrieves only the vegetable_ids that are in both tables, while [1] gives all ids that are in prices or weights. And if a NULL vegetable_id exists in either table, [1] results in an extra row with a NULL id, which doesn't happen with query [2].
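A quick sqlite3 sketch of the difference, using the queries above on made-up data (one id in both tables, one only in prices, one only in weights):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE prices  (vegetable_id INTEGER, price  REAL);
    CREATE TABLE weights (vegetable_id INTEGER, weight REAL);
    -- id 1 is in both tables, id 2 only in prices, id 3 only in weights
    INSERT INTO prices  VALUES (1, 10.0), (2, 20.0);
    INSERT INTO weights VALUES (1, 5.0),  (3, 7.0);
""")

# [1] UNION ALL + GROUP BY: keeps ids that appear in either table
union_rows = con.execute("""
    SELECT vegetable_id, SUM(price) AS price, SUM(weight) AS weight
    FROM (
        SELECT vegetable_id, price, NULL AS weight FROM prices
        UNION ALL
        SELECT vegetable_id, NULL AS price, weight FROM weights
    ) AS combined
    GROUP BY vegetable_id
    ORDER BY vegetable_id
""").fetchall()

# [2] inner JOIN: keeps only ids that appear in both tables
join_rows = con.execute("""
    SELECT prices.vegetable_id, price, weight
    FROM prices JOIN weights ON prices.vegetable_id = weights.vegetable_id
""").fetchall()

print(union_rows)  # [(1, 10.0, 5.0), (2, 20.0, None), (3, None, 7.0)]
print(join_rows)   # [(1, 10.0, 5.0)]
```

The ids missing from one side come back with a NULL in that column under [1], and disappear entirely under [2].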


If I replace JOIN with FULL OUTER JOIN, you'll get what you describe. It was just a quick example, but you are right.

There is also the question of what happens if either table has duplicate vegetable_ids. At some point you have to assume the data has been processed so that each table is well formed for the purpose.
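The duplicate case can be sketched with sqlite3 on made-up data: with two price rows for the same id, the group-by query [1] sums them into one row, while the join [2] fans out to one output row per matching pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE prices  (vegetable_id INTEGER, price  REAL);
    CREATE TABLE weights (vegetable_id INTEGER, weight REAL);
    -- vegetable_id 1 appears twice in prices
    INSERT INTO prices  VALUES (1, 10.0), (1, 15.0);
    INSERT INTO weights VALUES (1, 5.0);
""")

# [1] collapses the duplicates into a single summed row
grouped = con.execute("""
    SELECT vegetable_id, SUM(price) AS price, SUM(weight) AS weight
    FROM (
        SELECT vegetable_id, price, NULL AS weight FROM prices
        UNION ALL
        SELECT vegetable_id, NULL AS price, weight FROM weights
    ) AS combined
    GROUP BY vegetable_id
""").fetchall()

# [2] produces one output row per matching price/weight pair
joined = con.execute("""
    SELECT prices.vegetable_id, price, weight
    FROM prices JOIN weights ON prices.vegetable_id = weights.vegetable_id
    ORDER BY price
""").fetchall()

print(grouped)  # [(1, 25.0, 5.0)]
print(joined)   # [(1, 10.0, 5.0), (1, 15.0, 5.0)]
```

Note the join also repeats the weight 5.0 on both rows, so summing its output would double-count weights.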


Do you lose the indices on the columns after the UNION?


Definitely, but that is more than made up for by the much smaller size.



