Is what the article is essentially saying is that [1] is faster than [2]?
I have used [1] many times for that reason although [2] is probably more intuitive for what you want to do.
[1]
SELECT vegetable_id, SUM(price) as price, SUM(weight) as weight
FROM
(
SELECT vegetable_id, price, NULL as weight
FROM prices
UNION ALL
SELECT vegetable_id, NULL as price, weight
FROM weights
)
GROUP BY vegetable_id
[2]
SELECT vegetable_id, price, weight
FROM prices
JOIN weights
ON price.vegetable_id = weights.vegetable_id
The relative performance of these two queries will vary by data volume. Swap in a sales table for the weights table, and make that a massive sales table at that, joining it to much smaller prices can be much faster than a group by. Stated differently, a join can be faster than a group by. This is even more true when the small table can fit into memory and a hash join can be used, and the data in the group by can't fit into memory.
This queries have different results. [2] retrieves only the vegetable_ids which are in both tables, [1] gives all ids which are in prices or weights.
If vegetable_id null exists in either table [1] result in an extra row with id null, this doesn't happen for query [2]
If I replace JOIN with FULL OUTER JOIN, you'll get what you describe. It was just an quick example, but you are right.
There are also things to say about what happens if either table has duplicate vegetable_id:s. At some point it is assumed that you have processed it in such a way that the table is well formed for the purpose.
I have used [1] many times for that reason although [2] is probably more intuitive for what you want to do.
[1]
[2]