
I remember reading about Kahan summation, and I naively thought it would help improve the accuracy/consistency of an n-body code I was using for my PhD. I was trying to resolve a discrepancy between summations performed serially versus in parallel with OpenMP. In the end I abandoned the approach, since the issue seemed to lie elsewhere.


I've employed Kahan summation in a different physics simulation that was giving parallel vs. serial discrepancies because the global sums are evaluated in different orders in parallel. It did help, but I ultimately concluded that it isn't a complete solution for this issue... in short, I think it just pushes the problem several orders of magnitude down.
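For anyone who hasn't seen it, the compensated-summation loop is only a few lines. This is a minimal Python sketch of the standard Kahan algorithm (illustration only, not code from either simulation mentioned above):

    def kahan_sum(values):
        total = 0.0
        comp = 0.0  # running compensation for low-order bits lost in each add
        for x in values:
            y = x - comp            # feed the previous correction into the next term
            t = total + y           # big + small: low-order bits of y are lost here
            comp = (t - total) - y  # recover (negated) the part of y that was lost
            total = t
        return total

The compensation term buys roughly one extra double's worth of precision, which is consistent with it narrowing the parallel-vs-serial gap without eliminating it: the per-thread partial sums are still rounded, just at a much smaller scale.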

For what it's worth, and in the event that your issue was indeed the differing order of summation: there is a solution that /does/ work, which is to cast the numbers to a large (e.g., 256- or 512-bit) fixed-point representation and leverage the reproducibility of integer sums.
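A minimal sketch of that idea in Python, using its arbitrary-precision integers to stand in for a wide fixed-point accumulator. The 1074 fractional bits are chosen so that every finite double converts exactly, leaving a single rounding at the very end; in a C/C++ code you'd instead pick the integer/fraction split to match the known range of your data:

    from fractions import Fraction

    FRAC_BITS = 1074  # smallest positive double is 2**-1074, so every finite
                      # double is an integer multiple of 2**-FRAC_BITS

    def to_fixed(x):
        # exact conversion: Fraction(x) reproduces the double bit-for-bit
        return int(Fraction(x) * (1 << FRAC_BITS))

    def reproducible_sum(values):
        # integer addition is associative, so partial sums computed per
        # thread and merged in any order give exactly the same accumulator
        acc = sum(to_fixed(x) for x in values)
        return float(Fraction(acc, 1 << FRAC_BITS))  # one rounding, at the end

In a real OpenMP code each thread would keep its own integer accumulator and the reduction would add them together; since every conversion and every addition is exact, the result is bit-identical regardless of thread count or summation order.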


The issue here is that unless you check that your values don't overflow or underflow at every stage of the calculation, you can never actually know whether your answers are accurate, even if they are repeatable.


I would think that in most cases you should be able to prove an upper bound on the sum and make sure the accumulator is large enough to accommodate it. My guess is that 512 bits should be enough to span all physically/computationally relevant scales.
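As a back-of-the-envelope check (hypothetical numbers, same Python sketch as above): you need integer bits for the largest term, the fixed-point fractional bits, carry headroom for the number of terms, and a sign bit.

    import math

    def accumulator_bits(max_abs, n_terms, frac_bits):
        # bits needed so the fixed-point accumulator provably cannot overflow
        int_bits = max(1, math.ceil(math.log2(max_abs)))   # largest term's integer part
        carry_bits = math.ceil(math.log2(n_terms))         # headroom for n_terms additions
        return int_bits + frac_bits + carry_bits + 1       # +1 sign bit

    # e.g. |x| <= 1e6, a billion terms, 128 fractional bits:
    # 20 + 128 + 30 + 1 = 179 bits -- fits comfortably in 256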


This article lists some real examples that require more than 512 bits: https://res.mdpi.com/d_attachment/mathematics/mathematics-03...

Note that digits of precision are expressed in decimal unless explicitly given as “bits”; at roughly 3.32 bits per decimal digit, the phrase “10,000-digit arithmetic” really means around 33,000 bits.


Certainly, if you have errors that grow exponentially, all bets are off. I wouldn't think that would apply to simply summing a set of numbers, though.



