
I remember reading about Kahan summation, and I naively thought it would help improve the accuracy/consistency of an n-body code I was using for my PhD. I was trying to resolve a discrepancy between summations performed serially versus in parallel with OpenMP. In the end I abandoned the approach, since the issue seemed to lie elsewhere.


I've employed Kahan summation in a different physics simulation that was giving parallel vs. serial discrepancies because the global sums are evaluated in different orders in parallel. It did help, but I ultimately concluded that it isn't a complete solution for this issue... in short, I think it just pushes the problem several orders of magnitude down.
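For anyone who hasn't seen it, the compensated-summation loop is only a few lines. This is a minimal Python sketch of the standard Kahan algorithm (illustration only, not code from either simulation mentioned above):

    def kahan_sum(values):
        total = 0.0
        comp = 0.0  # running compensation for low-order bits lost in each add
        for x in values:
            y = x - comp            # feed the previous correction into the next term
            t = total + y           # big + small: low-order bits of y are lost here
            comp = (t - total) - y  # recover (negated) the part of y that was lost
            total = t
        return total

The compensation term buys roughly one extra double's worth of precision, which is consistent with it narrowing the parallel-vs-serial gap without eliminating it: the per-thread partial sums are still rounded, just at a much smaller scale.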

For what it's worth, and in the event that your issue was indeed the differing order of summation: there is a solution that /does/ work, which is to cast the numbers to a large (e.g., 256- or 512-bit) fixed-point representation and leverage the reproducibility of integer sums.
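A minimal sketch of that idea in Python, using its arbitrary-precision integers to stand in for a wide fixed-point accumulator. The 1074 fractional bits are chosen so that every finite double converts exactly, leaving a single rounding at the very end; in a C/C++ code you'd instead pick the integer/fraction split to match the known range of your data:

    from fractions import Fraction

    FRAC_BITS = 1074  # smallest positive double is 2**-1074, so every finite
                      # double is an integer multiple of 2**-FRAC_BITS

    def to_fixed(x):
        # exact conversion: Fraction(x) reproduces the double bit-for-bit
        return int(Fraction(x) * (1 << FRAC_BITS))

    def reproducible_sum(values):
        # integer addition is associative, so partial sums computed per
        # thread and merged in any order give exactly the same accumulator
        acc = sum(to_fixed(x) for x in values)
        return float(Fraction(acc, 1 << FRAC_BITS))  # one rounding, at the end

In a real OpenMP code each thread would keep its own integer accumulator and the reduction would add them together; since every conversion and every addition is exact, the result is bit-identical regardless of thread count or summation order.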


The issue here is that unless you check that your values don't overflow or underflow at every stage of the calculation, you can never actually know whether your answers are accurate, even if they are repeatable.


I would think that in most cases you should be able to prove an upper bound on the sum and make sure the accumulator is large enough to accommodate it. My guess is that 512 bits should be enough to span all physically/computationally relevant scales.
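As a back-of-the-envelope check (hypothetical numbers, same Python sketch as above): you need integer bits for the largest term, the fixed-point fractional bits, carry headroom for the number of terms, and a sign bit.

    import math

    def accumulator_bits(max_abs, n_terms, frac_bits):
        # bits needed so the fixed-point accumulator provably cannot overflow
        int_bits = max(1, math.ceil(math.log2(max_abs)))   # largest term's integer part
        carry_bits = math.ceil(math.log2(n_terms))         # headroom for n_terms additions
        return int_bits + frac_bits + carry_bits + 1       # +1 sign bit

    # e.g. |x| <= 1e6, a billion terms, 128 fractional bits:
    # 20 + 128 + 30 + 1 = 179 bits -- fits comfortably in 256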


This article lists some real examples that require more than 512 bits: https://res.mdpi.com/d_attachment/mathematics/mathematics-03...

Note that digits of precision are expressed in decimal unless explicitly given as “bits”; at roughly 3.32 bits per decimal digit, the phrase “10,000-digit arithmetic” really means around 33,000 bits.


Certainly, if you have errors that grow exponentially, all bets are off. I wouldn't think that would apply to simply summing a set of numbers, though.



