Exposing Floating Point (2019) (ciechanow.ski)
71 points by eindiran on June 26, 2023 | 29 comments


Pretty solid article. I'm glad there was mention of the non-uniform distribution of floats, though my biased view wishes there were a little more emphasis that floats really really want to be "near 1", to keep precision. The practice of normalizing data to a [0,1] space, doing your math there, then transforming back to "big" space is useful not just for conceptual simplicity, but also for maintaining precision in intermediate results.
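
A tiny sketch of that normalize/compute/denormalize pattern (the values and the halving step are just illustrative):

    #include <cstdio>
    #include <vector>

    int main() {
        // Values clustered near 1e9, where float32 spacing is 64 units.
        std::vector<float> data = {1.0e9f, 1.0e9f + 256.0f, 1.0e9f + 512.0f};
        float lo = data.front(), hi = data.back();

        for (float& x : data) x = (x - lo) / (hi - lo);  // normalize into [0, 1]
        for (float& x : data) x *= 0.5f;                 // do the math at full precision
        for (float& x : data) x = x * (hi - lo) + lo;    // transform back to "big" space

        for (float x : data) std::printf("%.1f\n", x);
    }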

Also, there's a pretty good range of integers that can be represented exactly by floats; one of the reasons the Lua developers were resistant to adding an integer type for many years. I believe you can represent about 24 bits of exact integers in a 32-bit float.
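
For example, 2^24 = 16777216 is the first integer a 32-bit float can't distinguish from its neighbor:

    #include <cstdio>

    int main() {
        float f = 16777216.0f;               // 2^24, still exactly representable
        std::printf("%d\n", f + 1.0f == f);  // 1: 2^24 + 1 rounds back to 2^24
        std::printf("%.1f\n", f - 1.0f);     // 16777215.0: everything below is exact
    }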

I've been wrangling with fixed-point representations a lot lately, so the differences are stark in my mind (:


Floating point is a piecewise linear approximation of an exponential function, where each piece has fixed precision. For FP32, which has 23 mantissa bits, you divide each piece into 2^23 evenly spaced segments, e.g. between 1 and 2 there are 2^23 segments, between 2 and 4 there are 2^23 segments, etc.
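
You can watch the segment width double at each power of two with std::nextafter:

    #include <cmath>
    #include <cstdio>

    int main() {
        // Gap to the next representable float: 2^-23 in [1,2), 2^-22 in [2,4), etc.
        float xs[] = {1.0f, 2.0f, 4.0f, 1024.0f};
        for (float x : xs)
            std::printf("gap after %g = %g\n", x, std::nextafter(x, 2.0f * x) - x);
    }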

The strength of floating point is that it bounds % error rather than absolute error. Similar to how scientific notation has a % error represented in the form of number of significant figures.


> The strength of floating point is that it bounds % error rather than absolute error. Similar to how scientific notation has a % error represented in the form of number of significant figures.

Not just similar, precisely the same: scientific notation with a fixed number of significant figures is pencil-and-paper decimal floats.

Having mentioned decimal floats, though, I have to note that they have worse precision bounds because the pieces in the piecewise approximation are larger.

For example, with four binary digits, one unit of last place expressed as a relative error lies from binary 0.001 / 1.111 to binary 0.001 / 1.001, that is to say from about 2^-4 to about 2^-3, a factor of two, whereas with four decimal digits it will be from 0.001 / 9.999 to 0.001 / 1.001, or about 10^-4 to about 10^-3, a factor of ten. The usual phrasing is that the wobble of the number system is smaller with a smaller radix.

Trading more wobble for fewer digits is reasonable when calculating by hand, but makes no sense on a computer. Even on paper, some books will tell you to refrain from rounding down to ...1 when a value ends in ...11, ...12, or ...13, in order to avoid the worst of the wobble.


> floats really really want to be "near 1", to keep precision.

The number of significant digits is identical for (nearly) the entire range of FP values. There's no value to keeping it "near 1" for IEEE 754 floats - the precision is exactly the same regardless of whether you're near 1 or near 1 trillion. This makes them ideal for general computation and modeling physical properties.
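
A quick check of that claim, assuming IEEE 754 doubles (the relative gap between neighbors is essentially flat across the range):

    #include <cmath>
    #include <cstdio>

    int main() {
        // ulp/x stays within a factor of 2 of 2^-52 everywhere.
        double xs[] = {1.0, 1e6, 1e12};
        for (double x : xs) {
            double ulp = std::nextafter(x, 2.0 * x) - x;
            std::printf("x = %g, ulp/x = %g\n", x, ulp / x);
        }
    }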

In contrast, posits, the unum alternative to IEEE 754, are highly sensitive to absolute scale. Posits lose precision as magnitudes increase. Otoh, for small values you get much higher precision, which is why they're getting some attention from the AI world, where normalized weights are everywhere.


Hm, maybe my terminology is not quite right; let me try again:

With float32, if I want to do some "small-scale" math with a clump of numbers near 1 trillion, I will have a bad time. I can't even add 1 to them, reliably. Might get x+1==x, and so forth.

But if I instead transform my space down to something near 1 (say, subtracting 1 trillion from everything), then I have available the fine-grained results of that math.

It's true that when it gets transformed back to "origin = 1 trillion", that detail will be lost, but during the time I'm doing those intermediate calculations, the error is staying small in absolute value.
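
A quick float32 illustration of both halves of that (the 1e12 values are from the example above):

    #include <cstdio>

    int main() {
        float big = 1.0e12f;
        std::printf("%d\n", big + 1.0f == big);  // 1: float spacing near 1e12 is 65536

        float local = big - 1.0e12f;             // shift to a local origin: 0.0f
        local += 1.0f;                           // exact in "near zero" space
        std::printf("%g\n", local);              // 1
    }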

So, probably "precision" was the wrong word to use. Maybe relative vs. absolute error?

Compare that to fixed-point, where it doesn't matter where the cluster of numbers is, as long as things stay representable (which can indeed be a problem). You'll get the same absolute error either way.

Hopefully that makes sense; my grasp of the specific terms, etc is a little tenuous at times (:


But that doesn't mean you want to be near 1; it means you want to use the smallest numbers possible. If you're calculating millimeters, you want to be using .001 and .002, not 1.001 and 1.002.

I'd phrase it more like: "you lose accuracy when you have multiple numbers that share a big offset compared to their relative scale".

Also, to properly consider a trillion in fixed point for a moment: Let's say you have a 44.20 fixed-point format, with a range of ±8.8 trillion and a precision of about 1 millionth. Double precision floats will match it in precision around 10 billion. Around the max, floats will have 10 fewer bits of precision; around 10 million, floats will have 10 more bits of precision; around 10 thousand, 20 more bits, etc.
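
A minimal sketch of such a Q44.20 value on top of int64_t (names invented), showing the flat absolute precision:

    #include <cstdint>
    #include <cstdio>

    // Q44.20: 44 integer bits (incl. sign) plus 20 fraction bits in an int64_t.
    struct Q44_20 {
        int64_t raw;
        static Q44_20 from_double(double d) { return {(int64_t)(d * (1 << 20))}; }
        Q44_20 operator+(Q44_20 o) const { return {raw + o.raw}; }
    };

    int main() {
        Q44_20 a = Q44_20::from_double(8.0e12);  // near the top of the ±8.8e12 range
        Q44_20 b = Q44_20::from_double(1.0e-6);  // about one 2^-20 step
        // The tiny addend survives even next to 8e12:
        std::printf("%lld\n", (long long)((a + b).raw - a.raw));  // 1
    }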


> If you're calculating millimeters, you want to be using .001 and .002, not 1.001 and 1.002.

I think if millimeters are what's important, one should represent them as '1' and '2', no? That's what I meant by keeping things near 1 (apologies for my clumsy language). I mean whatever unit you care about should be approximately "1 unit" in its float representation.

But yes, thank you for helping enunciate these things (:


`0.001` and `0.002` are essentially equally accurate as `1` and `2`. `1.001` and `1.002` are worse.

In general, multiplicative scaling is useless with floating-point (*); but shifting the coordinate offset (additive, i.e. translation) can be highly useful. You want to move the range of numbers you are dealing with so that it becomes centered around zero (not near 1!). E.g. in Kerbal Space Program, the physics simulation of individual parts within the rocket needs to use a coordinate system centered on the rocket itself; it would be way too inaccurate to use the global coordinate system centered on the sun.

(*) The exception is if you need to keep decimal fractions exact, e.g. if dealing with money. In this case, (if a better suited decimal floating-point is unavailable) you want to scale multiplicatively to ensure a cent is 1, not 0.01.
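
A toy version of the floating-origin point: rotate a "part" sitting 1 m from a ship that is 1e7 m out (where float32 spacing is a whole meter), in global vs. ship-local coordinates:

    #include <cmath>
    #include <cstdio>

    int main() {
        const float c = std::cos(0.001f), s = std::sin(0.001f);
        const float cx = 1e7f;                   // ship's global x coordinate

        float gx = cx + 1.0f, gy = 0.0f;         // part in global coordinates
        float lx = 1.0f,      ly = 0.0f;         // part in ship-local coordinates

        for (int i = 0; i < 1000; ++i) {
            // Global: every step round-trips through ~1e7, quantizing x to 1 m.
            float dx = gx - cx, dy = gy;
            gx = cx + (dx * c - dy * s);
            gy =       dx * s + dy * c;
            // Local: the same rotation on small numbers.
            float t = lx * c - ly * s;
            ly = lx * s + ly * c;
            lx = t;
        }
        std::printf("global radius: %g\n", std::hypot(gx - cx, gy));  // drifts well past 1
        std::printf("local radius:  %g\n", std::hypot(lx, ly));       // ~1
    }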


> `0.001` and `0.002` are essentially equally accurate as `1` and `2`. `1.001` and `1.002` are worse.

Well let me just be 100% clear: I never meant to suggest the `1.001` encoding, at any point in this exchange (:

> You want to move the range of numbers you are dealing with so that it becomes centered around zero (not near 1!)

Yes, I think I like that terminology better - "centered around" rather than "near".

The reason I didn't say 0 originally is because keeping numbers "near zero" in an absolute sense is not the goal. If your numbers are all close to 1e-8, you would do well to scale them so that "1 float unit" is the size of the thing you care about, before doing your math. I think that is what you are saying in your cents example, too. So, the goal is about what "1 unit" means, not being specifically near a certain value. That's where the "1" in my original phrasing comes from; sorry for the confusion.


I don't really agree with how you're framing things. If "1" is the size you care about, then in single precision you can use numbers up to the millions safely, and in double precision you can use numbers up to a quadrillion safely. (Or drop that 10x if you want lots of rounding slack.) You're not trying to stay near 1 or centered around anything. You're trying to limit the ratio between your smallest numbers and your biggest numbers. And it works the same way whether the unit you care about is 1 or the unit you care about is 1e-8. If you kept your smallest numbers around 1e-8 there wouldn't be any downside in terms of calculation or accuracy.


I suppose implicit in my assumptions is that if "1" is the number I care about, that's the sort of values I'm going to be working with in my target data.

So, if I am doing some +1/-1 sort of math on a bunch of numbers, and those numbers are "far away" (eg: near 1e+8 or near 1e-8), then it is better to transform those numbers near "1 space", do the math, then transform it back, rather than trying to do it directly in that far-away space.

But yes, I suppose in your phrasing, that does come down to the ratio of the numbers involved — 1 vs 1e±8. You want that ratio to be as near 1 as possible, I think is what you mean by "limit the ratio"?


Well "1" won't consistently be "the typical amount you add/subtract" and "the typical number you care about" at the same time.

Like, a bank might want accuracy of 1e-4 dollars, have transactions of 1e2-1e5 dollars, and have balances of 1e5-1e8 dollars.

That's three ranges we care about, and at most one of them can be around 1.0. But which one we pick, or picking none at all, won't affect the accuracy. The main thing affecting accuracy is the ratio between biggest and smallest numbers which in this case is 1e12.

If you set pennies to be 1.0, or basis points to be 1.0, or a trillion dollars to be 1.0, you'd get the same accuracy. Let's say some calculation is off by .0000003 pennies from perfect math. All those versions will be off by .0000003 pennies. (Except that there might be some jitter in rounding based on how the powers align, but let's ignore that for right now.)


There's something I'm not quite getting, here.

Let's take your bank example, with 32-bit floats. Since you say it doesn't matter, let's set "1" to be "1 trillion dollars" (1e12). A customer currently has a balance of 1 dollar, so it's represented as 1e-12. Now they make 100 deposits, each of a single dollar. If we do these deposits one-at-a-time, we get a different result than if we do a single deposit of $100, thanks to accumulated rounding errors. Ok, fine.

Now we choose a different "1" value. You say "which one we pick, or picking none at all, won't affect the accuracy," but I think in this case it _does_? In this second case, we set "1" to be 1 dollar, and we go through the same deposits as above. In this case, both algorithms (incremental and +$100 at once) produce identical results — 101, as expected.
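
That experiment in code (float32; the exact last digits depend on rounding, which is the point):

    #include <cstdio>

    int main() {
        // Unit = one trillion dollars, so $1 is 1e-12f. Not a power of two,
        // so nearly every += below has to round.
        float a = 1e-12f;
        for (int i = 0; i < 100; ++i) a += 1e-12f;  // 100 deposits of $1
        float b = 1e-12f + 100 * 1e-12f;            // one deposit of $100
        std::printf("%.9g vs %.9g\n", a, b);        // can disagree in the last digits

        // Unit = one dollar: every value is a small exact integer.
        float c = 1.0f;
        for (int i = 0; i < 100; ++i) c += 1.0f;
        std::printf("%g\n", c);                     // exactly 101
    }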

I agree that there can be multiple ranges that we care about, which can be tricky, but I don't agree that it doesn't matter what "1" we pick.

But I am probably misinterpreting you in some way (:


If you can avoid rounding then your answers will be more accurate. But that's almost entirely separate from how big your "1" is.

If you set "1" to be "2^40 dollars (~1.1 trillion)", then $1 is represented as 2^-40. Adding that up 100 times will have no rounding, and give you exactly the same result as a deposit of $100.

On the opposite side of things, setting "1" to be "3 dollars" or "70 cents" would cause rounding errors all over, even though that's barely different in scale.
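
Concretely, scaling by a power of two only changes exponents, so nothing rounds:

    #include <cstdio>

    int main() {
        // "1.0" == 2^40 dollars, so $1 == 2^-40: exact, and small integer
        // multiples of it stay exact.
        float dollar = 0x1p-40f;
        float sum = 0.0f;
        for (int i = 0; i < 100; ++i) sum += dollar;
        std::printf("%d\n", sum == 100 * dollar);  // 1: identical to one $100 deposit

        // "1.0" == 3 dollars: $1 is 1/3, already rounded before any math.
        std::printf("%.9g\n", 1.0f / 3.0f);        // 0.333333343
    }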


Okay, I think we are basically on the same page.

But since I'm finding this helpful ... (:

We've been talking about addition so far, and relative scales between numbers. But suppose we just consider a single number, and multiply it by itself a number of times.

Certainly if that number is 1, we can keep doing it forever without error.

But the further we get away from 1 (either 1e+X or 1e-X), the more quickly error will be generated from that sequence, eventually hitting infinity or zero.
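
For instance, repeatedly squaring in float32:

    #include <cstdio>

    int main() {
        float a = 1.0f, b = 1.5f, c = 0.5f;
        for (int i = 0; i < 10; ++i) { a *= a; b *= b; c *= c; }
        // 1 squares forever without error; everything else races off.
        std::printf("%g %g %g\n", a, b, c);  // 1 inf 0
    }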

I'm just trying to express through this example that there is still something "special" about 1 in scale (likewise 0, in offset), where you want to be "close to" it, in the face of doing some arbitrary math, in order to produce better results. It doesn't even need to involve relative sizes between 2 different numbers.


It depends on what math you're doing. Very often you're likely to find that a base number of 1e-6 makes you less likely to hit your exponent limits than a base number of 1.

1 is special in that half the positive floats are above it and half are below. That doesn't mean your use case wants half.


Then would it be fair to say that if you don't know what calculations might be coming, all other things being equal, 1 is a good choice since it is "unbiased" in this sense?


It's perfectly fine, but because addition is very common it's unlikely to be optimal.


I feel like I'm having trouble picturing this. It seems to me that in most cases you're either not taking the big number as an input (in which case you can just make the modification to the big number after the calculation is done) or you are taking the big number as an input, and there's no difference in error.

E.g., say you're adding some velocity to an object's position, there's no reason to do that "scaled down" since you get garbage in and garbage out regardless?

Imo the right advice is to either store things in some sort of offset format e.g., relative to this very far away thing, the position is X meters, or to store the inputs separately e.g., if you're doing position of an object under some constant velocity, store time and velocity separately instead of accumulating it so errors don't add up. But maybe I'm missing something? I guess you'd also want to avoid using larger numbers in calculations, so e.g., the midpoint trick of adding half the distance between two points to the smaller instead of adding them together and dividing by 2.
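
The midpoint trick, for the record (float32, values near the top of the range):

    #include <cstdio>

    int main() {
        float a = 3e38f, b = 3e38f;               // near FLT_MAX (~3.4e38)
        std::printf("%g\n", (a + b) / 2.0f);      // inf: a + b already overflowed
        std::printf("%g\n", a + (b - a) / 2.0f);  // 3e+38, the right answer
    }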


> Imo the right advice is to either store things in some sort of offset format

Yeah, I agree that's what it comes down to, basically. It's more or less what I meant by "keep things near 1". I almost said "keep things near 0", but I didn't want to suggest that 1e-12 is somehow better than 1e+12. Really, it's more like "try to keep '1 float unit' to be about the size of the stuff you care about in your calculation".

> E.g., say you're adding some velocity to an objects position

As I interpret this example, I think it would be beneficial to re-center things to a common nearby origin, rather than doing them in some far-away-from-0 space. The more calculations being done (each one introducing some error), the more it matters.

For the specific case of "add X to each number" maybe it wouldn't matter, but I'm talking about arbitrary sequences of math being done.

If each of your calculations (applying velocity, maybe some rotation, etc) introduces some floating point roundoff error, then that is a potentially large accumulated absolute value when far away from the origin (e.g. could be off by kilometers rather than off by meters). But if you normalize things down to small relative numbers first, produce all your error in that space, then transform back to faraway-space, that's only one opportunity to produce big-size error, rather than several accumulated.

In other words, you've taken your local off-by-meters value, transformed it into global kilometer-size space, and so maybe your final result is off by 1 kilometer, due to some rounding or whatnot. Whereas if you did all your computation in kilometer-space, then each step of math you do can introduce additional off-by-kilometer error, and you could be off by tens of kilometers in the end.

This whole conversation has got me doubting myself, so I made an example in godbolt that hopefully is clear: https://godbolt.org/z/MfP7e7Tbh
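
For anyone who doesn't want to click through, here is a standalone sketch in the same spirit (not the exact godbolt code):

    #include <cstdio>

    int main() {
        // Move a point by 0.01 units, 10000 times (should total +100), either
        // sitting at a global offset of 1e7 or re-centered near zero first.
        float far_away = 1.0e7f;
        float local    = 0.0f;
        for (int i = 0; i < 10000; ++i) {
            far_away += 0.01f;  // rounds at the 1-unit float spacing near 1e7
            local    += 0.01f;  // rounds at the tiny spacing near 0..100
        }
        std::printf("far:   %+g\n", far_away - 1.0e7f);  // +0: the motion vanished
        std::printf("local: %+g\n", local);              // ~100, small drift
    }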


It's 25 bits if you take the sign bit into account.


> It seems that there is no way to configure the printing of floating point values to automatically maintain exact number of decimal digits needed to accurately represent the value.

C++17 has to_chars/from_chars and C++20 has std::format, which will output the shortest string required to round-trip the floating point value exactly.
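
For example, with C++17's std::to_chars:

    #include <charconv>
    #include <cstdio>

    int main() {
        double v = 0.1 + 0.2;
        char buf[32];
        // Default to_chars: shortest string that round-trips to the same double.
        auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, v);
        *ptr = '\0';
        std::printf("%s\n", buf);  // 0.30000000000000004
    }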


Anyone know how he generates those illustrations?

If he's not keeping it a secret, I'd love to have that in my toolbox.


I'm not certain of what tool he is using to generate the SVG files but a great option is Inkscape, which is FOSS and free-as-in-beer. If you want an illustration tool, Inkscape is definitely the best non-Adobe option I've used (and as someone with a monthly subscription to all of the Adobe tools, probably the best option overall).


IIRC, custom WebGL stuff that he writes himself. So he's a 10x technical writer and a 10x developer.


You can view the source on the page, it's all there.


It's literally in the page JS source.


I don't entirely follow...

Most of the diagrams in this article are SVG files. Yes, you can look at the source (a big pile of numbers), but that doesn't say how they were created.

I realize that a lot of this author's other articles have animated WebGL stuff, and the source code is illuminating, but not so much here, unless I'm missing something.


You are right, my comment applies to other articles by this author but not this one.



