
Any programming language that makes units a first-class part of its type system would let you define custom units just as easily as other data types. Nobody's saying that SI has to be the only set of units expressible, just that it has to be the foundation of the unit system. Likewise, unitless numerical quantities will necessarily still be expressible, but using those types for variables that represent a value with a unit should be considered extremely poor practice, just as it is when communicating those values on paper.

There's really not a good reason to argue against using only SI units for data interchange formats. It's trivial to map to the preferred units on import or display, if and only if you know what units the input data is in. I've dealt with too many bugs where interacting programs have differing assumptions about meters, centimeters, and millimeters to believe that the flexibility of storing different units on disk is ever worth the trouble.
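To make the "map on import or display" step concrete, here is a minimal sketch. The table and function names are made up for illustration, and it only works because the source unit is stated explicitly:

```python
# Hypothetical lookup table: multiply by the factor to get meters.
TO_METERS = {
    "m": 1.0,
    "cm": 1e-2,
    "mm": 1e-3,
    "nm": 1e-9,
    "angstrom": 1e-10,
}

def convert_length(value, from_unit, to_unit):
    """Convert a length between two known units by going through meters."""
    return value * TO_METERS[from_unit] / TO_METERS[to_unit]

# Stored in meters on disk, shown in millimeters to the user.
stored_m = 1.5
print(convert_length(stored_m, "m", "mm"))
```

An unknown unit name raises a KeyError here, which is the point: the conversion is trivial only when the input unit is known.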



I use Python. If I use one of the third-party packages for units then I can do what you say I can, but at extreme cost. Every single operation checks for unit conversion, on the off-chance that the values aren't compatible. The system, in trying to be nice to me, ends up making things invisibly slow.
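A toy sketch of where that cost comes from. This Quantity class is my own illustration, not the API of any real units package, but it shows the pattern: every addition allocates a wrapper object and compares units, even when all the values share one unit.

```python
class Quantity:
    """Toy value-with-unit wrapper (illustrative only)."""
    __slots__ = ("value", "unit")

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        # The compatibility check runs on every single operation.
        if self.unit != other.unit:
            raise ValueError(f"cannot add {self.unit} and {other.unit}")
        return Quantity(self.value + other.value, self.unit)

def sum_plain(values):
    """Sum bare floats: no per-element checks."""
    total = 0.0
    for v in values:
        total += v
    return total

def sum_checked(values, unit):
    """Sum the same numbers through Quantity: one check and one allocation per element."""
    total = Quantity(0.0, unit)
    for v in values:
        total = total + Quantity(v, unit)
    return total

coords = [8.420, 50.899, 85.486]
print(sum_plain(coords), sum_checked(coords, "angstrom").value)
```

Both functions return the same number; the checked version just does more work per element to get there.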

(In practice, the performance code runs in C, so the Python/C boundary would have to negotiate the array types for full unit safety.)

In my work the base unit of length is angstroms. I've used nanometers a few times, and never used any other length unit, though I know that GROMACS's xtc format uses picometers. Saying something has a volume of 600 cubic angstroms is much more useful than 6E-28 cubic meters. While I can appreciate that fields closer to human scale may like to standardize on SI, I don't want your preferences enforced on my field. All I see is the chance to make things worse, and slower, and I don't see any advantages.

One of my data formats has coordinates in angstroms, like "8.420 50.899 85.486". How would you suggest that I write that in an exchange format? As "8.420E-10 50.899E-10 85.486E-10"? (Or the last two normalized to E-9.) At the very least that's a lot of data for very little gain. It gets worse for trajectories, which might save 1 million time steps x 10,000 atoms/time step x 3 coordinates/atom = 30 billion coordinates to an exchange file. I see no advantage to doing that in SI units.
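Multiplying out the trajectory dimensions and the per-coordinate text overhead, as a rough sketch. It assumes a plain-text format and counts only the characters of the number itself, not separators:

```python
# Trajectory dimensions from the example above.
time_steps = 1_000_000
atoms_per_step = 10_000
coords_per_atom = 3

total_coords = time_steps * atoms_per_step * coords_per_atom

# The same coordinate written in angstroms and in meters.
as_angstrom = "8.420"
as_meters = "8.420E-10"
extra_chars = len(as_meters) - len(as_angstrom)

# Extra characters over the whole file, just for the exponent suffix.
extra_total = total_coords * extra_chars

print(total_coords)  # 30000000000
print(extra_chars)   # 4
print(extra_total)   # 120000000000
```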

In practice those distance coordinates will likely be represented internally in angstroms. Consider that the Lennard-Jones potential is sometimes written as A/r^12 - B/r^6, with expected values of r around 1E-10 m. The r^12 denominator of the first term goes to 1E-120 in intermediate form, which is not representable in a 32-bit float. While that's not relevant for Python, which uses 64-bit floats, some molecular dynamics programs use 32-bit floats (e.g., for older GPU machines, or to save space).
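The underflow can be reproduced without NumPy by rounding through the standard struct module after every multiply. This is a sketch of naive repeated multiplication, not how any particular MD code evaluates the potential; to_f32 and pow_f32 are helper names I made up:

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def pow_f32(base, exponent):
    """Integer power with the intermediate rounded to float32 after each multiply."""
    b = to_f32(base)
    result = to_f32(1.0)
    for _ in range(exponent):
        result = to_f32(result * b)
    return result

r_meters = 1e-10    # roughly one angstrom, expressed in meters
r_angstrom = 1.0    # the same distance, expressed in angstroms

print(pow_f32(r_meters, 12))    # 0.0: underflows, so A/r^12 would divide by zero
print(pow_f32(r_angstrom, 12))  # 1.0
print(r_meters ** 12)           # nonzero with 64-bit floats
```

The smallest positive float32 is about 1.4E-45, so the product is already zero by the fifth multiplication, well before reaching r^12.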

My other example was the atomic mass unit, another non-SI unit. I have only used amu (for chemistry) or dalton (for biology) in my work, not kilograms. It seems pointless to require that I store the mass of a carbon as 1.9926467051999998e-26 kg instead of 12 amu.
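For reference, the kilogram figure is just 12 times a conversion constant. The sketch below assumes the CODATA 2010 value of the atomic mass unit, which is consistent with the digits quoted above; a newer recommended value would give slightly different digits, while "12 amu" stays the same:

```python
import math

# Assumption: CODATA 2010 recommended value of the atomic mass unit, in kg.
AMU_IN_KG = 1.660538921e-27

def amu_to_kg(mass_amu):
    """Convert a mass in atomic mass units to kilograms."""
    return mass_amu * AMU_IN_KG

carbon_kg = amu_to_kg(12)
print(carbon_kg)

# Agrees with the value quoted in the comment to within float rounding.
print(math.isclose(carbon_kg, 1.9926467051999998e-26, rel_tol=1e-12))
```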

I therefore disagree, and believe there are good reasons to argue against SI units for some data interchange formats. I agree that I want to store a single unit per quantity on disk; it's just that those units are the non-SI angstrom and amu, not the tremendously huge meter and kilogram.



