I think you missed the point of the post and the issues it describes.

In my estimation, libraries like boost are way too big and way too clever and they create more problems than they solve. Also, they don't make me happy.

You're overfocusing on a "problem" that is almost completely irrelevant for most programming. Big-endian is rare (almost no current hardware uses it, though some file formats and networking APIs carry big-endian data). Where you do still meet it, you don't do endianness conversions willy-nilly: only a few lines in a huge project should be concerned with it. The situation is similar for dealing with alignment in reads.

So, with boost you end up with a huge, slow-compiling dependency that solves the problem through obscure implicit mechanisms that almost no one understands or can even spot (I would never have guessed that your line above handles misalignment or byte swapping).

This approach is typical for a large group of C++ programmers, who seem to like to optimize for short code snippets, cleverness, and/or pedantry.

The actual issue described in the post was the UB that is easy to hit when doing bit shifting, caused by the implicit conversions that C defines. While this is definitely an unhappy situation, it's easy enough to avoid using plain C syntax (cast the expression to unsigned before shifting), with no more code than the boost-type cast in your code above.
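A minimal sketch of the cast-before-shift idiom, assuming the input is an unsigned char buffer holding a 32-bit big-endian value. The name load_be32 is made up for illustration and does not come from the post. Without the casts, p[0] is promoted to a signed int, and shifting a byte such as 0x80 left by 24 overflows it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from a byte buffer.
 * Each byte is converted to uint32_t before the shift, so no shift is
 * ever performed on a signed int. Works for any alignment of p. */
static uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

If the buffer is plain char, which may be signed, the cast alone is not enough because a negative byte would be sign-extended; each byte would also need masking with 0xff.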

The fact that the UB is so easy to hit doesn't call for excessive abstraction, but simply a revisit of some of the UB defined in C, and how compiler writers exploit it.

(Anecdata: I've written a fair share of C code, though not compression or encryption algorithms, and personally I'm not sure I've ever hit one of the evil cases of UB. I've hit segmentation faults and out-of-bounds accesses, sure, but I've never seen the language or compilers "haunt me".)



Do you use UBSAN and ASAN? When you write unit tests do you feed numbers like 0x80000000 into your algorithm? When you allocate test memory have you considered doing it with mmap(4096) and putting the data at the end of the map? (Or better yet, double it and use mprotect). Those are some good examples of torture tests if you're in the mood to feel haunted.
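A rough sketch of the guard-page trick described above, assuming a POSIX system with mmap, mprotect, and MAP_ANONYMOUS. The helper name guarded_alloc is hypothetical. It maps two pages, makes the second one inaccessible, and returns a buffer that ends exactly at the page boundary, so reading or writing even one byte past the end faults immediately.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return a buffer of len bytes (0 < len <= one page)
 * whose last byte sits directly before a PROT_NONE guard page.
 * Returns NULL on failure. The mapping is never freed in this sketch. */
static unsigned char *guarded_alloc(size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0 || len > (size_t)page)
        return NULL;

    unsigned char *base = mmap(NULL, 2 * (size_t)page,
                               PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Second page becomes the guard: any access to it raises SIGSEGV. */
    if (mprotect(base + page, (size_t)page, PROT_NONE) != 0) {
        munmap(base, 2 * (size_t)page);
        return NULL;
    }
    return base + (size_t)page - len;
}
```

Copying test input into such a buffer and running the parser on it turns a silent one-byte overread into a crash at the offending instruction.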


Every day I spend futzing around with endianness is a day I'm not solving 'real' problems. These things are a distraction and a complete waste of developer time: It should be solved 'once' and only worried about by people specifically looking to improve on the existing solution. If it can't be handled by a library call, there's something really broken in the language.

(imo, both c and cpp are mainly advocated by people suffering from stockholm syndrome.)


But that's the point: No one spends a day futzing around with endianness, and there are in fact functions for swapping endianness. You can just call them, no need to hide the swap in a pointer cast expression to a type that has the dereferencing operator overloaded.
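As one illustration of "just call them", here is a sketch using the POSIX function ntohl together with memcpy, assuming the data is a 32-bit value in network (big-endian) byte order. The wrapper name read_net_u32 is made up for this example.

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit network-order value from src.
 * memcpy avoids an unaligned or aliasing-violating pointer cast, and
 * ntohl swaps bytes only when the host is little-endian. */
static uint32_t read_net_u32(const void *src)
{
    uint32_t raw;
    memcpy(&raw, src, sizeof raw);
    return ntohl(raw);
}
```

Both the alignment handling and the byte swap are visible at the call site here, rather than hidden behind an overloaded dereference.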


I agree with the bulk of this post.

Re the anecdata at the end. Have you ever run your code through the sanitizers? I have. CVE-2016-2414 is one of my battle scars, and I consider myself a pretty good programmer who is aware of security implications.


Very little, quite frankly. I've used valgrind in the past, and found very few problems. I just ran -fsanitize=undefined for the first time on one of my current projects, which is an embedded network service of 8KLOC, and with a quick test covering probably 50% of the codepaths by doing network requests, no UB was detected (I made sure the sanitizer works in my build by introducing a (1<<31) expression).

Admittedly I'm not the type of person who spends his time fuzzing his own projects, so my statement was just to say that the kind of bugs that I hit by just testing my software casually are almost all of the very trivial kind - I've never experienced the feeling that the compiler "betrayed" me and introduced an obscure bug for something that looks like correct code.

I can't immediately see the problem in your CVE here [0]. Was that some kind of betrayal-by-compiler situation? It seems like strange things could happen if (end - start) underflows.

[0] https://android.googlesource.com/platform/frameworks/minikin...


This one wasn't specifically "betrayal by compiler," but it was a confusion between signed and unsigned quantities for a size field, which is very similar to the UB exhibited in OP.

Also, the fact that you can't see the problem is actually evidence of how insidious these problems are :)

The rules for this are arcane, and, while the solution suggested in OP is correct, it skates close to the edge, in that there are many similar idioms that are not ok. In particular, (p[1] << 8) & 0xff00, which is code I've written, is potentially UB (hence "mask, and then shift" as a mantra). I'd be surprised if anyone other than jart or someone who's been part of the C or C++ standards process can say why.


> the fact that you can't see the problem is actually evidence of how insidious these problems are

I've looked for a while now, but still can't see it. Would you be willing to share?

> (p[1] << 8) & 0xff00

With p[1] being uint8_t? Because then I cannot imagine why, and also fail to see a reason to apply the 0xff00 mask here.

If this is for int8_t instead, is the problem you are alluding to sign extension? If p[1] is negative, it gets promoted to a negative int (so its representation has the high-order bit set), and shifting that to the left is UB.


Yes, I was assuming it was char *, as in the OP, which can be signed. And any left shift of a negative quantity is UB in C (I'm not sure if this is fixed in recent C++), it doesn't have to be what's commonly thought of as overflow.
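To make the "mask, and then shift" mantra concrete, here is a small sketch, assuming a plain char buffer (possibly signed) holding a 16-bit big-endian value. The function name load_be16 is hypothetical. Each byte is converted to unsigned and masked to 0..255 before any shift, so no negative value is ever shifted.

```c
#include <stdint.h>

/* Hypothetical helper: combine two bytes from a plain char buffer into a
 * 16-bit big-endian value. The risky form is (p[0] << 8) & 0xff00, which
 * shifts a negative int when char is signed and the byte is >= 0x80. */
static uint16_t load_be16(const char *p)
{
    /* Convert to unsigned first (well-defined, modular), then mask off
     * the sign-extended upper bits, and only then shift. */
    uint32_t hi = (uint32_t)p[0] & 0xffu;
    uint32_t lo = (uint32_t)p[1] & 0xffu;
    return (uint16_t)((hi << 8) | lo);
}
```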


Raph, clearly you're just not as good a programmer as you think you are.


Why thank you Vitali. Coming from you, that is high praise indeed.



