
So this is nice, but is it super fast/efficient? It would seem to me that this is exactly where inline assembly, on a per-processor basis, would make a huge difference. Is this well enough known that compilers already do it for us? Anyone know a lot about this area?


In general, this will probably get you close to maximum performance. Most CPUs implement shift, rotate, AND, OR, XOR, and NOT instructions, and usually not much else in the way of bit manipulation, besides perhaps a few very specialized intrinsics for popular encryption algorithms and hash functions (see SSE). DSPs and more domain-specific microcontrollers generally have more bit-banging instructions, and you might be able to make better use of them with assembly or intrinsics.
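For reference, a minimal sketch of the kind of macros being discussed, each compiling to one or two of those basic instructions (the names here are my own, not taken from the linked page):

    #include <stdio.h>

    #define BIT_SET(x, n)    ((x) |  (1u << (n)))   /* OR  + shift */
    #define BIT_CLEAR(x, n)  ((x) & ~(1u << (n)))   /* AND + NOT + shift */
    #define BIT_TOGGLE(x, n) ((x) ^  (1u << (n)))   /* XOR + shift */
    #define BIT_TEST(x, n)   (((x) >> (n)) & 1u)    /* shift + AND */

    int main(void)
    {
        unsigned x = 0;
        x = BIT_SET(x, 3);
        printf("x = 0x%x, bit 3 = %u\n", x, BIT_TEST(x, 3)); /* 0x8, bit 3 = 1 */
        return 0;
    }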

A smart compiler could theoretically transform your crappy implementation of "is the Nth bit set?" into something more intelligent, but it is rarely profitable enough to do so, since programmers who write code where bit-banging is the bottleneck typically know how to do this themselves; in fact, most of the macros linked are what I consider the most parsimonious implementations.
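To illustrate what the compiler would have to recognize, here is a hypothetical naive version next to the usual one-liner (my own example, not from the linked page):

    #include <stdbool.h>

    /* Naive: walk a mask up to position n, one shift per iteration. */
    static bool bit_set_naive(unsigned x, unsigned n)
    {
        unsigned mask = 1u;
        for (unsigned i = 0; i < n; i++)
            mask <<= 1;
        return (x & mask) != 0;
    }

    /* Parsimonious: shift + AND, typically two or three instructions. */
    static bool bit_set_fast(unsigned x, unsigned n)
    {
        return (x >> n) & 1u;
    }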

One place where an intrinsic will definitely beat a C-level implementation is population count (how many bits are set in this int/long?). Many recent x86 parts and many DSPs implement a POPCOUNT/BC instruction which will outperform even a smart LUT-based implementation, without the memory capacity/bandwidth cost either. There's an interesting anecdote about that instruction here (http://www.moyogo.com/blog/2005/09/secret-opcodes.html ), though I have no idea if it's true.
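A rough sketch of the comparison, assuming GCC or Clang, whose __builtin_popcount maps to the hardware POPCNT when you compile with something like -mpopcnt or -march=native (the LUT code is my own illustration):

    #include <stdint.h>
    #include <stdio.h>

    static uint8_t lut[256];

    /* Build the byte-wise popcount table: count of low bit plus count of the rest. */
    static void init_lut(void)
    {
        for (int i = 0; i < 256; i++)
            lut[i] = (uint8_t)((i & 1) + lut[i >> 1]);
    }

    /* LUT-based: four table lookups per 32-bit word. */
    static unsigned popcount_lut(uint32_t x)
    {
        return lut[x & 0xff] + lut[(x >> 8) & 0xff]
             + lut[(x >> 16) & 0xff] + lut[x >> 24];
    }

    int main(void)
    {
        init_lut();
        uint32_t x = 0xDEADBEEF;
        printf("LUT: %u, builtin: %d\n", popcount_lut(x), __builtin_popcount(x));
        return 0;
    }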


Thanks for the good reply!




