
> Unfortunately my machine (i7-7700) can't run anything beyond AVX2.

Your machine has BMI2. It's not SIMD, but it handles 8 bytes at a time and is well suited to packing and unpacking the bits in this case.

https://godbolt.org/z/xcT3exenr
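
For readers who can't open the link, here is a rough sketch of the general idea, not the code behind the link. It assumes the BMI2 instructions meant here are PEXT and PDEP (the `_pext_u64` and `_pdep_u64` intrinsics), and that the job is converting between 8 bytes that each hold 0 or 1 and a single byte of 8 bits. The names `pext64`, `pdep64`, `pack8`, `unpack8`, `kLowBitPerByte` and `selfcheck` are made up for illustration, and the portable fallback is only there so the sketch also builds without `-mbmi2`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>
#endif

// Gather the bits of src selected by mask into the low bits of the result.
static inline uint64_t pext64(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pext_u64(src, mask);
#else
    // Portable fallback (illustrative only, much slower than the instruction).
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  // lowest set bit of mask
        if (src & lowest) out |= bit;
        mask &= mask - 1;                      // clear that bit
    }
    return out;
#endif
}

// Scatter the low bits of src to the positions selected by mask.
static inline uint64_t pdep64(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
    return _pdep_u64(src, mask);
#else
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & bit) out |= lowest;
        mask &= mask - 1;
    }
    return out;
#endif
}

// Selects bit 0 of each of the 8 bytes in a 64-bit word.
static const uint64_t kLowBitPerByte = 0x0101010101010101ULL;

// Pack 8 bytes (each 0 or 1) into one byte. On a little-endian
// machine such as x86, bytes[i] ends up in bit i of the result.
static inline uint8_t pack8(const uint8_t* bytes) {
    uint64_t word;
    std::memcpy(&word, bytes, sizeof word);  // safe unaligned load
    return static_cast<uint8_t>(pext64(word, kLowBitPerByte));
}

// Inverse of pack8: expand 8 bits into 8 bytes of 0 or 1.
static inline void unpack8(uint8_t bits, uint8_t* bytes) {
    uint64_t word = pdep64(bits, kLowBitPerByte);
    std::memcpy(bytes, &word, sizeof word);
}

// Small round-trip check over all 256 bit patterns.
static inline void selfcheck() {
    uint8_t buf[8];
    for (int v = 0; v < 256; ++v) {
        unpack8(static_cast<uint8_t>(v), buf);
        assert(pack8(buf) == v);
    }
}
```

Build with `-mbmi2` (or `-march=native` on a CPU that has it) to get the real instructions. A different bit width per element would just need a different mask.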



Ugh, you were right. I copied and pasted your code into my testing framework earlier and it crashed immediately, but it turns out I had passed the wrong offset for the output. The resulting code was slightly faster (by 2-4%) than my AVX2 code.



