Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Recent Intel/AMD CPUs have a POPCNT instruction, which seems like the logical way to do this. Would be interested to see how that performs compared to these implementations.


FWIW, in gcc/g++ there are compiler intrinsics which (should) map to that instruction on CPUs where it's available: __builtin_popcnt, __builtin_popcountl and __builtin_popcountll for unsigned ints, unsigned longs and unsigned long longs respectively. Visual C++ provides equivalent functions for Windows (but I can't remember what they're called).

It does seem odd that the article misses this approach out.


Unfortunately __builtin_popcnt isn't emitting a popcnt instruction with the GCC I've got here, even using -msse4.2. I believe that very recent GCC does get this right.


GCC 4.4 isn't very recent, but it generates a popcnt with -msse4.2.

GCC 4.5, using popcnt on my Core-i7 860 takes the trivial loop mentioned using "Complement and Compare" from ~10.5s to ~7.5s


Also, is there any pure C code that an optimizing compiler could make into a POPCNT instruction? I would be quite impressed if a compiler could recognize some of the snippets in the article, and compile them into a POPCNT and a compare with 1.


Intel has included BSF and BSR (bit scan forward / reverse) since 386, which would presumably also give you the magnitude of the power of 2.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: