- According to Stack Overflow, it doesn't matter whether __popcnt() or _mm_popcnt_u32() is used; both should compile to the same instruction.
- POPCNT isn't formally part of SSE4.2 (it has its own CPUID feature flag), but it was introduced at around the same time. To be on the safe side, I only enabled it for the AVX kernel and above, so that systems lacking it are not forced to fall back to the SSE3 kernel.
- Some quick benchmarks on Windows with bistro.blend show no difference from the previous code, but since this resolves a TODO it might still be worth adding. GCC should already use the instruction via __builtin_popcount() when compiled with SSE4 enabled, but that is worth double-checking.
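For illustration, the selection logic described above could look roughly like the sketch below. This is not the code from the patch: `KERNEL_HAS_POPCNT`, `popcount_u32()`, `popcount_fallback()` and `popcount_self_check()` are hypothetical names, and the sketch assumes the build system defines `KERNEL_HAS_POPCNT` only for the AVX kernel and above. On GCC and Clang, `__builtin_popcount()` is expected to become a single POPCNT instruction only when the target flags allow it (for example `-mpopcnt`, which `-msse4.2` and `-mavx` imply); otherwise the compiler emits a software routine.

```cpp
#include <cassert>
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>  // declares __popcnt()
#endif

// Portable bit-twiddling fallback; does not require the POPCNT instruction.
inline uint32_t popcount_fallback(uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);                  // counts per 2-bit group
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // counts per 4-bit group
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // counts per byte
  return (x * 0x01010101u) >> 24;                    // sum of the four bytes
}

// Hypothetical wrapper. KERNEL_HAS_POPCNT is an assumed macro that would be
// defined only for kernels whose minimum CPU is known to support POPCNT.
inline uint32_t popcount_u32(uint32_t x)
{
#if defined(KERNEL_HAS_POPCNT) && defined(_MSC_VER) && \
    (defined(_M_X64) || defined(_M_IX86))
  return __popcnt(x);  // MSVC intrinsic, always emits POPCNT
#elif defined(__GNUC__) || defined(__clang__)
  // The compiler chooses POPCNT or a software routine based on target flags.
  return static_cast<uint32_t>(__builtin_popcount(x));
#else
  return popcount_fallback(x);
#endif
}

// Sanity check: the selected implementation must agree with the fallback.
inline void popcount_self_check()
{
  const uint32_t samples[] = {0u, 1u, 0x80000000u, 0xF0F0F0F0u, 0xFFFFFFFFu};
  for (uint32_t v : samples) {
    assert(popcount_u32(v) == popcount_fallback(v));
  }
}
```

Keeping the fallback next to the intrinsic path makes it easy to benchmark both variants against each other and to confirm in the disassembly which one the compiler actually emitted.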
Details
- Reviewers
Brecht Van Lommel (brecht)
Diff Detail
Event Timeline
Further benchmarks show that this actually makes renders ~1-2% slower with latest master, so it needs further investigation.