WIP: Cycles: Use popcount intrinsic on Windows
Abandoned, Public

Authored by Thomas Dinges (dingto) on Dec 10 2022, 7:12 PM.

Details

Summary
  • According to Stack Overflow, it doesn't matter whether __popcnt() or _mm_popcnt_u32() is used; they should be equivalent.
  • Popcount isn't part of SSE4.2, but was released around the same time. To be on the safe side, I only added it for the AVX kernel and above, to avoid systems without support for it having to fall back to SSE3.
  • Some quick benchmarks on Windows with bistro.blend show no difference from the previous code, but since this resolves a TODO it might still be worth adding. GCC should already use the intrinsic via __builtin_popcount() when compiled with sse4, but that is worth double-checking.
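
For illustration, the approach described above could look roughly like the following. This is a minimal sketch, not the actual diff: the names popcount_u32() and popcount_fallback() are hypothetical, and gating on the compiler's __AVX__ macro is an assumption standing in for whatever kernel define Cycles actually uses.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#  include <intrin.h>  // declares __popcnt() on MSVC
#endif

// Portable bit-twiddling fallback (hypothetical helper name).
// Sums bits in pairs, then nibbles, then bytes, and finally adds
// the four byte counts together with a multiply.
inline uint32_t popcount_fallback(uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;
  return (x * 0x01010101u) >> 24;
}

// Hypothetical wrapper: use the hardware instruction only where it is
// assumed to be available, otherwise fall back to the portable version.
inline uint32_t popcount_u32(uint32_t x)
{
#if defined(_MSC_VER) && defined(__AVX__)
  // Assumption: AVX-capable CPUs also support POPCNT, so the
  // intrinsic is only enabled for AVX builds and above.
  return __popcnt(x);
#elif defined(__GNUC__) || defined(__clang__)
  // GCC/Clang emit the POPCNT instruction here when the target
  // supports it, and a software sequence otherwise.
  return static_cast<uint32_t>(__builtin_popcount(x));
#else
  return popcount_fallback(x);
#endif
}
```

Keeping the fallback as a separate function makes it easy to compare both paths in a benchmark before deciding whether the intrinsic is worth enabling.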

Event Timeline

Thomas Dinges (dingto) requested review of this revision. Dec 10 2022, 7:12 PM
Thomas Dinges (dingto) created this revision.
Thomas Dinges (dingto) retitled this revision from Cycles: Use popcount intrinsic on Windows to WIP: Cycles: Use popcount intrinsic on Windows.
This revision is now accepted and ready to land. Dec 12 2022, 5:16 PM

Further benchmarks show that this actually results in ~1-2% slower renders with latest master. This needs further investigation.