Color Management: Parallelize ImBuf conversion to float
Closed, Public

Authored by Lukas Stockner (lukasstockner97) on Oct 22 2022, 4:55 AM.

Details

Summary

Motivated by long loading times in T101969. This reduces render preparation time from 14sec to 6sec.

Another possible improvement would be to use C++ and template based on OCIO vs. sRGB,
but moving the file to C++ seems nontrivial (and opens up the question whether ocio_capi
makes any sense then or we should just use OCIO directly) so I left it at a direct 1:1
parallelization of the existing code for now.
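
As a rough illustration of the two ideas in the summary (splitting the conversion across threads, and templating on the per-pixel transform), here is a minimal standalone sketch. It is not the code from this revision: `SrgbToLinear`, `convert_byte_to_float_parallel` and the use of `std::thread` are assumptions made for the example, and Blender's own task scheduling and ImBuf types are not used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-pixel op: sRGB-encoded value in [0, 1] to scene-linear.
// An OCIO-backed op could be swapped in through the same template parameter.
struct SrgbToLinear {
  float operator()(float c) const
  {
    return (c <= 0.04045f) ? c / 12.92f : std::pow((c + 0.055f) / 1.055f, 2.4f);
  }
};

// Convert an 8-bit RGBA buffer to float, splitting scanlines across threads.
// Every worker writes a disjoint range of rows, so no locking is needed.
template<typename PixelOp>
void convert_byte_to_float_parallel(const std::vector<uint8_t> &src,
                                    std::vector<float> &dst,
                                    size_t width,
                                    size_t height,
                                    PixelOp op,
                                    unsigned num_threads = std::thread::hardware_concurrency())
{
  assert(src.size() == width * height * 4);
  dst.resize(src.size());

  // hardware_concurrency() may return 0, so clamp to at least one thread.
  const size_t threads = std::max<size_t>(1, std::min<size_t>(num_threads, height));
  const size_t rows_per_thread = (height + threads - 1) / threads;

  auto convert_rows = [&](size_t y_begin, size_t y_end) {
    for (size_t i = y_begin * width * 4; i < y_end * width * 4; i += 4) {
      for (size_t c = 0; c < 3; c++) {
        dst[i + c] = op(src[i + c] / 255.0f);
      }
      dst[i + 3] = src[i + 3] / 255.0f; /* Alpha is only rescaled. */
    }
  };

  std::vector<std::thread> workers;
  for (size_t y = 0; y < height; y += rows_per_thread) {
    workers.emplace_back(convert_rows, y, std::min(height, y + rows_per_thread));
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
}
```

Because the transform is a template parameter, the branch between the sRGB path and an OCIO path would be resolved at compile time instead of once per pixel, which is the benefit the C++ variant mentioned above would presumably bring.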

Diff Detail

Repository
rB Blender
Branch
parallel-float-cm (branched from master)
Build Status
Buildable 24378
Build 24378: arc lint + arc unit

Event Timeline

Lukas Stockner (lukasstockner97) requested review of this revision. Oct 22 2022, 4:55 AM
Lukas Stockner (lukasstockner97) created this revision.

Calling OCIO once per pixel, just the calling overhead will eat you alive. OCIO surely must have some buffered variants we could call?

Hm, looks like they do, I'll check if it works for this usecase.

Surprisingly, OCIO_cpuProcessorApply is slower than the current version, even when applying it in 32-scanline blocks. I'm getting 5.92sec with P3271, compared to 5.43sec with the current version here.
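
For readers unfamiliar with the blocked approach being timed here, the loop structure looks roughly like the sketch below. It is only an illustration: `BlockTransform` is a hypothetical stand-in for a buffered call such as OCIO_cpuProcessorApply, whose real signature is not reproduced, and `apply_in_scanline_blocks` is a name made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a buffered color transform that processes
// `rows` consecutive scanlines of RGBA float pixels in place.
using BlockTransform = std::function<void(float *pixels, size_t width, size_t rows)>;

// Walk the image in chunks of `block_rows` scanlines (32 in the test above)
// and hand each chunk to the buffered transform in a single call.
void apply_in_scanline_blocks(std::vector<float> &rgba,
                              size_t width,
                              size_t height,
                              size_t block_rows,
                              const BlockTransform &transform)
{
  assert(block_rows > 0);
  assert(rgba.size() == width * height * 4);

  for (size_t y = 0; y < height; y += block_rows) {
    /* The last block may be shorter than block_rows. */
    const size_t rows = std::min(block_rows, height - y);
    transform(rgba.data() + y * width * 4, width, rows);
  }
}
```

With a height of 70 and 32-scanline blocks, the transform is called three times, with 32, 32 and 6 rows.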

This is strictly better than what we were doing before so I think this can be committed.

Would be good to understand why it's slower for bigger blocks though, if it's something in our C wrapper or in OpenColorIO. It's not obvious to me from checking the code.

This revision is now accepted and ready to land. Nov 8 2022, 7:50 PM

My best guess would be caching - this version does all processing steps on one pixel, while the other version loops over the entire image multiple times and therefore has to load/store each pixel to memory every time.

Maybe doing it in smaller blocks would be faster then, but we may need to avoid the memory allocation from OCIO_createOCIO_PackedImageDesc somehow.
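
To make the caching guess concrete, the sketch below contrasts the three memory access patterns discussed: all steps per pixel, one full-image pass per step, and per-step passes restricted to small blocks. The steps `step_a`, `step_b` and `step_c` are placeholders invented for the example and do not correspond to actual OCIO operations; the sketch only shows the traversal order and makes no claim about which variant is faster.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder processing steps, not real color transforms.
inline float step_a(float v) { return v * 0.5f; }
inline float step_b(float v) { return v + 0.25f; }
inline float step_c(float v) { return v * v; }

// Fused: every step runs on a value while it is still in a register,
// so each pixel is loaded and stored once.
void apply_fused(std::vector<float> &px)
{
  for (float &v : px) {
    v = step_c(step_b(step_a(v)));
  }
}

// Multi-pass: each step sweeps the whole buffer, so a large image is
// streamed through memory once per step.
void apply_multipass(std::vector<float> &px)
{
  for (float &v : px) v = step_a(v);
  for (float &v : px) v = step_b(v);
  for (float &v : px) v = step_c(v);
}

// Blocked multi-pass: same per-step loops, but limited to `block` values
// at a time so the working set can stay in cache between steps.
void apply_blocked(std::vector<float> &px, size_t block)
{
  assert(block > 0);
  for (size_t start = 0; start < px.size(); start += block) {
    const size_t end = std::min(px.size(), start + block);
    for (size_t i = start; i < end; i++) px[i] = step_a(px[i]);
    for (size_t i = start; i < end; i++) px[i] = step_b(px[i]);
    for (size_t i = start; i < end; i++) px[i] = step_c(px[i]);
  }
}
```

All three variants compute the same values; only the order in which memory is touched differs, which is the property the guess above depends on.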

Whatever the cause, not a blocker for this commit to land.