First of all, this started out as just dealing with the two warnings during a CUDA build regarding redefinitions of FLT_MIN and FLT_MAX, and snowballed out of control.
A little bit of background information for @Patrick Mours (pmoursnv); @Brecht Van Lommel (brecht) / @Sergey Sharybin (sergey), feel free to skip this part.
[Small history lesson on how we got here]
nvcc historically was (perhaps still is) real slow to support new compilers, which kept us from using MSVC 2017/2019 update 1/2/3/whatever, since it would just go "unsupported compiler! bye bye!"
To work around this issue we added a compiler back-end based on nvrtc. Given we have no host code in our kernels this should be no issue; however, since there is no host compiler to rely on with nvrtc, there was no math.h and we had to define FLT_MIN / FLT_MAX ourselves, which caused the 2 warnings.
Now a while ago, the nvcc team realized that the MS team was gonna push out a new compiler every 6 weeks, and with the cuda release schedule being 1-2 times a year that would be a problem, so they relaxed the compiler version check. nvcc just kept on working with new MSVC updates and cycles_cubin_cc kinda fell into disarray.
[/Small history lesson on how we got here] @Brecht Van Lommel (brecht) / @Sergey Sharybin (sergey) pick it up from here!
Now, getting rid of the 2 warnings is easy: just detect whether it's nvcc or cycles_cubin_cc and don't define the floating point min/max when a host compiler already provides them, easy!
However, then came testing: no issues with nvcc, but cycles_cubin_cc support had rotted a tiny bit and had been disabled for cuda 10 due to "issues".
I'd like to get this back into working order and this diff is the first stab at this.
Issues addressed in this patch
- I had cuda 10.2, cmake decided MSVC did not support 10.2 and enabled cycles_cubin_cc and later decided cycles_cubin_cc was not supported on cuda 10.x and disabled it again.....riiiiight...
The first issue is fixed by adding a better version check (< 11.0) rather than having to add new cuda versions to `intern/cycles/CMakeLists.txt`.
I'd like to keep the nvrtc path marked as unsupported for now, so I added a (temporary) WITH_CYCLES_CUBIN_COMPILER_OVERRRIDE cmake flag to bypass the disabling of cycles_cubin_cc.
- cycles_cubin_cc was never updated to deal with the optix code path.
Some small tweaks were needed to stop at just the PTX phase, nothing big there, and some cmake changes to wire it all up.
- cycles_cubin_cc was just failing to build the kernels, it didn't know how to deal with static_assert.
Not sure if we just didn't have them before when cubin_cc worked, or something else changed, but either way I added a no-op define for now in util_static_assert.h for cubin_cc
- FLT_MIN/MAX warning (ha! finally!)
Now that cycles_cubin_cc was working again, I added a -DCYCLES_CUBIN_CC for when it was used, and properly protected the math defines, walk in the park!
- FLT_EPSILON added
There's some new code that uses it, so we needed a define in both the cuda and optix compat header files.
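The defines described in the list above could look roughly like the sketch below. This is only an illustration and not the actual header contents: CYCLES_CUBIN_CC is the define named in this patch, the float constants are the standard IEEE single precision values, and clamp_to_finite is a made-up helper that exists only to show the constants being used.

```cpp
// Sketch only: a compat section in the spirit of the cuda/optix compat headers.
#ifdef CYCLES_CUBIN_CC
// nvrtc path: no host headers available, so provide the constants ourselves.
#  define FLT_MIN 1.175494351e-38f
#  define FLT_MAX 3.402823466e+38f
#  define FLT_EPSILON 1.192092896e-07f
// Assumption: a no-op replacement, so kernels containing static_assert still build.
#  define static_assert(statement, message)
#else
// nvcc path: the host compiler already ships these, defining them again is
// what triggered the redefinition warnings.
#  include <cfloat>
#endif

// Compiles on both paths; on the cubin_cc path it expands to nothing.
static_assert(sizeof(float) == 4, "float is expected to be 32 bit");

// Hypothetical helper, only here to exercise the constants.
inline float clamp_to_finite(float x)
{
  if (x > FLT_MAX) {
    return FLT_MAX;
  }
  if (x < -FLT_MAX) {
    return -FLT_MAX;
  }
  return x;
}
```

Note that a two-argument macro like this would choke on a static_assert whose condition contains a top-level comma, so it is a stopgap rather than a proper replacement.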
Testing
All right, so far so good, and then came testing. I have a GTX1660 and the cubin_cc generated sm_75 kernel rendered the bmw_27 scene seemingly fine, but that's hardly a conclusive test.
Issues
- Unit tests, I'm struggling to run the unit tests on the GPU
I have a script [1] that forces the GPU to be used, which works for a normal CLI render, but when setting CYCLESTEST_ARGS=-P c:\gpu.py and running ctest, it seemingly still runs on the CPU (i.e. cpu load is high, gpu is idling).
With "issues" already being known with the nvrtc code path, it would be real nice to know if any tests are failing and which ones....
- Optix testing.
Blender officially does not support optix on a GTX1660, I added a (temporary?) bypass using the CYCLES_OPTIX_RTX_BYPASS environment variable and it seemingly runs fine with nvcc build kernels on my GTX1660.
- nvrtc optix
Not as much luck there, cubin_cc outputs a ptx kernel but when blender loads it craps out with
OptiX error OPTIX_ERROR_INVALID_PTX in optixModuleCreateFromPTX(context, &module_options, &pipeline_options, ptx_data.data(), ptx_data.size(), nullptr, 0, &optix_module), line 401
not entirely sure yet what is wrong here. @Patrick Mours (pmoursnv) fixed it.
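For reference, the kind of environment variable bypass mentioned under Optix testing can be sketched as below. Only the CYCLES_OPTIX_RTX_BYPASS name comes from this patch; env_flag_enabled, optix_device_allowed and the rtx_capable parameter are hypothetical stand-ins, and treating an empty value or "0" as "off" is an assumption on my part, not necessarily what the diff does.

```cpp
#include <cstdlib>
#include <cstring>

// Assumption: unset, empty or "0" means "no bypass", anything else enables it.
inline bool env_flag_enabled(const char *value)
{
  return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

// Hypothetical device filter: rtx_capable stands in for whatever check
// currently rejects non-RTX cards for the optix backend.
inline bool optix_device_allowed(bool rtx_capable)
{
  return rtx_capable || env_flag_enabled(std::getenv("CYCLES_OPTIX_RTX_BYPASS"));
}
```

Keeping the parsing in its own function makes it testable without touching the real environment.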
Open questions
- Are we keeping cycles_cubin_cc? (I'm OK with shooting it in the face, but this may come back to haunt us when MSVC/GCC releases new versions)
- If we're keeping it, I'm gonna need some help with the unit tests. D5602 does some work in this area but still has rather substantial issues of its own.
- The bypasses are OK for testing, ideally cubin_cc gets back to supported status and WITH_CYCLES_CUBIN_COMPILER_OVERRRIDE can go away, do we want to keep CYCLES_OPTIX_RTX_BYPASS ?
bonus question for @Patrick Mours (pmoursnv)
I have tried just copying the kernels over and that did not work, but would it be possible for a sm_75 card to load and execute a < sm_75 kernel?
In cuda 9.x we had an issue where a bad kernel was generated for sm_30 but the newer kernels were fine. Given I don't have a box with every GPU known to man in it, it would be real nice if we could unit test all sm_30..sm_75 kernels on a sm_75 system. Is this feasible? Or is it more of a 'yeah that does not work that way, you're outta luck there' kind of thing?