
Cycles: Fix 2 silly cuda warnings / Restore cycles_cubin_cc
Closed, Public

Authored by Ray Molenkamp (LazyDodo) on Mar 12 2020, 7:38 PM.

Details

Summary

First of all, this started out as just dealing with those two warnings during a CUDA build about redefinitions of FLT_MIN and FLT_MAX, and snowballed out of control.

A little bit of background information for @Patrick Mours (pmoursnv); @Brecht Van Lommel (brecht) / @Sergey Sharybin (sergey), feel free to skip this part.

[Small history lesson on how we got here]

nvcc historically was (perhaps still is) really slow to support new compilers, and kept us from using MSVC 2017/2019 update 1/2/3/whatever, since it just went "unsupported compiler! bye bye!"

To work around this issue we added a compiler back-end based on nvrtc. Given we have no host code in our kernels this should be no issue. However, since we have no host compiler to rely on with nvrtc, there was no math.h and we had to define FLT_MIN / FLT_MAX ourselves, which is what causes the 2 warnings.

Now a while ago, the nvcc team realized that the MS team was gonna push out a new compiler every 6 weeks, and with the cuda release schedule being 1-2 times a year that would be a problem, so they relaxed the compiler version check. nvcc just kept on working with new MSVC updates, and cycles_cubin_cc kinda fell into disarray.

[/Small history lesson on how we got here] @Brecht Van Lommel (brecht) / @Sergey Sharybin (sergey) pick it up from here!

Now, getting rid of the 2 warnings is easy: just detect whether it's nvcc or cycles_cubin_cc, and don't define the floating point min/max for nvcc. Easy!

However, then came testing: no issues with nvcc, but cycles_cubin_cc support had rotted a tiny bit and had been disabled for cuda 10 due to "issues".

I'd like to get this back into working order and this diff is the first stab at this.

Issues addressed in this patch

  1. I had cuda 10.2. cmake decided MSVC did not support 10.2 and enabled cycles_cubin_cc, and later decided cycles_cubin_cc was not supported on cuda 10.x and disabled it again.....riiiiight...

The first issue is fixed by adding a better version check (< 11.0) rather than having to add each new cuda version to `intern/cycles/CMakeLists.txt`.

I'd like to keep the nvrtc path marked as unsupported for now, so I added a (temporary) WITH_CYCLES_CUBIN_COMPILER_OVERRRIDE cmake flag to bypass the disabling of cycles_cubin_cc.

  2. cycles_cubin_cc was never updated to deal with the optix code path.

Some small tweaks were needed to stop at the PTX phase, nothing big there, and some cmake changes to wire it all up.

  3. cycles_cubin_cc was just failing to build the kernels; it didn't know how to deal with static_assert.

Not sure if we just didn't have them before when cubin_cc worked, or something else changed, but either way I added a no-op define for now in util_static_assert.h for cubin_cc.

  4. FLT_MIN/MAX warning (ha! finally!)

Now that cycles_cubin_cc was working again, I added a -DCYCLES_CUBIN_CC for when it was used, and properly protected the math defines, walk in the park!

  5. FLT_EPSILON added

There's some new code that uses it, so we needed a define in both the cuda and optix compat header files.
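As a rough sketch of the guard described in items 4 and 5 (not the actual contents of the compat headers), the defines can be limited to the cycles_cubin_cc path like this. The literal values are the usual IEEE-754 single-precision limits and `is_below_flt_min` is a hypothetical helper added only to exercise the macros; both are assumptions, only the CYCLES_CUBIN_CC define comes from the patch.

```cpp
// Sketch only. CYCLES_CUBIN_CC is the define the patch passes when
// cycles_cubin_cc (NVRTC) builds the kernel.
#ifdef CYCLES_CUBIN_CC
// NVRTC path: no host headers available, so provide the limits ourselves.
#  ifndef FLT_MIN
#    define FLT_MIN 1.175494350822287507969e-38f
#  endif
#  ifndef FLT_MAX
#    define FLT_MAX 340282346638528859811704183484516925440.0f
#  endif
#  ifndef FLT_EPSILON
#    define FLT_EPSILON 1.192092896e-07f
#  endif
#else
// nvcc path: the host compiler's headers already define these, so
// defining them again is what triggered the redefinition warnings.
#  include <cfloat>
#endif

// Hypothetical helper, only here to show the macros work on either path.
inline bool is_below_flt_min(float x)
{
  const float a = (x < 0.0f) ? -x : x;
  return a < FLT_MIN;
}
```

Building with or without -DCYCLES_CUBIN_CC should give the same behaviour, with no redefinition warnings in either case.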

Testing

All right, so far so good, and then came testing. I have a GTX1660, and the cubin_cc generated sm_75 kernel rendered the bmw_27 scene seemingly fine, but that's hardly a conclusive test.

Issues

  1. Unit tests, I'm struggling to run the unit tests on the GPU

I have a script [1] that forces the GPU to be used, which works for a normal CLI render, but when setting CYCLESTEST_ARGS=-P c:\gpu.py and running ctest, it seemingly still runs on the CPU (i.e. cpu load is high, gpu is idling).

With "issues" already known on the nvrtc code path, it would be really nice to know whether any tests are failing, and which ones....

  2. Optix testing.

Blender officially does not support optix on a GTX1660. I added a (temporary?) bypass using the CYCLES_OPTIX_RTX_BYPASS environment variable, and it seemingly runs fine with nvcc-built kernels on my GTX1660.

  3. nvrtc optix

Not as much luck there: cubin_cc outputs a ptx kernel, but when blender loads it, it craps out with

OptiX error OPTIX_ERROR_INVALID_PTX in optixModuleCreateFromPTX(context, &module_options, &pipeline_options, ptx_data.data(), ptx_data.size(), nullptr, 0, &optix_module), line 401

not entirely sure yet what is wrong here. @Patrick Mours (pmoursnv) fixed it.

Open questions

  1. Are we keeping cycles_cubin_cc? (I'm OK with shooting it in the face, but this may come back to haunt us when MSVC/GCC releases new versions)
  2. If we're keeping it, I'm gonna need some help with the unit tests. D5602 does some work in this area but still has rather substantial issues of its own.
  3. The bypasses are OK for testing. Ideally cubin_cc gets back to supported status and WITH_CYCLES_CUBIN_COMPILER_OVERRRIDE can go away. Do we want to keep CYCLES_OPTIX_RTX_BYPASS?
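For reference, an environment-variable bypass like CYCLES_OPTIX_RTX_BYPASS usually boils down to something like the sketch below. This is an illustration under assumptions, not the code in the patch: `optix_device_allowed` and its `device_is_rtx` parameter are hypothetical names standing in for whatever device check Blender actually does.

```cpp
#include <cstdlib>

// Sketch only: device_is_rtx stands in for the real capability check.
inline bool optix_device_allowed(bool device_is_rtx)
{
  // In this sketch the variable only has to exist; its value is ignored.
  const bool bypass = std::getenv("CYCLES_OPTIX_RTX_BYPASS") != nullptr;
  return device_is_rtx || bypass;
}
```

With the variable unset, a non-RTX card is rejected as before; setting it to anything lets the card through for testing.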

bonus question for @Patrick Mours (pmoursnv)

I have tried just copying the kernels over and that did not work, but would it be possible for a sm_75 card to load and execute a < sm_75 kernel?

In cuda 9.x we had an issue where a bad kernel was generated for sm_30 but the newer kernels were fine. Given I don't have a box with every GPU known to man in it, it would be really nice if we could unit test all sm_30..sm_75 kernels on an sm_75 system. Is this feasible, or is it more of a 'yeah that does not work that way, you're outta luck there' kind of thing?

[1] https://blender.stackexchange.com/a/156680

Diff Detail

Repository
rB Blender
Branch
tmp_cycles_cubin_cc (branched from master)
Build Status
Buildable 7158
Build 7158: arc lint + arc unit

Event Timeline

Ray Molenkamp (LazyDodo) retitled this revision from First of all, this started out as just dealing with those two warns during a CUDA build regarding redefinitions of `FLT_MIN` and `FLT_MAX` and snowballed out of control. to Cycles: Fix 2 silly cuda warnings / Restore cycles_cubin_cc. Mar 12 2020, 7:39 PM
Ray Molenkamp (LazyDodo) edited the summary of this revision.

Nice!

Regarding issue numero 3: Running with --debug-cycles should increase verbosity of the information you get from OptiX, which might point towards why it fails to load. But I'll check this later too.

Regarding the bonus question: It is not possible to load a final CUDA binary for an incompatible SM version. Since this is the actual device code, only GPUs which support the ISA (instruction set architecture) of that code will be able to run it. Sometimes GPU architectures can be backwards compatible to a certain degree (e.g. Turing can run Volta code), but one should not rely on this.
The only exception is binaries compiled for "virtual" SM versions ("compute_XX"). The cubin will contain PTX instead of the final device code in that case, which means the driver can JIT those modules to the actual target device code.
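To make the distinction concrete, here is a small model of the two rules. It is a sketch, not a CUDA API: the struct and function names are made up for illustration. The cubin rule (same major version, equal or higher minor version) is my reading of NVIDIA's documented binary compatibility guarantee, so treat it as an assumption rather than something to rely on, as noted above.

```cpp
// Sketch only: a toy model of the loading rules, not a CUDA API.
struct ComputeCapability {
  int cc_major;
  int cc_minor;
};

// A real ("sm_XX") cubin holds device code for one ISA. Assumed rule: it
// loads on devices with the same major version and an equal or higher minor.
inline bool cubin_can_run(ComputeCapability built_for, ComputeCapability device)
{
  return built_for.cc_major == device.cc_major && built_for.cc_minor <= device.cc_minor;
}

// A virtual ("compute_XX") target carries PTX, which the driver can JIT for
// any device whose compute capability is equal or higher.
inline bool ptx_can_jit(ComputeCapability built_for, ComputeCapability device)
{
  if (device.cc_major != built_for.cc_major) {
    return device.cc_major > built_for.cc_major;
  }
  return device.cc_minor >= built_for.cc_minor;
}
```

Under this model an sm_70 cubin is accepted on an sm_75 device (the Volta on Turing case), an sm_30 cubin is not, and compute_30 PTX can still be JIT-compiled for sm_75.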

intern/cycles/kernel/CMakeLists.txt
523

I'd move this down into the add_custom_command line, similar to how you added -target 30 for the cubin compiler.

intern/cycles/util/util_static_assert.h
27

This could just be #if defined(__KERNEL_OPENCL__) || defined(CYCLES_CUBIN_CC).

Btw., the split kernel makes use of INT_MAX, which is missing with cycles_cubin_cc too. Nobody uses that currently, but could just fix that too while at it.
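The combined condition suggested above could look roughly like this. To keep the sketch compilable as ordinary host C++, it uses a hypothetical wrapper macro, `KERNEL_STATIC_ASSERT`, instead of redefining `static_assert` itself; `PackedPair` is likewise just an illustrative type, and neither name is from the patch.

```cpp
// Sketch only. KERNEL_STATIC_ASSERT is a hypothetical wrapper name; the
// patch itself adds a no-op define in util_static_assert.h.
#if defined(__KERNEL_OPENCL__) || defined(CYCLES_CUBIN_CC)
// Compilers without usable static_assert support: expand to nothing.
#  define KERNEL_STATIC_ASSERT(statement, message)
#else
// Everywhere else: forward to the real compile-time check.
#  define KERNEL_STATIC_ASSERT(statement, message) static_assert(statement, message)
#endif

// Illustrative type, checked at compile time where supported.
struct PackedPair {
  int a;
  int b;
};
KERNEL_STATIC_ASSERT(sizeof(PackedPair) == 2 * sizeof(int), "PackedPair must not be padded");
```

The trade-off of the no-op branch is that those builds silently lose the check, so it only stays meaningful as long as some other compiler still evaluates it.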

intern/cycles/kernel/CMakeLists.txt
558

This is the reason you see OPTIX_ERROR_INVALID_PTX. You are compiling the CUDA kernel with cycles_cubin_cc here, not the OptiX kernel.

Changing this to -i ${CMAKE_CURRENT_SOURCE_DIR}/${input} fixes that.
It unearths some other problems though:

  1. OptiX needs C++11 in kernel code, so cycles_cubin_cc has to add the --std=c++11 argument to NVRTC.
  2. The OptiX headers include standard headers (optix_7_types.h includes stddef.h), which are not available in cycles_cubin_cc.
Ray Molenkamp (LazyDodo) retitled this revision from Cycles: Fix 2 silly cuda warnings / Restore cycles_cubin_cc to Cycles: Fix 2 silly cuda warnings / Restore cycles_cubin_cc.
  • Fix optix building with cubin_cc
Ray Molenkamp (LazyDodo) edited the summary of this revision.

All right, thanks for the pointers! The optix kernel now builds and runs on my 1660 using cubin_cc.

Ray Molenkamp (LazyDodo) marked an inline comment as done.
  • Merge remote-tracking branch 'origin/master' into arcpatch-D7136
  • Update with feedback
  • Renamed optix environment variable to be in line with the one we have for OpenCL

Minor update, I'll deal with testing in D5602

This revision is now accepted and ready to land. Mar 26 2020, 2:22 PM