Page MenuHome

Prevent a stray "-I" if there is no path to the Optix SDK.
ClosedPublic

Authored by Gon Solo (gonsolo) on Oct 20 2022, 10:31 PM.

Details

Summary

If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I" in the arguments.
Detect if the include path is empty and act accordingly.

Diff Detail

Repository
rB Blender
Branch
stray (branched from master)
Build Status
Buildable 24362
Build 24362: arc lint + arc unit

Event Timeline

Gon Solo (gonsolo) requested review of this revision.Oct 20 2022, 10:31 PM
Gon Solo (gonsolo) created this revision.
Gon Solo (gonsolo) edited the summary of this revision. (Show Details)Oct 20 2022, 10:34 PM
Gon Solo (gonsolo) added a project: Cycles.
Brecht Van Lommel (brecht) requested changes to this revision.Oct 21 2022, 8:31 PM

If we are compiling an OptiX kernel, I would expect it to fail compiling without the OptiX headers. So we shouldn't be compiling at all in that case.

In OptiXDevice::load_kernels we are checking for the existence of the OptiX include directory before compiling the kernel. Is this not working somehow, or is there another case where we are missing the check?

This revision now requires changes to proceed.Oct 21 2022, 8:31 PM

Further inspection reveals the following:

The check for an empty optix include dir is in optix/device_impl.cpp:478.
Before that CUDA kernels are loaded via CUDADevice::load_kernels at optix/device_impl.cpp:385
CUDADevice::load_kernels calls compile_kernel at cuda/device_impl.cpp:421
CUDADevice::compile_kernel calls compile_kernel_get_common_cflags at cuda_device_impl.cpp:284
which is a virtual function and since the device is an OptixDevice the wrong compile_kernel_get_common_cflags (the Optix one, not the CUDA one) gets called. Since there is no Optix include directory there is a stray "-I" and nvcc fails.

So I believe the root cause is that OptixDevice::load_kernels calls CUDADevice::load_kernels to load CUDA
kernels before compiling the Optix kernel, but then uses the wrong compile_kernel_get_common_cflags.

Stack trace:
#0 0x0000000002842980 in ccl::OptiXDevice::compile_kernel_get_common_cflags(unsigned int) ()
#1 0x0000000002823d9d in ccl::CUDADevice::compile_kernel(unsigned int, char const*, char const*, bool) ()
#2 0x0000000002828d93 in ccl::CUDADevice::load_kernels(unsigned int) ()
#3 0x0000000002842b8b in ccl::OptiXDevice::load_kernels(unsigned int) ()
#4 0x0000000002837525 in ccl::MultiDevice::load_kernels(unsigned int) ()

Thanks for investigating that.

I think we should disable runtime kernel compilation entirely for OptiX devices, if the OptiX include directory is not found. Because if even if it succeeds compiling the CUDA-only kernels, it would fail compiling the OptiX kernels later.

Probably with a new virtual method to check if runtime compilation is possible.

Gon Solo (gonsolo) updated this revision to Diff 57076.EditedOct 25 2022, 12:27 PM
  • Revert "Prevent a stray "-I" if there is no path to the Optix SDK."
  • Make compile_kernel_get_common_cflags non-virtual.
  • Ubuntu Kinetic has GCC 12. Allow it.

Instead of adding a new virtual method to check if runtime compilation is possible
I opted for a minimal solution: Make compile_kernel_get_common_cflags non-virtual.
This way always the right flags are used for a device (Optix or Cuda) and the behaviour
is not changed, e.g. if there are ptx files.

The Optix device calls 'CUDADevice::compile_kernel_get_common_cflags' anyway so there is really no point in making it virtual.

One disadvantage is that Cuda is compiled before the (Optix-related) error message appears.

For HIP nothing changes since nothing derives from it.

I included the GCC 12 patch just for local testing and can remove it if wanted.

  • Hoist getting cflags out of compile_kernel.

compile_kernel is a non-virtual method in CUDADevice.
Hoisting compile_kernel_get_common_cflags out of it makes
sure that the right flags are used for Optix and Cuda kernels.

Brecht Van Lommel (brecht) requested changes to this revision.Oct 26 2022, 10:16 AM

One disadvantage is that Cuda is compiled before the (Optix-related) error message appears.

I want to avoid this. We should not have users wait a while for CUDA kernels to compile, and then see it fail later.

intern/cycles/device/cuda/device_impl.cpp
354 ↗(On Diff #57077)

This addition is not explained.

This revision now requires changes to proceed.Oct 26 2022, 10:16 AM

One disadvantage is that Cuda is compiled before the (Optix-related) error message appears.

I want to avoid this. We should not have users wait a while for CUDA kernels to compile, and then see it fail later.

I can add a check at the beginning of the function. Are you ok with the rest of the changes?

intern/cycles/device/cuda/device_impl.cpp
354 ↗(On Diff #57077)

Ubuntu Kinetic's default compiler is gcc-12 which is unsupported by nvcc at the moment.
I'm getting rid of this.

  • Move check for Optix before compiling CUDA kernels.
  • Remove unsupported-compiler flag.
Gon Solo (gonsolo) marked an inline comment as done.Oct 26 2022, 8:47 PM

Thanks for the updates, I like where the implementation ended up.

I'll commit this with one tweak, to avoid compiling OptiX ray-tracing kernels if we are only using the device for denoising.

This revision is now accepted and ready to land.Nov 8 2022, 7:36 PM