This patch is based on D8063 and rBf63da3dcf59f87b34aa916b2c65ce5a40a48fd92.
In order to build binary kernels for Ampere, the CUDA toolkit needs to be updated to CUDA 11.1 (which adds sm_86 support). That toolkit drops support for sm_30 though, so sm_30 kernels need to continue being built with CUDA 10.
With this patch, the build system checks whether the "CUDA10_NVCC_EXECUTABLE" CMake variable is set and, if so, uses that compiler to build sm_30 kernels. Similarly, for sm_8x kernels it checks "CUDA11_NVCC_EXECUTABLE". All other kernels are built using the default CUDA toolkit.
This makes it possible to use either the CUDA 10 or the CUDA 11 toolkit by default and only selectively use the other one for the kernels where it is a hard requirement. If only the CUDA 11 toolkit is installed, the build system simply skips the sm_30 kernels. Likewise, if only the CUDA 10 toolkit is installed, it skips the sm_8x kernels.
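The selection rules above can be modelled roughly as follows. This is only an illustrative Python sketch of the decision logic, not the CMake code in the patch: the function name `select_nvcc` and its parameters are hypothetical, and falling back to the default toolkit when its major version already fits is my reading of the description. It also simplifies by checking only the major version, whereas sm_86 specifically needs CUDA 11.1.

```python
from typing import Optional


def select_nvcc(arch: str, default_nvcc: str, default_major: int,
                cuda10_nvcc: Optional[str] = None,
                cuda11_nvcc: Optional[str] = None) -> Optional[str]:
    """Return the nvcc path to use for `arch`, or None to skip that kernel.

    `cuda10_nvcc` / `cuda11_nvcc` stand in for the CUDA10_NVCC_EXECUTABLE and
    CUDA11_NVCC_EXECUTABLE CMake variables (None means "not set").
    `default_major` is the major version of the default toolkit (assumption).
    """
    if arch == "sm_30":
        # sm_30 is no longer supported by CUDA 11, so a CUDA 10 nvcc is required.
        if cuda10_nvcc:
            return cuda10_nvcc
        return default_nvcc if default_major < 11 else None
    if arch.startswith("sm_8"):
        # sm_8x (Ampere) requires a CUDA 11 nvcc.
        if cuda11_nvcc:
            return cuda11_nvcc
        return default_nvcc if default_major >= 11 else None
    # Every other architecture is built with the default toolkit.
    return default_nvcc


if __name__ == "__main__":
    # Hypothetical paths, CUDA 11 as the default toolkit and no CUDA 10 override.
    for arch in ("sm_30", "sm_52", "sm_86"):
        print(arch, select_nvcc(arch, "/opt/cuda-11/bin/nvcc", 11))
```

With CUDA 11 as the default and no CUDA 10 compiler configured, the sketch returns `None` for sm_30, which corresponds to that kernel being skipped rather than the build failing.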
The sm_35 and sm_50 architectures still build with CUDA 11, although they are deprecated there, so I left them as is for now.