**Overview**
The AMD Radeon RDNA2 architecture adds support for hardware-accelerated ray intersection and BVH traversal. This is exposed in OpenCL via a “hardware intrinsic”. We will use it to extend the existing OpenCL path in Cycles with a separate hardware-accelerated code path that is taken when the device supports it.
From the user's point of view nothing changes: they keep using the OpenCL version of Cycles, and hardware ray tracing is enabled automatically if available.
**Design**
Enablement:
- A CMake flag:
```
WITH_CYCLES_DEVICE_AMD_HW
```
- If the CMake flag is set, the device capabilities are checked for hardware ray tracing support, and hardware ray tracing is enabled if it is available
Host side, BVH
- Addition of a new BVH layout mask (bvh_amd)
- Addition of a new BVH class that generates a hardware-compatible BVH and packs instances accordingly
-- QBVH based; bounding boxes in the top levels, up to depth 5, are stored in full precision and the rest in half precision
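The depth-based precision rule can be sketched as a small helper. This is a minimal illustration, not the BVH builder itself: the names `BoxPrecision`, `box_precision_for_depth`, and `node_bounds_bytes` are hypothetical, and treating the root as depth 0 with depth 5 inclusive is an assumption. The byte count only covers the bounds of one 4-wide node and is not the hardware node size.

```cpp
#include <cstddef>

// Hypothetical names for illustration only.
enum class BoxPrecision { Full, Half };

// From the text: full precision "up to depth 5". Counting the root as
// depth 0 and treating depth 5 as inclusive are assumptions of this sketch.
constexpr int kFullPrecisionMaxDepth = 5;

constexpr BoxPrecision box_precision_for_depth(int depth)
{
  return depth <= kFullPrecisionMaxDepth ? BoxPrecision::Full : BoxPrecision::Half;
}

// Storage for the bounds of one 4-wide (QBVH) node: 4 children, each with
// 6 values (min and max corner), at 4 bytes (float) or 2 bytes (half) each.
constexpr std::size_t node_bounds_bytes(BoxPrecision precision)
{
  return 4 * 6 * (precision == BoxPrecision::Full ? 4u : 2u);
}
```

Under these assumptions, half precision halves the bounds storage of every node below the top levels, which is presumably the motivation for the split.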
Host side, Device
- Use of existing OpenCL device management framework
- Querying the vendor and device ID to enable the HW ray tracing feature (a flag is passed to the compiler)
- Updating the device memory alignment to 256 (as required by HW RT)
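As a rough illustration of what the 256 alignment means for sub-allocations, the sketch below rounds an offset up to the next multiple of the alignment. The helper name `align_up` and the constant `kHwRtAlignment` are hypothetical, and treating the value as a byte count is an assumption, since the note does not state the unit.

```cpp
#include <cstddef>

// Alignment value from the text; interpreting it as bytes is an assumption.
constexpr std::size_t kHwRtAlignment = 256;

// Rounds `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two for the bit mask to be valid.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment = kHwRtAlignment)
{
  return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, `align_up(1)` yields 256 and `align_up(257)` yields 512, while offsets that are already aligned are returned unchanged.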
Device side
- Use of the existing kernel structure
- Addition of BVH kernels that direct the intersection calls to the AMD hardware intrinsic intersection functionality
- Use of the existing intersection logic for curves and motion triangles (due to a hardware limitation)
- LDS enablement is a work in progress
**OpenCL hardware intrinsic**
Two new shader instructions are added to the hardware to accelerate ray tracing: “image_bvh_intersect_ray” and “image_bvh64_intersect_ray”. The two perform identical operations, except that image_bvh64_intersect_ray takes a 64-bit node pointer while image_bvh_intersect_ray takes a 32-bit node pointer. When either instruction is issued, the hardware gathers the ray data, a BVH node pointer, and a BVH resource descriptor from the shader and fetches a single BVH node. It then tests that node against the ray and returns the intersection results for that one node to the shader.
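Because each instruction handles a single node, the loop that decides which node to visit next would remain in shader code. The sketch below illustrates that division of work on the CPU: `intersect_node` is a software stand-in for the per-node test, and `traverse` is the surrounding loop. Everything here is hypothetical (the `Node4` layout, the `kLeafBit` and `kInvalidChild` encodings, the return format); it is not the real node format or the intrinsic's signature.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical encodings for this sketch only.
constexpr std::uint32_t kInvalidChild = 0xFFFFFFFFu;
constexpr std::uint32_t kLeafBit = 0x80000000u;

struct Ray {
  float origin[3];
  float inv_dir[3];  // 1 / direction; components assumed non-zero
  float t_max;
};

struct Box {
  float lo[3];
  float hi[3];
};

// A 4-wide node: bounds and ID for each child (node index or leaf ID).
struct Node4 {
  std::array<Box, 4> bounds;
  std::array<std::uint32_t, 4> child;
};

// Software stand-in for the single-node test done by the hardware.
// Returns each child's ID if the ray hits its box, kInvalidChild otherwise.
std::array<std::uint32_t, 4> intersect_node(const Node4 &node, const Ray &ray)
{
  std::array<std::uint32_t, 4> result;
  for (int c = 0; c < 4; ++c) {
    result[c] = kInvalidChild;
    if (node.child[c] == kInvalidChild) {
      continue;
    }
    // Slab test: intersect the ray interval with each axis-aligned slab.
    float t_near = 0.0f;
    float t_far = ray.t_max;
    for (int axis = 0; axis < 3; ++axis) {
      const float t0 = (node.bounds[c].lo[axis] - ray.origin[axis]) * ray.inv_dir[axis];
      const float t1 = (node.bounds[c].hi[axis] - ray.origin[axis]) * ray.inv_dir[axis];
      t_near = std::max(t_near, std::min(t0, t1));
      t_far = std::min(t_far, std::max(t0, t1));
    }
    if (t_near <= t_far) {
      result[c] = node.child[c];
    }
  }
  return result;
}

// Shader-side loop: one node is tested per call, and the returned children
// are either recorded as leaf hits or pushed onto the traversal stack.
std::vector<std::uint32_t> traverse(const std::vector<Node4> &nodes, const Ray &ray)
{
  std::vector<std::uint32_t> leaf_hits;
  if (nodes.empty()) {
    return leaf_hits;
  }
  std::vector<std::uint32_t> stack = {0};  // start at the root node
  while (!stack.empty()) {
    const std::uint32_t node_index = stack.back();
    stack.pop_back();
    for (std::uint32_t id : intersect_node(nodes[node_index], ray)) {
      if (id == kInvalidChild) {
        continue;
      }
      if (id & kLeafBit) {
        leaf_hits.push_back(id & ~kLeafBit);  // primitive test would go here
      }
      else {
        stack.push_back(id);
      }
    }
  }
  return leaf_hits;
}
```

In the hardware path, the call to `intersect_node` is the part that would be replaced by the intrinsic; the stack handling and leaf processing stay with the kernel.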
**Detecting device compatibility**
When the AMD hardware ray tracing flag is set, the hardware ray tracing capability is queried in the OpenCLDevice constructor. The result of the query is checked when the BVH layout is assigned and when the kernels are compiled. The BVH layout routes BVH creation to the BVHAMD class, whose methods are used later in the BVH build to pack nodes and generate the hardware-compatible BVH. The compiler options route the intersection calls to the appropriate BVH function, which uses the AMD hardware intersection functionality.
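The flow described here (query once in the constructor, then consult the result for the layout and for the kernel build) might look roughly like the sketch below. It does not reproduce the Cycles code: `DeviceSketch`, the `LayoutSketch` values, and the define `AMD_HW_RT_SKETCH` are hypothetical names, and the two constructor arguments stand in for the build flag and the device query.

```cpp
#include <string>

// Hypothetical layout values; the real BVH layout mask in Cycles differs.
enum class LayoutSketch { Software, AmdHardware };

class DeviceSketch {
 public:
  // `built_with_amd_hw` stands in for WITH_CYCLES_DEVICE_AMD_HW, and
  // `device_reports_hw_rt` for the vendor / device ID query.
  DeviceSketch(bool built_with_amd_hw, bool device_reports_hw_rt)
      : use_hw_rt_(built_with_amd_hw && device_reports_hw_rt)
  {
  }

  // Checked when the BVH layout is assigned.
  LayoutSketch bvh_layout() const
  {
    return use_hw_rt_ ? LayoutSketch::AmdHardware : LayoutSketch::Software;
  }

  // Checked when kernels are compiled: a preprocessor define (hypothetical
  // name) lets the kernel source select the hardware intersection path.
  std::string kernel_build_options(const std::string &base_options) const
  {
    return use_hw_rt_ ? base_options + " -DAMD_HW_RT_SKETCH" : base_options;
  }

 private:
  bool use_hw_rt_;
};
```

Storing the query result once keeps the two later decisions consistent: a device never gets the hardware BVH layout without kernels compiled for it, or the reverse.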
(The patch will be made public soon.)