This patch adds the math functions required for float8, making it possible to use float8 instead of float3 for color data.
Details
- Reviewers: Brecht Van Lommel (brecht)
- Group Reviewers: Cycles
- Commits: rB793d20313952: Cycles: add math functions for float8
Diff Detail
- Repository: rB Blender
- Branch: cycles-add-float8-functions (branched from master)
- Build Status: Buildable 23134, Build 23134: arc lint + arc unit
Event Timeline
| intern/cycles/util/types_float8.h | ||
|---|---|---|
| 15–18 | Compiling on CUDA fails because the ccl_try_align macro is not defined. Is this macro required here, or should it be replaced with something else? | |
What is the bigger goal this patch is leading to? How will using float8 help with color operations?
| intern/cycles/util/types_float8.h | ||
|---|---|---|
| 15–18 | On a CPU the type is required to be aligned. I do not think CUDA supports 256-bit operations, so float8 would perhaps be implemented there as a pair of float4, in which case it seems we can skip the alignment qualifier here. | |
This will be used for spectral rendering to calculate 8 wavelengths per ray. I've submitted a similar patch D15318 for float4 as well.
Ideally we would not need 8-channel colors, since they would add significant memory overhead on the GPU, or complexity from supporting a different number of channels at runtime. But regardless, it's good to be able to experiment with this.
| intern/cycles/util/types_float8.h | ||
|---|---|---|
| 15–18 | You can do:<br>`#ifdef __KERNEL_GPU__`<br>`struct float8`<br>`#else`<br>`struct ccl_try_align(32) float8`<br>`#endif`<br>On any currently supported GPU architecture, implementing this as a pair of float4 should make no performance difference compared to single floats. So it is probably easiest to just keep it as individual floats. | |
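
The suggestion in the comment above can be written out as a compilable sketch. Only `__KERNEL_GPU__`, `ccl_try_align(32)` and the `float8` name come from the thread; the stand-in macro definition, the member names `a` to `h`, and the `float8_add` helper are assumptions made for illustration and may not match the actual Cycles headers.

```cpp
// Stand-in for the Cycles alignment macro. ASSUMPTION: the real definition
// lives in the Cycles utility headers and may be spelled differently.
#ifndef ccl_try_align
#  define ccl_try_align(n) alignas(n)
#endif

// GPU kernels skip the alignment qualifier; CPU builds request 32 bytes so
// the struct can be loaded into a 256-bit register with aligned loads.
#ifdef __KERNEL_GPU__
struct float8
#else
struct ccl_try_align(32) float8
#endif
{
  // Kept as individual floats, as proposed in the review.
  // Member names are hypothetical.
  float a, b, c, d, e, f, g, h;
};

// Hypothetical component-wise addition, written as plain scalar code.
inline float8 float8_add(const float8 &x, const float8 &y)
{
  return {x.a + y.a, x.b + y.b, x.c + y.c, x.d + y.d,
          x.e + y.e, x.f + y.f, x.g + y.g, x.h + y.h};
}
```

With this layout the CPU build keeps the 32-byte alignment, while a build that defines `__KERNEL_GPU__` never references `ccl_try_align` on the struct at all.
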
| intern/cycles/util/types_float8.h | ||
|---|---|---|
| 15–18 | That's an interesting aspect of GPU compute. One thing I forgot to mention above is that float4 has a benefit on pre-AVX2 CPUs, which we still support. For the initial float8 patch, optimizing the operations for every platform is not essential. We can look into further optimizations after the initial work is done. | |
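
To illustrate the float4 point above, here is a minimal sketch of an 8-wide addition that uses one 256-bit operation where available and falls back to a pair of 128-bit (float4-sized) operations otherwise. The type `float8_sketch` and the function `add8` are hypothetical names, not part of the patch. The sketch keys off `__AVX__` rather than AVX2 because a 256-bit float addition only needs AVX; how Cycles actually selects its code paths is not shown in this thread.

```cpp
#include <cstddef>
#if defined(__AVX__) || defined(__SSE__)
#  include <immintrin.h>
#endif

// Hypothetical 8-float type; 32-byte alignment allows aligned 256-bit loads.
struct alignas(32) float8_sketch {
  float v[8];
};

// Component-wise addition with three possible code paths.
inline float8_sketch add8(const float8_sketch &x, const float8_sketch &y)
{
  float8_sketch r;
#if defined(__AVX__)
  // One 256-bit addition covers all eight lanes.
  _mm256_store_ps(r.v, _mm256_add_ps(_mm256_load_ps(x.v), _mm256_load_ps(y.v)));
#elif defined(__SSE__)
  // Without AVX: treat the value as a pair of float4 halves.
  // v + 4 is 16 bytes past a 32-byte boundary, so aligned loads stay valid.
  _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(x.v), _mm_load_ps(y.v)));
  _mm_store_ps(r.v + 4, _mm_add_ps(_mm_load_ps(x.v + 4), _mm_load_ps(y.v + 4)));
#else
  // Portable scalar fallback for non-x86 targets.
  for (std::size_t i = 0; i < 8; i++) {
    r.v[i] = x.v[i] + y.v[i];
  }
#endif
  return r;
}
```

All three paths compute the same result, so a scalar version can ship first and the vectorized paths can be added later as an optimization.
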
| intern/cycles/util/types_float8.h | ||
|---|---|---|
| 15–18 | Fair point, I forgot about pre-AVX2 CPUs, but indeed optimizing for that case is not essential. | |