Page MenuHome

Cycles: Enable OpenCL 2.0 when available
ClosedPublic

Authored by Jeroen Bakker (jbakker) on Nov 14 2019, 8:53 AM.
Tokens
"Love" token, awarded by Andrea_Monzini."Burninate" token, awarded by charlie."Love" token, awarded by ponomarovmax."Like" token, awarded by Loner."Like" token, awarded by threedslider."Love" token, awarded by ucupumar."Love" token, awarded by amonpaike.

Details

Summary

The OpenCL 2.0 compiler of AMD improves the render time, even without using
an OpenCL 2.0 feature. Additional performance can be added later when
using specific OpenCL 2.0 features.

This patch adds a setting in Preferences -> System -> Cycles Render Devices -> OpenCL where OpenCL 2.0 compilation can be enabled.
By default we don't enable the OpenCL 2.0 compilation as it requires more RAM to finish compilation (+16GB) due to the multi process compilation.

Performance impact in seconds Linux AMD Vega 64

**Master**

bmw27: 89
barbershop_interior: 673
classroom: 173
fishy_cat: 196
koro: 207
pavillon_barcelona: 414

**Patch Applied**

bmw27: 84
barbershop_interior: 497
classroom: 153
fishy_cat: 190
koro: 202
pavillon_barcelona: 417

Diff Detail

Repository
rB Blender
Branch
T71479 (branched from master)
Build Status
Buildable 6427
Build 6427: arc lint + arc unit

Event Timeline

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 8:53 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 9:59 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Added comment

intern/cycles/device/opencl/opencl_split.cpp
1879

we don't support opencl 1.1 so perhaps just remove this

Reverted disabling out of process compilation

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 14 2019, 1:02 PM
Jeroen Bakker (jbakker) marked an inline comment as not done.

Can I ask if you're using ROCm to test this?

I'm thinking particularly of this article: https://www.phoronix.com/scan.php?page=news_item&px=AMD-OpenCL-2.0-ROCm-2.0-Work

I've been testing Blender with a Radeon VII on Linux and I've had disappointing performance with amdgpu-pro. (And few distros supported as well)

No I don’t use ROCm in my default test set as it is a wrapper and added more complexity to the landscape. There is a small user base using it. I don’t have a Radeon 7 to test myself.

It would be helpful to test this patch and send some comparison data before and after. So we can keep track of the current state

It would be helpful to test this patch and send some comparison data before and after. So we can keep track of the current state

I'm happy to do that! Although the current state I had when I tested it about a week ago was waiting and around an hour for Cycles kernels to compile for the default cube and giving up, so I'm not sure that's the best data. I may not have set it up properly, or perhaps Ubuntu 19.10 made it unhappy.

In any case I'm happy to create whatever environment you'd find useful and test it there before and after this patch. The machine I have access to has 32GB of RAM and a R7 2700x.

@Hans Goudey (HooglyBoogly) Please upload your system-info.txt. By default the OpenCL drivers installed by ROCm are not mature enough for running cycles at all... They are optimized for micro kernels, cycles isn't.

Here's that. Yeah I guess the whole situation isn't ideal, but I'm curious to see how it works out.

Here's where it seems to get stuck compiling the kernels (on master):

Cycles: compiling OpenCL program split_subsurface_scatter...
Cycles: compiling OpenCL program split_do_volume...
Cycles: compiling OpenCL program split_shadow_blocked_dl...
Cycles: compiling OpenCL program split_shadow_blocked_ao...
Cycles: compiling OpenCL program split_holdout_emission_blurring_pathtermination_ao...
Cycles: compiling OpenCL program split_lamp_emission...
Cycles: compiling OpenCL program split_direct_lighting...
Cycles: compiling OpenCL program split_indirect_background...
Cycles: compiling OpenCL program split_shader_eval...
Cycles: compiling OpenCL program split_bundle...
Cycles: compiling OpenCL program base...
Cycles: compiling OpenCL program background...
Kernel compilation of base finished in 7.31s.
Kernel compilation of split_do_volume finished in 7.76s.
Kernel compilation of split_indirect_background finished in 8.19s.
Kernel compilation of split_holdout_emission_blurring_pathtermination_ao finished in 8.35s.

I'm happy to run more tests and I'll try with this patch on Monday.

@Hans Goudey (HooglyBoogly)
Sorry, you Driver isn't mature enough. It uses the X11 implementation and that implementation is just not able to run cycles with acceptable performance (compile and execution wise). I wished companies would invest in OpenSource drivers for these kind of technologies.

@Jeroen Bakker (jbakker) Interesting. I was under the impression that AMD was working ROCm as an open source answer to CUDA and a replacement to their proprietary OpenCL driver stack. I also don't have a great understanding of this landscape and I haven't found much material out there about it.

You mentioned there was a small user base using ROCm. Do you have any information about that or about your test set up?

If you're still interested in tests with the Radeon VII, like I said I'm happy to install whichever driver you recommend on whatever distribution to run some tests.

Brecht Van Lommel (brecht) added inline comments.
intern/cycles/device/opencl/opencl_split.cpp
1871

Why use this NVIDIA specific code for all platforms?

1874

As far as I know noinline is not an OpenCL 2.0 feature, but rather something up to the compiler that can work with any OpenCL versions.

Is it OpenCL 2.0 or __CL_NOINLINE__ that makes the biggest difference for performance? And memory usage?

I'd expect noinline to use less memory if it works.

intern/cycles/device/opencl/opencl_split.cpp
1874

You're right, noinline isn't from the OpenCL 2.0 specs. I enabled it only for OpenCL2.0 as currently it was disabled for all OpenCL platforms, and haven't been able to find the reason. I detected that the heavier scenes needed both option to be on par with 2.80. I will split this patch into two separates so we can keep better track to the changes.

I did detect memory changes when enabling OpenCL2.0 the compiler needed much more memory (32GB when doing out of process compilation). OpenCL2.0 has more memory and work sync features we could utilize. We still need to do a lot of benchmark to find out what the actual benefits are.

Jeroen Bakker (jbakker) marked 2 inline comments as done.
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Cleaned up the patch

  • removed the inlining
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 27 2019, 8:49 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Nov 27 2019, 8:51 AM
Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Only enables the OpenCL 2.0 context when set in the debug flags.

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Dec 12 2019, 8:53 AM
Jeroen Bakker (jbakker) planned changes to this revision.Dec 12 2019, 8:55 AM
Jeroen Bakker (jbakker) added inline comments.
intern/cycles/blender/addon/properties.py
787

This is a scene setting. Move this option to preferences.

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)Dec 12 2019, 9:29 AM
Jeroen Bakker (jbakker) requested review of this revision.Jan 8 2020, 1:35 PM
Jeroen Bakker (jbakker) added inline comments.
intern/cycles/blender/addon/properties.py
787

Cycles hasn't got a good location for these kind of preferences. Eg

  • Add it as debug option: Hard to find for users
  • Add a new DeviceType: A lot of work, not really reusable.
  • Put it in param: params is more scene related, not platform related
  • Just enable it by default: 16 GB of memory is required to compile using OpenCL2 (this needs a more mature parallel compilation.)
intern/cycles/blender/addon/properties.py
787

It can be a BooleanProperty in class CyclesPreferences, that is only visible to the user when the active device type is OpenCL.

Also, what do you mean more mature parallel compilation? Like limiting the number of threads used for compilation based on available memory?

intern/cycles/blender/addon/properties.py
787

I tried that. That was not the hard part. How to get that preference to the device itself (DeviceInfo => Device) can be an option but wasn't clean enough for me, hence this comment. The cyclespref is currently only available when creating the deviceinfos. The setting can be stored in all device infos and then in all opencldevices. where it can be read.

Another way is to put it in the debug flags during the creation of the device infos. Gives cleaner flow, but could be seen as misusing the debug flags.

and yes on my system I cannot compile the kernels under normal curcomstances (have to close all applications and start blender). I would to suggest to add a mem intensive task scheduler and put the compilation tasks there.

Jeroen Bakker (jbakker) edited the summary of this revision. (Show Details)

Moved setting to preferences
Use opencl device version.

intern/cycles/device/opencl/opencl_split.cpp
1875

revert these changes as they aren't needed for this change

Cleaned up the patch:

  • reverted unneeded change
  • spelling in comments

Missed an empty line reverting unneeded change

Micro optimization: Do not retrieve device opencl version when we don't want to compile for opencl 2.0

I can't reproduce a memory usage increase on either Windows or Linux. For me memory usage is about the same with and without OpenCL 2.0.

I can see that enabling all features, some kernels take around 3 GB of memory to compile. And since the number of high memory kernel compilations that run in parallel are somewhat random, it may work one time but fail in other cases. That's something we should try to solve independent of this patch.

I'm just going to commit this without any preference, and then address the memory usage problem later.

This revision is now accepted and ready to land.Mar 24 2020, 7:53 PM