Page MenuHome

[Cycles/msvc] Get cycles_kernel compile time under control.
ClosedPublic

Authored by Ray Molenkamp (LazyDodo) on Apr 3 2017, 12:33 AM.

Details

Summary

Ever since we merged the extra texture types (half etc) and spit kernel the compile time for cycles_kernel has been going out of control.

It's currently sitting at a cool 1295.762 seconds with our standard compiler (2013/x64/release)

I'm not entirely sure why msvc gets upset with it, but the inlining of matrix near the bottom of the tri-cubic 3d interpolator is the source of the issue, this patch excludes it from being inlined.

This patch bring it back down to a manageable 186 seconds. (7x faster!!)

with the attached bzzt.blend that @Sergey Sharybin (sergey) kindly provided i got the following results with builds with identical hashes

58:51.73 buildbot
58:04.23 Patched

it's really close, the slight speedup could be explained by the switch instead of having multiple if's (switches do generate more optimal code than a chain of if/else/if/else statements) but in all honesty it might just have been pure luck (dev box,very polluted, bad for benchmarks) regardless, this patch doesn't seem to slow down anything with my limited testing.

Diff Detail

Event Timeline

intern/cycles/kernel/kernel_compat_cpu.h
328

Indentation.

363

We don't use space after keyword in Cycles.

Same applies to the cases below.

364

Indentation.

421

Indentation.

498

Seems to be remained from some experiments.

intern/cycles/kernel/kernels/cpu/kernel_cpu_image.h
2 ↗(On Diff #8527)

Unrelated change which ruins indentation.

24 ↗(On Diff #8527)

You can not use __declspec, it will obviously break on non-msvc compilers. If simple ccl_device_noinline is not enough we should then add ccl_device_forcenoinline.

27 ↗(On Diff #8527)

Code style and indetnation.

46 ↗(On Diff #8527)

We don't usually use camel case and avoid first capital for the variable name.

47 ↗(On Diff #8527)

Same two points as above.

65 ↗(On Diff #8527)

Same as above.

Thomas Dinges (dingto) requested changes to this revision.Apr 3 2017, 1:58 PM

I don't see why you changed the if/else to switch statements, you said that the matrix inlining is responsible for the slow compilation?

I'd rather not have different code for CPU / GPU and keep the definitions in util_texture.h equal for all architectures. Plus, during my work on the Texture system I found it useful to being able to quickly change the amount of Half / Float / Byte Textures to something different than 1024, which then would break your new code.

This revision now requires changes to proceed.Apr 3 2017, 1:58 PM

I don't see why you changed the if/else to switch statements, you said that the matrix inlining is responsible for the slow compilation?

I'd rather not have different code for CPU / GPU and keep the definitions in util_texture.h equal for all architectures. Plus, during my work on the Texture system I found it useful to being able to quickly change the amount of Half / Float / Byte Textures to something different than 1024, which then would break your new code.

I was trying to counter balance the not in-lining of tri-cubic interpolator with a slightly more efficient code-path to get there , (if's generate a chain of cmp/jz's while switches either use a chain of dec/jz or a jump table, , on top of that a switch is nicer to read) but i'm not overly committed to it, the perf change seems to be immeasurable , if you don't like it i'll happily take it out.

Ray Molenkamp (LazyDodo) edited edge metadata.

resolved feedback.

Ray Molenkamp (LazyDodo) marked 3 inline comments as done.

missed one.

Ray Molenkamp (LazyDodo) marked an inline comment as done.Apr 3 2017, 3:44 PM
Ray Molenkamp (LazyDodo) added inline comments.
intern/cycles/kernel/kernel_compat_cpu.h
410

given the problem is very msvc specific, what are the thoughts of an ccl_msvc_never_inline ? i mean no need to punish gcc for msvc's weirdness?

Thank you for changing it, I prefer code readability and the ability to quickly debug stuff there over some very small speedup. :)

This revision is now accepted and ready to land.Apr 3 2017, 3:46 PM

Great find, looks good to me.

I wouldn't worry about GCC, these function are sufficiently complicated that they can be uninlined anywhere.

Please kill your IDE's auto-format.

intern/cycles/kernel/kernel_compat_cpu.h
410

Don't do that. Otherwise you'll end up with nasty things like ccl_inline_for_cuda_unline_for_gcc_forceinline_for_msvc.

This is a big enough function which doesn't make much sense to inline in any of compilers.

440

Indentation (x was aligned to the same column in original code).

469

Indentation and space around {}. It used to be

const int yc[4] = {width * piy,
                   width * iy,
                   width * niy,
                   width * nniy};
479

Indentation:

/* Note thew asterisk are at the same column.
 * (they became at the beginning of the line).
 */

redid the code changes with notepad , to make sure vs didn't ruin the codestyle.

This revision was automatically updated to reflect the committed changes.
Ray Molenkamp (LazyDodo) marked an inline comment as not done.