Page MenuHome

Cycles-X: Fix incorrect performance calculation at certain navigation resolutions
ClosedPublic

Authored by Alaska (Alaska) on Sep 1 2021, 9:51 AM.

Details

Summary

During viewport navigation the algorithm that calculates the resolution divider assumes that the at every resolution division the sample count will be 4. This is not true for cases where the resolution division is equal to 2 * pixel_size and pixel_size. In those cases the sample count is 1. As a result the algorithm assumes 2 * pixel_size and pixel_size are too expensive in a lot of cases and thus doesn't use it.

This patch addresses this issue by informing the algorithm of changes in sample counts between resolution divisions meaning it can more properly gauge the performance of various resolution divisions. As a result the algorithm is more likely to pick the correct resolution divider when performance of the scene is in the range that 2 * pixel_size and pixel_size are viable options all while keeping the performance target at 30fps like before.

Along with this patch the sample count while the resolution divider is equal to 2 * pixel_size has been increased from 1 sample per pixel to 2 samples per pixel. Without an increase to this value there are some negative side effects. I have chosen the value of 2 and not 4 because this values means at a resolution divider of 2 * pixel_size the rendering is still cheaper than a resolution divider of pixel_size.

Diff Detail

Repository
rB Blender
Branch
cycles-x-fix-incorrect-performance-calculation (branched from master)
Build Status
Buildable 16801
Build 16801: arc lint + arc unit

Event Timeline

Alaska (Alaska) requested review of this revision.Sep 1 2021, 9:51 AM
Alaska (Alaska) created this revision.
Alaska (Alaska) edited the summary of this revision. (Show Details)Sep 1 2021, 9:53 AM
Alaska (Alaska) edited the summary of this revision. (Show Details)
Alaska (Alaska) retitled this revision from Fix incorrect performance calculation at certain navigation resolutions to Cycles-X: Fix incorrect performance calculation at certain navigation resolutions.Sep 1 2021, 9:58 AM
  • Clean up code and fix issues with comments
  • Fix small punctuation error

Base patch off Cycles-X again

This looks ok. Ideally I think a using something that is non-iterative to determine the 'resolution_divider' would be better.

intern/cycles/integrator/render_scheduler.cpp
1042–1057

Can you explain how this works in the comments? I looked at this on my machine and the integer part of actual_time/desired_time seemed to match the answer produced but that maybe just for my setup.

Ideally I think a using something that is non-iterative to determine the 'resolution_divider' would be better.

Yeah, however I'm not 100% sure on the approach to take on this. Prior to taking into consideration the change in sample count a non-iterative approach would be relatively easy, but now taking into consideration change in sample count it might be harder. I will need to investigate it further tomorrow.


Can you explain how this works in the comments? I looked at this on my machine and the integer part of actual_time/desired_time seemed to match the answer produced but that maybe just for my setup.

while (actual_time > desired_time) {
  double pre_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
  resolution_divider = resolution_divider * 2;
  double post_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
  actual_time /= 4.0 * pre_resolution_division_samples / post_resolution_division_samples;
}

When you say ...integer part of actual_time/desired_time... I assume you mean something like this: actual_time = 0.24, desired_time = 0.1, actual_time/desired_time = 2.4 integer part = 2. I do not believe this approach works, but I will investigate it tomorrow just to check.

The way the system works is like this:

  1. The code enters the while loop because actual_time > desired_time
  2. pre_resolution_division_samples is calculated. What this does is it finds the number of samples that will be rendered at the current resolution divider at this point in the loop. while (actual_time > desired_time) {
  3. The resolution divider is advanced. This occurs because actual_time > desired_time and thus you can not achieve desired_time at the current resolution. resolution_divider = resolution_divider * 2;
  4. post_resolution_division_samples is calculated. Same as before, finds the number of samples that will be used for rendering at the resolution divider used at this point in the loop.
  5. actual_time is adjusted. The formula used contains two main components. 4.0 and the other part. actual_time is divided by 4.0 because the number of pixels has decreased by 4 due to the resolution_divider advancing in step 3. As such the computational load is 4 times lower and thus the actual_time of rendering should also be 4 times lower. The other part, pre_resolution_division_samples / post_resolution_division_samples calculates the computational difference between the two sample counts acquired from steps 2 and 4. This used to give a more accurate gauge of what actual_time will be based on the changing sample sample count between resolution divider steps. The two are then combined in that equation.

The reason why pre_resolution_division_samples and post_resolution_division_samples are needed at all is because the sample count changes as you change the resolution divider. If the resolution divider is 1, then the sample count is 1. If the resolution divider is 2, then the sample count is 1 (with this patch the sample count has been changed to 2). If the resolution divider is 4 or higher, then the sample count is 4.

The prior algorithm assumes that a consistent sample count (4 based on previous code) is used at all resolution dividers. So it assumed that at a resolution divider of 2, it was using 4 samples. As such in many cases it would assume that a resolution divider of 2 was too computationally expensive for a lot of different situations and thus wouldn't use it. But the reality is that at a resolution divider of 2, the sample count is 1 (or 2 depending on patch), and as such the computational expense is way lower than the algorithm is predicting, so logically it can be used. This new algorithm takes into consideration the changes in sample counts and the effects it has on computational requirements. As such the system more accurately knows when it can use resolution dividers like 1 and 2 that do not follow the sample counts of the other resolution divisions.

Hopefully this helps, just ask if you have any questions, sometimes I'm bad at explaining things.

William Leeson (leesonw) requested changes to this revision.Sep 1 2021, 2:43 PM

I think if you put your explanation into the code and remove those member functions back to how they were as they have not really changed. This is probably good to go. It seems to work fine on my machine. You are correct given the heuristic nature of the get_num_samples_during_navigation function it's probably not so simple to create a non-iterative version. Especially since the ratios between pre and post can change with each iteration step. I guess you might be able to break it into a few cases where the ratios are 4, 2, 1 given the combinations of the values from get_num_samples_during_navigation as (1, 1/4, 1/2, 2/4)*4 but it's probably not worth it.

intern/cycles/integrator/render_scheduler.cpp
1062–1085

These do not need to be made member functions and can be left as they were.

This revision now requires changes to proceed.Sep 1 2021, 2:43 PM
  • Add comment and clean up code
Alaska (Alaska) added a comment.EditedSep 2 2021, 2:47 AM

I had a few questions related to other potential optimizations that could be made in this area:

int resolution_divider = 1;
while (actual_time > desired_time) {
  int pre_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
  resolution_divider = resolution_divider * 2;
  int post_resolution_division_samples = get_num_samples_during_navigation(resolution_divider);
  actual_time /= 4.0 * pre_resolution_division_samples / post_resolution_division_samples;
}

Instead of int resolution_divider = 1 we could have:

int resolution_divider = pixel_size_;
actual_time /= pow(4.0, resolution_divider - 1);

As a result we skip some iterations within the loop if the pixel_size is greater than 1.

The only issue is I don't know if doing it this way is cheaper than iterating through the loop and I'm not sure how to measure the performance difference.

Maybe we could also add an if statement to skip the pow function when the pixel_size is 1 if that improves performance:

if (pixel_size_ != 1) {
  int resolution_divider = pixel_size_;
  actual_time /= pow(4.0, resolution_divider - 1);
}
else {
  int resolution_divider = 1;
}

Using actual_time /= pow(4.0, resolution_divider - 1); is safe here because the sample count doesn't change until the resolution_divider is greater than pixel_size and so the assumption that a resolution division is 4 times cheaper is a safe assumption at this point.


Edit: This optimization has been implemented.

Another optimization I was thinking of was changing while (actual_time > desired_time) { to something like
while (actual_time > desired_time && resolution_divider < default_start_resolution_divider_) {
This skips extra iterations in the loop when it is not required on slow devices/complex scenes. These extra iterations are not required because default_start_resolution_divider_ is the maximum value the resolution divider can go to based on this (see below) in render_scheduler.cpp at around line 940:

/* Don't let resolution to go below the desired one: better be slower than provide a fully
 * unreadable viewport render. */
start_resolution_divider_ = min(new_resolution_divider, default_start_resolution_divider_);

If we do make this change we can also change start_resolution_divider_ = min(new_resolution_divider, default_start_resolution_divider_); to something that doesn't include the min() function to simplify the code a little (there are a few other small changes that can be made if this change is used)

Alaska (Alaska) updated this revision to Diff 41358.EditedSep 2 2021, 12:30 PM
  • Optimize "while loop" to stop if extra iterations will not result in any benefit
Alaska (Alaska) marked an inline comment as done.Sep 2 2021, 12:32 PM
William Leeson (leesonw) requested changes to this revision.EditedSep 2 2021, 3:52 PM

I tried this out an it seems to work pretty well and goes to the full pixel resolution when expected but also performs well using just one core :-) You should run make format as there are a few format changes needed. My only issue is if that while loop could ever get stuck but I don't think so and I can't seem to do anything to cause it to happen. So once the formatting is done this is good to go I think.

This revision now requires changes to proceed.Sep 2 2021, 3:52 PM
Alaska (Alaska) marked an inline comment as done.Sep 2 2021, 7:29 PM
Alaska (Alaska) requested review of this revision.EditedSep 2 2021, 7:33 PM

You should run make format as there are a few format changes needed.

Running make format does not result in any format changes on the code I have adjusted so I have not updated the diff to reflect that.

My only issue is if that while loop could ever get stuck but I don't think so and I can't seem to do anything to cause it to happen.

With the resolution_divider < default_start_resolution_divider_ part added to the "while loop" condition, if it does ever get "stuck", it will stop once the resolution divider reaches default_start_resolution_divider_ rather than infinitely iterating.

  • Adjust and clarify comments
  • Fix small grammatical error in comment
intern/cycles/integrator/render_scheduler.cpp
762–763

Should be resolution_divider

intern/cycles/integrator/render_scheduler.h
413

Should this be 'brings'

William Leeson (leesonw) requested changes to this revision.Sep 3 2021, 12:57 PM

I noticed a change in the update of actual_time_per_update that you may not have intended.

intern/cycles/integrator/render_scheduler.cpp
942

Should you have removed the *num_samples_during_navigation or is path_trace_per_sample badly named?

This revision now requires changes to proceed.Sep 3 2021, 12:57 PM
Alaska (Alaska) marked an inline comment as done.Sep 3 2021, 1:03 PM

I noticed a change in the update of actual_time_per_update that you may not have intended.

This is intended. num_samples_during_navigation has been removed from this step as it now calculated within the "while loop". Using num_samples_during_navigation from outside the "while loop" resulted in the issue that this patch attempts to resolve.

  • Fix small issues in comments
Alaska (Alaska) marked 2 inline comments as done.Sep 3 2021, 1:11 PM

Ok this is good to go then

This revision is now accepted and ready to land.Sep 3 2021, 1:27 PM