Page MenuHome

Cycles: Moved the background map averaging into the kernel, saving some memory
Needs ReviewPublic

Authored by Lukas Stockner (lukasstockner97) on Jun 25 2015, 4:49 PM.

Details

Summary

This commit just moves the luminance calculation in the background CDF calculation
inside the kernel, which allows to use floats instead of float3s for the buffer.

Diff Detail

Repository
rB Blender
Branch
arcpatch-D1377

Event Timeline

Lukas Stockner (lukasstockner97) retitled this revision from to Cycles: Moved the background map averaging into the kernel, saving some memory.

Overlooked a float4 pointer, didn't matter for functionality, but it's still wrong.

Sergey Sharybin (sergey) added inline comments.
intern/cycles/kernel/kernel_bake.h
442

What is the reason of changing argument type from float4 to float and then typecast here?

444

float3_to_float4() would work i guess?

intern/cycles/kernel/kernel_bake.h
442

The background code below uses only float, not float4, so one of the two must have a cast. IMO passing the pointer as float* and then casting to float4* here, like it's done for the buffer for the path tracing kernels, is cleaner than passing float4* and casting to float* below.

444

Yes, I'll change it.

Using float3_to_float4 in kernel_bake.h, rebase to current master.

Sergey Sharybin (sergey) requested changes to this revision.Jul 23 2015, 11:29 AM

Still not convinced such a casting is a good thing to have, it's more an error prone approach. Also it's a bit arbitrary to write float4 for displacement and luma for background for just rather a niche application.

Not sure if such optimization gives any improvement in terms of actual memory peak, but still think it's better to have more explicit control on what shader evaluation is expected to return. Couple of ideas:

  • Have kernel_shader_evaluate(), kernel_shader_evaluate_luma()
  • Have an argument defining type of an output, like SHADER_EVAL_RGBA, SHADER_EVAL_LUMA
This revision now requires changes to proceed.Jul 23 2015, 11:29 AM
Lukas Stockner (lukasstockner97) edited edge metadata.

Okay, to revive this diff, I've redone the patch on top of the current master.
Actually, keeping the pointer as float4 and only casting it for the background case does indeed seem to be the cleaner solution, this way the place where the code might be confusing can be clearly commented.
Functionality-wise, nothing changes, just like in the old diff.

Still feel bad about such pointer magic. Anything against such plan:

  • Apply D1702
  • Revert changes outside of light.cpp from this patch
  • Apply
    1diff --git a/intern/cycles/render/light.cpp b/intern/cycles/render/light.cpp
    2index 590fcf8..af54d3e 100644
    3--- a/intern/cycles/render/light.cpp
    4+++ b/intern/cycles/render/light.cpp
    5@@ -37,7 +37,7 @@ static void shade_background_pixels(Device *device, DeviceScene *dscene, int res
    6 int height = res;
    7
    8 device_vector<uint4> d_input;
    9- device_vector<float4> d_output;
    10+ device_vector<float> d_output_luma;
    11
    12 uint4 *d_input_data = d_input.resize(width*height);
    13
    14@@ -52,18 +52,18 @@ static void shade_background_pixels(Device *device, DeviceScene *dscene, int res
    15 }
    16
    17 /* compute on device */
    18- d_output.resize((width*height + 3)/4);
    19- memset((void*)d_output.data_pointer, 0, d_output.memory_size());
    20+ d_output_luma.resize(width*height);
    21+ memset((void*)d_output_luma.data_pointer, 0, d_output_luma.memory_size());
    22
    23 device->const_copy_to("__data", &dscene->data, sizeof(dscene->data));
    24
    25 device->mem_alloc(d_input, MEM_READ_ONLY);
    26 device->mem_copy_to(d_input);
    27- device->mem_alloc(d_output, MEM_WRITE_ONLY);
    28+ device->mem_alloc(d_output_luma, MEM_WRITE_ONLY);
    29
    30 DeviceTask main_task(DeviceTask::SHADER);
    31 main_task.shader_input = d_input.device_pointer;
    32- main_task.shader_output = d_output.device_pointer;
    33+ main_task.shader_output_luma = d_output_luma.device_pointer;
    34 main_task.shader_eval_type = SHADER_EVAL_BACKGROUND;
    35 main_task.shader_x = 0;
    36 main_task.shader_w = width*height;
    37@@ -77,15 +77,15 @@ static void shade_background_pixels(Device *device, DeviceScene *dscene, int res
    38 foreach(DeviceTask& task, split_tasks) {
    39 device->task_add(task);
    40 device->task_wait();
    41- device->mem_copy_from(d_output, task.shader_x, 1, task.shader_w, sizeof(float4));
    42+ device->mem_copy_from(d_output_luma, task.shader_x, 1, task.shader_w, sizeof(float));
    43 }
    44
    45 device->mem_free(d_input);
    46- device->mem_free(d_output);
    47+ device->mem_free(d_output_luma);
    48
    49 d_input.clear();
    50
    51- float *d_output_data = reinterpret_cast<float*>(d_output.data_pointer);
    52+ float *d_output_data = reinterpret_cast<float*>(d_output_luma.data_pointer);
    53
    54 pixels.resize(width*height);
    55

IMO this will keep things really clear, avoiding possible issues following the code and perhaps some compiler optimization crazyness.

Lukas Stockner (lukasstockner97) edited edge metadata.

Rebased to curent master, now the patch uses the changes of D1702 to pass the average value to the host side.

Sorry, I somehow overlooked P306 :/