The current implementation uses a single core. Throughout development, the design was kept compatible with threaded processing. It should also be possible to render the next sample while the previous sample is being integrated into the accumulation buffer.
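The overlap between rendering and integration could look roughly like the sketch below. It is an illustration only, not the actual implementation: `render_sample`, `integrate` and `accumulate_pipelined` are hypothetical names, the "render" step is a placeholder that fills a flat list of floats, and a single worker thread stands in for whatever threading system the real code would use.

```python
from concurrent.futures import ThreadPoolExecutor


def render_sample(index, size):
    # Hypothetical stand-in for rendering one sample:
    # every pixel of sample `index` gets the value index + 1.
    return [float(index + 1)] * size


def integrate(accum, sample):
    # Add one rendered sample into the accumulation buffer, in place.
    for i, value in enumerate(sample):
        accum[i] += value


def accumulate_pipelined(num_samples, size):
    # Render sample N+1 on a worker thread while sample N is being
    # integrated on the calling thread.
    accum = [0.0] * size
    if num_samples <= 0:
        return accum
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(render_sample, 0, size)
        for index in range(num_samples):
            sample = pending.result()  # wait for the sample in flight
            if index + 1 < num_samples:
                # Start the next render before integrating this one.
                pending = pool.submit(render_sample, index + 1, size)
            integrate(accum, sample)
    return accum


if __name__ == "__main__":
    print(accumulate_pipelined(3, 4))
```

Only the calling thread ever writes to the accumulation buffer, so no locking is needed in this sketch. In CPython the global interpreter lock limits how much two pure-Python steps actually overlap; the structure pays off when the render step runs in native code or on the GPU.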
Another solution would be to do the sorting on the GPU using compute shaders; see D10913: GPU: Compute Pipeline.