This patch contains a parallel implementation of the Sparse 3D Transform-Domain Collaborative Filter (BM3D). The parallelization is based on OpenMP and OpenMP SIMD. This implementation is more efficient than the Dabov or Lebrun implementations. Memory used: 2560x1440: Lebrun 13.5 GB, ours 330 MB; 7680x4320: Lebrun 101 GB, ours 2.1 GB.
The unofficial build and the description can be found here: https://code.it4i.cz/blender/builds
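
For readers unfamiliar with how OpenMP and OpenMP SIMD are typically combined, here is a minimal sketch of the general pattern applied to a block-distance computation of the kind BM3D uses when grouping similar patches. It is not taken from the patch: the function names `patch_distance` and `block_distances`, the row-major single-channel float layout, and the exhaustive search over the whole image are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the patch): squared L2 distance between
 * two block x block patches stored in a row-major image with the given stride. */
static float patch_distance(const float *a, const float *b, size_t stride, int block)
{
    float sum = 0.0f;
    for (int y = 0; y < block; y++) {
        /* OpenMP SIMD: ask the compiler to vectorize the inner loop. */
        #pragma omp simd reduction(+:sum)
        for (int x = 0; x < block; x++) {
            float d = a[y * stride + x] - b[y * stride + x];
            sum += d * d;
        }
    }
    return sum;
}

/* Hypothetical driver (not from the patch): distance from the reference patch
 * at (ref_x, ref_y) to every candidate patch position in the image.
 * dist must hold (width - block + 1) * (height - block + 1) floats. */
void block_distances(const float *img, int width, int height, int block,
                     int ref_x, int ref_y, float *dist)
{
    assert(block > 0 && block <= width && block <= height);
    const int nx = width - block + 1;
    const int ny = height - block + 1;
    const float *ref = img + (size_t)ref_y * width + ref_x;

    /* OpenMP threads: rows of candidate positions are independent,
     * so each thread writes a disjoint part of dist. */
    #pragma omp parallel for schedule(static)
    for (int cy = 0; cy < ny; cy++) {
        for (int cx = 0; cx < nx; cx++) {
            const float *cand = img + (size_t)cy * width + cx;
            dist[(size_t)cy * nx + cx] =
                patch_distance(ref, cand, (size_t)width, block);
        }
    }
}
```

Built with an OpenMP-enabled compiler (for example `-fopenmp` with GCC or Clang), the outer loop is split across threads and the inner loop is vectorized; without OpenMP support the pragmas are ignored and the code runs serially with the same result.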