This patch contains a parallel implementation of the Sparse 3D Transform-Domain Collaborative Filtering algorithm (BM3D). The parallelization is based on OpenMP and OpenMP SIMD. This implementation is more efficient than the Dabov and Lebrun implementations. Memory used at 2560x1440: Lebrun 13.5 GB, ours 330 MB; at 7680x4320: Lebrun 101 GB, ours 2.1 GB.
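To illustrate how OpenMP and OpenMP SIMD are typically combined in this kind of workload, here is a minimal sketch. It is not code from the patch. It assumes patches are stored as flat `float` arrays and uses a sum of squared differences as the block distance. The names `patchDistance` and `allDistances` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the patch): sum of squared differences
// between two patches stored as flat arrays of length n.
float patchDistance(const float* a, const float* b, std::size_t n)
{
    float sum = 0.0f;
    // Ask the compiler to vectorize the loop; 'sum' is a reduction variable.
    #pragma omp simd reduction(+:sum)
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical driver: distance from one reference patch to each candidate.
// 'candidates' holds the candidate patches back to back, each ref.size() long.
std::vector<float> allDistances(const std::vector<float>& ref,
                                const std::vector<float>& candidates)
{
    assert(!ref.empty() && candidates.size() % ref.size() == 0);
    const std::size_t n = ref.size();
    const long count = static_cast<long>(candidates.size() / n);
    std::vector<float> out(static_cast<std::size_t>(count));

    // Each iteration writes only out[k], so iterations are independent
    // and can be split across threads.
    #pragma omp parallel for schedule(static)
    for (long k = 0; k < count; ++k) {
        out[k] = patchDistance(ref.data(),
                               candidates.data() + static_cast<std::size_t>(k) * n,
                               n);
    }
    return out;
}
```

With GCC or Clang, OpenMP is enabled with `-fopenmp`. Without that flag the pragmas are ignored and the same code runs serially, which makes it easy to compare results between the two builds.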