This patch splits the OpenCL mega-kernel into separate kernels to improve GPU utilization and, as a result, performance.
The optimizations included in this patch are described at https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit?usp=sharing
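
For readers unfamiliar with the general idea, here is a toy cost model of why splitting a mega-kernel can raise utilization. It is not code from this patch, and it assumes that path divergence is the main source of waste, which the patch may or may not target. The function names, the SIMD width, and the per-path bounce counts are all hypothetical. In the mega-kernel model, every SIMD group stays occupied until its longest path finishes; in the split model, one launch is issued per bounce over only the paths that are still active.

```python
def megakernel_lane_steps(bounces, simd_width):
    """Toy model: each SIMD group runs until its longest path finishes."""
    total = 0
    for start in range(0, len(bounces), simd_width):
        group = bounces[start:start + simd_width]
        # Lanes whose path ended early still occupy the hardware.
        total += max(group) * simd_width
    return total


def split_kernel_lane_steps(bounces, simd_width):
    """Toy model: one launch per bounce, over compacted active paths only."""
    total = 0
    active = [b for b in bounces if b > 0]
    while active:
        groups = -(-len(active) // simd_width)  # ceiling division
        total += groups * simd_width
        # Advance every active path by one bounce and drop finished ones.
        active = [b - 1 for b in active if b - 1 > 0]
    return total


if __name__ == "__main__":
    # Hypothetical workload: each group of 4 has one long path and 3 short ones.
    bounces = [8, 1, 1, 1] * 4
    useful = sum(bounces)
    mega = megakernel_lane_steps(bounces, simd_width=4)
    split = split_kernel_lane_steps(bounces, simd_width=4)
    print(f"useful lane-steps: {useful}")
    print(f"mega-kernel: {mega} lane-steps, utilization {useful / mega:.0%}")
    print(f"split kernel: {split} lane-steps, utilization {useful / split:.0%}")
```

The model ignores kernel launch overhead, memory traffic for the per-path state, and register pressure, all of which affect real results, so the numbers it prints only illustrate the trend.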