Target speedup: 10-12%.
My Ubuntu installation is not in good shape: Ubuntu 14.10 with the shiny 3.16 kernel, CUDA 6.5, and GCC 4.9. Something strange is going on with Cycles there. For example, the BMW scene on the master branch now renders in 2:10 for me (previously it was 1:40, so roughly 30% slower). Unfortunately, the Intel profiler does not work with my current setup, so I am not trying to find the root cause of this slowdown. Still, this patch improves Cycles performance by up to 15% on various files.
CUDA performance has yet to be checked for this patch.