While checking the particle system code, I found a hardcoded value that limited the maximum number of particles handled per task to 256.
Suspecting that this was problematic and probably an old hard limit, I decided to explore it, with great success.
In Blender master as it is right now, creating 4 million particles on an i7-7700HQ took 31 seconds; with the initial modifications in this patch it took 4 seconds.
@Oscar 'Nebe' Abad (OscarNebeAbad) helped me with the tests and @SavMartin and Zebus3D helped me with the first concepts of this patch.
I implemented some logic to balance the number of particles per task so that different CPUs can handle them correctly. So far it is working great on the different systems we used to test it:
i7-950 with 48 GB of RAM
i7-7700HQ with 24 GB of RAM
Threadripper 2990WX with 64 GB of RAM
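
For illustration only, here is a minimal sketch of one way the per-task particle count could be derived from the thread count instead of being fixed at 256. This is not the code from the diff: the function names `particles_per_task` and `task_count`, the factor of 4 tasks per thread, and the use of 256 as a lower bound are all assumptions made for this example.

```c
/* Hypothetical helper, NOT the code in this patch: choose how many particles
 * each task handles so that work scales with the available threads instead
 * of being capped at a fixed 256. */
static int particles_per_task(int totpart, int num_threads)
{
  const int min_per_task = 256;   /* Assumption: old hardcoded value kept as a floor. */
  const int tasks_per_thread = 4; /* Assumption: a few tasks per thread for balancing. */

  if (totpart <= 0 || num_threads <= 0) {
    return min_per_task;
  }

  const int num_tasks = num_threads * tasks_per_thread;
  /* Ceiling division so that every particle is assigned to some task. */
  const int per_task = (totpart + num_tasks - 1) / num_tasks;

  return (per_task < min_per_task) ? min_per_task : per_task;
}

/* Hypothetical helper: number of tasks needed for a given chunk size. */
static int task_count(int totpart, int per_task)
{
  return (totpart + per_task - 1) / per_task;
}
```

Under these assumptions, 4 million particles on an 8-thread CPU such as the i7-7700HQ would be split into 32 tasks of 125000 particles each, while a 64-thread Threadripper 2990WX would get 256 tasks of 15625 particles each. With a fixed 256 particles per task, the same scene needs 15625 tasks.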
There is a notable speed increase. At first I thought it would only affect particle creation time, but it seems to affect simulation baking time as well. I have not explored that part of the code yet, because I wanted to keep this patch simple, and it already improves particle creation a lot.
So far it is pretty stable in our tests, but any further testing would be very welcome.
It's my first diff so allow me to express my happiness with this, even if in the end this is not included in master.
I hope this can be included even though particles are going to be rewritten, since for the time being it is a huge boost in particle performance. @Sebastián Barschkis (sebbas) tested this concept and also noticed a speed improvement with mantaflow.
@Martin Felke (scorpion81) will also test this, but our first impression is that, as it is right now, this is a huge improvement over the hardcoded value we had in the past.