Multithreading for SPH solver
Closed, Archived | Public | PATCH

Description

This patch uses OpenMP to add multithreading to the SPH solver. Both collisions and SPH integration are multithreaded, so the speed-up can be considerable. The following are benchmarking results using the BasicSPH.blend test from #27442[1]. Each test was run twice: first with 20 constant subframes, then with adaptive subframes and a Courant target of 0.1. The two times on each line are listed in that order.
- Blender trunk r42721:
93s, 20s
- Patch applied to r42721, single-threaded (with OMP_NUM_THREADS=1):
94s, 19s
- Patch applied, with 4 threads:
26s, 5s
- Patch applied, with 8 threads:
21s, 4s

So in this case, a speed-up of about 4.4x was achieved on a machine with 4 cores with Hyperthreading (8 threads), or 3.5x without Hyperthreading (4 threads).
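
For anyone who wants to recompute these figures from their own timings, here is a minimal sketch. The helper names `speedup` and `efficiency` are made up for illustration and are not part of the patch; the only numbers used are the 20-subframe timings listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Speed-up of a parallel run relative to a baseline run (hypothetical helper). */
static double speedup(double baseline_s, double parallel_s)
{
    assert(baseline_s > 0.0 && parallel_s > 0.0);
    return baseline_s / parallel_s;
}

/* Parallel efficiency: speed-up divided by the number of threads used. */
static double efficiency(double baseline_s, double parallel_s, int threads)
{
    assert(threads > 0);
    return speedup(baseline_s, parallel_s) / threads;
}

/* Print the speed-ups for the 20-subframe runs reported above. */
static void print_report(void)
{
    const double trunk_s = 93.0; /* trunk r42721, 20 constant subframes */

    printf("4 threads: %.2fx (efficiency %.2f)\n",
           speedup(trunk_s, 26.0), efficiency(trunk_s, 26.0, 4));
    printf("8 threads: %.2fx (efficiency %.2f)\n",
           speedup(trunk_s, 21.0), efficiency(trunk_s, 21.0, 8));
}
```

The efficiency value is just the speed-up divided by the thread count, which makes it easier to compare the 4-thread and 8-thread runs.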

The patch builds on the work done in #29661[2]. Two versions of the patch are attached:
- SPH_omp_diff29661_1_svn.patch, which applies on top of #29661
- SPH_omp_r42721_1_svn.patch, which includes #29661 and applies against trunk. This is provided for convenience of testing only; it is expected that patch #29661 will be committed separately.

[1] http://projects.blender.org/tracker/?group_id=9&atid=127&func=detail&aid=27442
[2] http://projects.blender.org/tracker/index.php?func=detail&aid=29661&group_id=9&atid=127

Event Timeline

By making a contribution to this project, I certify that the contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file.

Signed-off-by: Alex Fraser <adfries@vpac.org> on behalf of VPAC Ltd.

Benchmark tests were run on an Intel Core i7 920 CPU (quad-core with Hyperthreading).

Tried this against r42761, using SPH_omp_r42721_1_svn.patch, i.e. the version that includes both patches.

It seems to work well and massively improves times, in line with what was reported. I ran the suggested test file but increased the particle count to 10,000, the emission end time to 100 and the simulation length to 250 frames. Machine: i7 2.8 GHz, 4 cores plus Hyperthreading, OS X 10.7.2, 64-bit build using GCC 4.6.1 and SCons.

I tested the unmodified build against the modified one and found that the longer simulation with more particles produced visible differences in the end result. Results with the modified code seemed to be consistent and repeatable across different numbers of threads. All tests were run with no subframes.

Would it matter so much if older simulations did not converge exactly as they did before?

I confirm it works on 64-bit Ubuntu with r42771.
The speed improvement is really noticeable on my quad core.

Alex, could you help Stephen make a multithreaded version of his surface polygonizer patch once this one is accepted?
http://pfs.planetblender.org/?p=35

Thanks for testing!

@Mike, I'm glad to hear it works well on OSX. I had heard that OpenMP doesn't work so well with some configurations under OSX. I have been testing it on GNU+Linux (Ubuntu), and Windows.

The different particle locations compared to an un-patched Blender are due to the bug fix patch, #29661. That patch changes which particle state is read from when calculating the next frame, which I think was in error previously. It also changes how many neighbours can affect a particle. Both of those bug fixes will change the shape of the fluid (it's more correct now). The old code was not thread safe.
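
To illustrate why reading from the previous state matters for threading, here is a minimal sketch. It is not the code from either patch: `Particle`, `step_particles` and the toy "pull toward neighbours" force are all invented for this example, and the real solver works on Blender's particle structures and SPH neighbour sums.

```c
#include <assert.h>

/* Hypothetical, simplified particle: 1D position only. */
typedef struct {
    float x;
} Particle;

/*
 * Advance all particles by one step. Each new position is computed only
 * from the read-only `prev` array and written to a separate `next` array,
 * so loop iterations are independent and can run on different threads.
 * The "force" is a toy pull toward the two array neighbours; it stands in
 * for the real SPH neighbour sums.
 */
static void step_particles(const Particle *prev, Particle *next, int n, float dt)
{
    int i;

    assert(prev != next); /* in-place updates would race between threads */

#pragma omp parallel for
    for (i = 0; i < n; i++) {
        float left  = prev[i > 0 ? i - 1 : i].x;
        float right = prev[i < n - 1 ? i + 1 : i].x;
        float pull  = 0.5f * (left + right) - prev[i].x;

        next[i].x = prev[i].x + dt * pull;
    }
}
```

With GCC this needs `-fopenmp` to actually run in parallel; without it the pragma is ignored and the loop runs serially with the same result. If the loop wrote back into `prev` instead, a thread could read a neighbour that another thread had already advanced, and the result would depend on thread scheduling.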

@Ronan, hmm, that looks cool. I'd like to help, but I'll have to see how much time I have for it.

A quick note on collisions: The collision tests are multithreaded, but there is a tree-building step that happens once per time step which is not. So this patch mostly gives speed improvements when simulating a large number of particles, and not so much for a very heavy mesh. If the BVH tree building functions are ever made thread-safe, this could be improved.
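
As a rough way to reason about that limit, Amdahl's law gives the best possible speed-up when part of each time step stays serial. This is a sketch only: `amdahl_speedup` is a made-up helper, and the serial fraction for a given scene would have to be measured, since none is reported here.

```c
#include <assert.h>

/*
 * Amdahl's law: upper bound on the speed-up when a fraction
 * `serial_fraction` of the single-threaded run time cannot be
 * parallelised (for example, a serial tree-building step).
 */
static double amdahl_speedup(double serial_fraction, int threads)
{
    assert(serial_fraction >= 0.0 && serial_fraction <= 1.0);
    assert(threads > 0);
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads);
}
```

For example, with a purely hypothetical scene where half of the single-threaded time goes into tree building, `amdahl_speedup(0.5, 4)` gives a bound of 1.6x no matter how well the collision tests themselves scale.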

Committed with Janne's approval, r43069.

Alex Fraser (z0r) changed the task status from Unknown Status to Unknown Status. Jan 2 2012, 1:12 PM