Improve proxy building performance
ClosedPublic

Authored by Richard Antalik (ISS) on Mar 16 2021, 12:50 AM.

Details

Summary

This patch makes minimal changes to the current code:

  • Use the h264 codec for output
  • Set the number of encoding threads to the system thread count
  • Set the same number of threads for decoding. This may work only with some codecs (only h264 tested so far), but the performance gain in encoding still improves overall performance by a big margin. I have tested a variety of codecs, and all were transcoded properly.

This is a much simpler and more straightforward patch than the previous two. It was in fact the first thing I tried, but there was no improvement until I removed the following lines:

rv->c->thread_count = BLI_system_thread_count();
rv->c->thread_type = FF_THREAD_SLICE;

I am not even sure how I found that these two lines were problematic.


Performance vs filesize comparison. The source file was h264, 1920 x 1080, 2002 frames, 51M filesize.

Old MJPEG implementation

size    transcoding time    filesize
25      13.7s               44M
50      17.2s               116M
75      20.1s               197M
100     21.4s               297M

Results for this patch

size    transcoding time    filesize
25      2.0s                30M
50      3.7s                74M
75      6.2s                127M
100     8.0s                192M
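
To make the comparison between the two tables above easier to read, here is a small sketch that derives the speedup and the relative file size per proxy size. The figures are copied from the tables; the struct and function names (`ProxyResult`, `speedup`, `size_ratio`, `print_comparison`) are made up for this illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the benchmark tables: proxy size, transcoding time, file size. */
typedef struct {
  int proxy_size;
  double seconds;
  double megabytes;
} ProxyResult;

/* Figures copied from the "Old MJPEG implementation" table. */
const ProxyResult mjpeg_results[] = {
    {25, 13.7, 44.0}, {50, 17.2, 116.0}, {75, 20.1, 197.0}, {100, 21.4, 297.0}};

/* Figures copied from the "Results for this patch" table. */
const ProxyResult h264_results[] = {
    {25, 2.0, 30.0}, {50, 3.7, 74.0}, {75, 6.2, 127.0}, {100, 8.0, 192.0}};

/* How many times faster the new timing is compared to the old one. */
double speedup(double old_seconds, double new_seconds)
{
  assert(new_seconds > 0.0);
  return old_seconds / new_seconds;
}

/* New file size as a fraction of the old file size. */
double size_ratio(double old_megabytes, double new_megabytes)
{
  assert(old_megabytes > 0.0);
  return new_megabytes / old_megabytes;
}

/* Print one summary line per proxy size. */
void print_comparison(void)
{
  const int count = (int)(sizeof(mjpeg_results) / sizeof(mjpeg_results[0]));
  for (int i = 0; i < count; i++) {
    printf("size %3d: %.2fx faster, %.0f%% of the old file size\n",
           mjpeg_results[i].proxy_size,
           speedup(mjpeg_results[i].seconds, h264_results[i].seconds),
           100.0 * size_ratio(mjpeg_results[i].megabytes, h264_results[i].megabytes));
  }
}
```

Calling `print_comparison()` from a `main` function shows that the speedup ranges from roughly 2.7x at size 100 to roughly 6.9x at size 25, with the new files at about 64% to 68% of the MJPEG file size.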

Transcoding VP9 coded resulted in performance and filesize.

Quality vs filesize for same source file:

quality    filesize
1          57M
10         63M
20         78M
30         86M
40         106M
50         117M
60         143M
70         158M
80         192M
90         211M
100        253M

These tests were done on a 16-core machine.

Diff Detail

Repository
rB Blender
Branch
arcpatch-D10731 (branched from master)
Build Status
Buildable 13537
Build 13537: arc lint + arc unit

Event Timeline

Richard Antalik (ISS) requested review of this revision. Mar 16 2021, 12:50 AM
Richard Antalik (ISS) created this revision.

I think I have found a better solution for proxy building performance. The patch is very simple and there should be no undesired issues.
I can do more testing tomorrow; so far it has processed everything I threw at it.

Point is that I think this would still be OK to commit.

How do the encoding times compare between this, the previous and the built-in solutions?

How do the encoding times compare between this, the previous and the built-in solutions?

This depends on core count. On my machine it's 2x faster than the previous patch, as there were diminishing returns with thread count. That doesn't seem to be the case for this patch.

Depending on the codec used, a different threading strategy might indeed be needed for best performance.

Can you please share some quantified numbers? Timing, disk space usage, things like that. Think this is the biggest missing bit of information for me. The code seems fine otherwise.

Depending on the codec used, a different threading strategy might indeed be needed for best performance.

Yes, this should be optimized for each (reasonable) codec. Currently it is optimized for h264 input files, as these are the most popular.
As I said in the description, the boost from encoding will still improve performance significantly.
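
As a rough illustration of what a per-codec threading strategy could look like, the sketch below picks a threading mode from a set of capability flags. It is not what this patch does. All `Sketch`/`SKETCH_` names are hypothetical stand-ins so the example compiles without FFmpeg headers; in libavcodec the comparable information comes from `AVCodec.capabilities` (`AV_CODEC_CAP_FRAME_THREADS`, `AV_CODEC_CAP_SLICE_THREADS`) and is applied through `thread_type` and `thread_count` on the codec context. Preferring frame threading over slice threading is an assumption made for the example.

```c
/* Hypothetical capability flags, modeled on libavcodec's codec capabilities. */
enum {
  SKETCH_CAP_FRAME_THREADS = 1 << 0,
  SKETCH_CAP_SLICE_THREADS = 1 << 1,
};

/* Hypothetical threading modes, modeled on FF_THREAD_FRAME / FF_THREAD_SLICE. */
typedef enum {
  SKETCH_THREAD_NONE,
  SKETCH_THREAD_FRAME,
  SKETCH_THREAD_SLICE,
} SketchThreadType;

typedef struct {
  SketchThreadType type;
  int count;
} SketchThreadConfig;

/* Choose a threading mode the codec supports; fall back to a single thread. */
SketchThreadConfig sketch_pick_threading(int capabilities, int system_threads)
{
  SketchThreadConfig config = {SKETCH_THREAD_NONE, 1};

  if (system_threads < 1) {
    system_threads = 1;
  }

  if (capabilities & SKETCH_CAP_FRAME_THREADS) {
    /* Assumption: frame threading is preferred when both modes are available. */
    config.type = SKETCH_THREAD_FRAME;
    config.count = system_threads;
  }
  else if (capabilities & SKETCH_CAP_SLICE_THREADS) {
    config.type = SKETCH_THREAD_SLICE;
    config.count = system_threads;
  }

  return config;
}
```

In real code the `system_threads` argument would presumably come from `BLI_system_thread_count()`, as in the lines quoted in the summary.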

Can you please share some quantified numbers? Timing, disk space usage, things like that. Think this is the biggest missing bit of information for me. The code seems fine otherwise.

Will gather some data and add to description

  • Forgot to implement quality range from slider.
Richard Antalik (ISS) edited the summary of this revision. Mar 16 2021, 4:01 PM
Richard Antalik (ISS) edited the summary of this revision.
This revision is now accepted and ready to land. Mar 16 2021, 4:56 PM
Richard Antalik (ISS) edited the summary of this revision. Mar 16 2021, 6:18 PM
This revision was automatically updated to reflect the committed changes.