
Tests: performance testing framework
ClosedPublic

Authored by Brecht Van Lommel (brecht) on Jun 21 2021, 6:20 PM.

Details

Summary

This is an evolution of scripts I have been using locally for Cycles
benchmarking. The idea is to make this available to other developers, and to
make benchmarking of other areas of Blender easy as well. Eventually this
should then also run on the buildbot to track Blender performance over time.

Main features:

  • User-created configurations to quickly run, re-run and analyze a selected subset of tests.
  • Supports both benchmarking with existing builds and automatic building of specified git commits, tags and branches.
  • Generates an HTML page with bar and line graphs from test results.
  • Controlled using a simple command line tool.

See the README for more details on how to use it.

Notes:

  • The need to create configuration files adds some complexity, but I found this to be important. I usually re-run the same set of tests many times for a given optimization or project I'm working on. Manually repeating steps is tedious and error-prone.
  • The automatic building of revisions can be convenient but is rather fragile and may need users to tweak build configurations. I find it quite convenient for quickly testing a patch and the revision before it, without disturbing my regular source and build directories. But I'm not entirely sure we should keep this.
  • Formatting of printed results on the command line could be improved, to make it easier to compare results between revisions.
  • For continuous integration, more design and implementation will be needed to integrate in the buildbot. The output is written to JSON files, which could be aggregated somewhere. The code to generate a graph from JSON files was written to not rely on any configuration or Blender build, and can generate a single HTML page using results aggregated from multiple machines.
  • Currently lib/benchmarks is used as a source of .blend files and is assumed to be available. There are currently only Cycles scenes there, and even those are not yet set up to work well for cycles-x and quick benchmarking. How to manage this is still unclear.
  • This is not currently designed to interoperate with Open Data. It could be made more compatible, but it's unclear if it's worth tying these things together. I feel comparing hardware and measuring Blender performance over time are quite different things with different design decisions.
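The JSON aggregation mentioned above could be as simple as merging per-machine result files into one flat list for graphing. A minimal sketch; the file layout and entry keys here are hypothetical, not the framework's actual format:

```python
import json
import tempfile
from pathlib import Path

def aggregate_results(results_dir: Path) -> list:
    """Merge all per-machine JSON result files into one flat list of entries."""
    entries = []
    for path in sorted(results_dir.glob("*.json")):
        # Each file is assumed to contain a JSON list of result entries.
        entries.extend(json.loads(path.read_text()))
    return entries

with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    (d / "machine_a.json").write_text(json.dumps([{"test": "bmw27", "time": 10.26}]))
    (d / "machine_b.json").write_text(json.dumps([{"test": "bmw27", "time": 8.33}]))
    print(len(aggregate_results(d)))  # 2 entries merged from two machines
```

Keeping the aggregation free of any Blender or configuration dependency, as the note describes, means this step can run anywhere the JSON files are collected.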

Ref T74730

Diff Detail

Repository
rB Blender
Branch
performance
Build Status
Buildable 15364
Build 15364: arc lint + arc unit

Event Timeline

Brecht Van Lommel (brecht) requested review of this revision. Jun 21 2021, 6:20 PM
Brecht Van Lommel (brecht) created this revision.

Configuration file cycles-x/config.py:

devices = ['OPTIX']
categories = ['cycles']
builds = {
  'cycles-x': '/home/brecht/dev/build_linux/bin/blender',
  'master': '/home/brecht/dev/worktree_build/bin/blender',
}

Terminal output:

$ ./benchmark run cycles-x
barbershop_interior                      cycles-x                       7.5451s
barbershop_interior                      master                         40.3464s
bmw27                                    cycles-x                       8.3284s
bmw27                                    master                         10.2636s
classroom                                cycles-x                       11.4427s
classroom                                master                         25.2872s

Graph: (image attached to the revision, not reproduced here)

Brecht Van Lommel (brecht) planned changes to this revision. Jun 21 2021, 6:23 PM

Some code cleanup and commenting is needed still. Not sure who wants to review this. Mainly looking for feedback on the overall design at this point, and understanding if others would be interested in using this.

Nice work, and great to see the "in-house" tools are becoming more available for all developers.

The configuration file seems tricky. What is going to happen if one has multiple OptiX-capable devices?

For the CI type of things, you also don't want to maintain the configuration on every machine. Is it possible to configure the script to run the benchmark on every available compute device?

Multi GPU device support

For multiple devices, I made it work like this now.

$ ./benchmark list
DEVICES
AMD Ryzen Threadripper 2990WX 32-Core Processor (Linux)            CPU
NVIDIA RTX A6000 (Linux)                                           CUDA_0
NVIDIA RTX A6000 (Linux)                                           OPTIX_0

TESTS
...

And then in the configuration you could for example have:

devices = ['CPU']
devices = ['OPTIX_0']
devices = ['OPTIX_1']
devices = ['OPTIX_*']
devices = ['CPU', 'OPTIX_*', 'CUDA_*']
devices = ['*']

Test and category names support similar wildcards already.
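The wildcard matching described above could be implemented with Python's fnmatch module. A minimal sketch; the function name and device list are illustrative, not the framework's actual API:

```python
from fnmatch import fnmatch

def match_devices(patterns, available):
    """Return available device IDs matching any configured pattern,
    preserving the order of the available-device list."""
    return [dev for dev in available if any(fnmatch(dev, p) for p in patterns)]

available = ["CPU", "CUDA_0", "OPTIX_0", "OPTIX_1"]
print(match_devices(["OPTIX_*"], available))         # ['OPTIX_0', 'OPTIX_1']
print(match_devices(["CPU", "OPTIX_*"], available))  # ['CPU', 'OPTIX_0', 'OPTIX_1']
print(match_devices(["*"], available))               # all four devices
```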

From the feature creep department: running the benchmarks at regular intervals is great, but to make the data a little more useful something more may be needed. @Dalai Felinto (dfelinto) already has a Grafana + InfluxDB instance set up at https://metrics.blender.org/. InfluxDB is incredibly easy to work with; posting data is usually nothing more than a curl command (but a Python lib is available if required).
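To illustrate how lightweight InfluxDB ingestion is: points are posted as plain-text "line protocol" records (measurement, tags, fields). A minimal sketch; the measurement and tag names below are made up for illustration, and this simplified builder skips the escaping real tag values may need:

```python
def to_line_protocol(measurement, tags, fields):
    """Build one InfluxDB line-protocol record: measurement,tag=... field=..."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str}"

line = to_line_protocol("render_time",
                        {"test": "bmw27", "device": "OPTIX_0"},
                        {"seconds": 8.3284})
print(line)  # render_time,device=OPTIX_0,test=bmw27 seconds=8.3284
# Posting such a record is then roughly:
#   curl -XPOST 'http://localhost:8086/write?db=benchmarks' --data-binary "$LINE"
```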

That being said, I don't think not having that integration should stop this from landing.

InfluxDB + Grafana are quite a big ask for a developer to set up locally, just to generate some graphs. The JSON + Google Charts used now requires no setup, but is not as powerful. I'm not sure if we should support just one, or both.

There are also some complexities:

  • Devices, test categories and tests will change over time. The number of graphs shown needs to dynamically adapt to that.
  • Sometimes you need to remove a bad run, or drop old results because some change in Blender or tests files makes comparison before/after invalid.
  • The old implementation supported running tests multiple times and displaying error bars, I would like to restore that.
  • There also used to be support for showing the rendered image for each test/revision on the web page. Probably will not restore that, but maybe.
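For the first point above, the set of graphs could be derived from the data itself rather than from configuration. A minimal sketch of grouping result entries by (category, device), with hypothetical entry keys:

```python
from collections import defaultdict

def group_for_graphs(entries):
    """Group result entries so each (category, device) pair becomes one graph."""
    graphs = defaultdict(list)
    for entry in entries:
        graphs[(entry["category"], entry["device"])].append(entry)
    return graphs

entries = [
    {"category": "cycles", "device": "OPTIX_0", "test": "bmw27", "time": 8.33},
    {"category": "cycles", "device": "CPU", "test": "bmw27", "time": 40.1},
    {"category": "cycles", "device": "OPTIX_0", "test": "classroom", "time": 11.44},
]
graphs = group_for_graphs(entries)
print(len(graphs))  # 2 graphs: (cycles, OPTIX_0) and (cycles, CPU)
```

New devices or test categories appearing in the results then automatically get their own graph, without any configuration change.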

However we do this, probably best to figure out how to integrate this with CI and potentially InfluxDB + Grafana as a separate step.

@Brecht Van Lommel (brecht), Lovely!

For the Grafana story: it is all interesting and something we should investigate. The way I was always imagining this is to have a "core" benchmark shared across different endpoints, so that we can ensure developers and CI metrics are always calculated the same way. But one thing to keep in mind: simple things should be simple, complicated things should be possible. For local development it should be possible to easily set up the benchmark and use it, without extra dependencies.

@Brecht Van Lommel (brecht), Unfortunately I didn't have time to apply it and test, but am I right that the script can already replace the benchmark scripts we are using for Cycles X? If so, let's put in the effort to push it to "production" :)

For the review: on a top level it seems fine, but some local aspects make me a bit sad :( For example, we shouldn't mix old-style os.path with the more modern/easier pathlib. Not sure if this is something you've been planning to do still, or whether you want some help with addressing things like that.

Comments, type hints, license headers, replace os.path by pathlib.

  • Only create a lib folder symlink when using git init --build flag.
  • Put configs in top level benchmark folder instead of benchmark/configs.
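The os.path to pathlib change mentioned above typically reads like this (a generic before/after sketch, not code from the patch itself):

```python
import os.path
from pathlib import Path

# Old style: paths as strings, manipulated via os.path functions.
old = os.path.join(os.path.expanduser("~"), "dev", "build_linux", "bin", "blender")

# New style: the same path built with pathlib's / operator.
new = Path.home() / "dev" / "build_linux" / "bin" / "blender"

print(old == str(new))  # True: both describe the same location
```

Beyond readability, pathlib objects carry their own methods (exists, glob, read_text), which removes much of the string plumbing os.path code tends to accumulate.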

@Brecht Van Lommel (brecht), Unfortunately I didn't have time to apply it and test, but am I right that the script can already replace the benchmark scripts we are using for Cycles X? If so, let's put in the effort to push it to "production" :)

It can replace the scripts, but I had to locally modify the contents of the lib/benchmarks folder to place the cycles-x .blend files there. That is the main thing to address before committing I think.

What we could do is:

  • Commit this to master, with a simple .blend file load timing test using the existing lib/benchmarks/cycles files. Then other developers will be able to add tests for animation playback, mesh editing, etc.
  • Create a lib/benchmarks/cycles-x folder with the new cycles-x test files.
  • In the cycles-x branch, enable benchmarking with those new files.
  • When merging to master, update lib/benchmarks/cycles to have a single set of .blend files and remove the cycles-x folder.

Does that seem reasonable? I'm unsure whether it would be fine to continue committing more big files there, like a couple of gigabytes of production files, or whether that will put too much strain on the SVN server.

For the review: on a top level it seems fine, but some local aspects make me a bit sad :( For example, we shouldn't mix old-style os.path with the more modern/easier pathlib. Not sure if this is something you've been planning to do still, or whether you want some help with addressing things like that.

I removed the usage of os.path now, just a few places I forgot. Along with other cleanups, it's ready to be reviewed at that level now.

Does that seem reasonable?

Sounds perfectly fine to me.

For the big files concern: I am not sure it will be different for SVN compared to how the libraries are handled. I don't think we should have that many production files, as adding more of them doesn't really mean improved benchmarking quality. From the disk usage point of view on the server, it wouldn't really matter much whether something is in an "ftp" folder or in SVN. The possible downside is that with SVN there will be an "extra" copy of the files in the ".svn" folder. But all factors combined, I'd say we should just stick to SVN: it just makes it easier to (a) set up the environment and (b) make sure everything is up to date.

I removed the usage of os.path now, just a few places I forgot. Along with other cleanups, it's ready to be reviewed at that level now.

Cool. Left a couple of inline notes.

Not really a Python person, but isn't it preferred to use typed classes instead of dicts (could be that Entry should become its own class)?

Again, these are just notes; I don't really think they are blockers or anything.

I'm almost tempted to say we should go ahead and commit it, and solve possible issues later. It's not particularly fun to apply patches on top of the SVN checkout :)

tests/performance/api/config.py
117

Is there something like self.devices.append(device) ?

If there is, there seem to be other places where it could be used.

tests/performance/tests/cycles.py
73

Shouldn't be reached.

LGTM for master. Will perform a test run here.

tests/performance/api/config.py
126
180

When quickly reading over the code it is not clear where the if statement ends and the body begins. The indentation is also not helping here. Perhaps use if (...).

tests/performance/api/device.py
34

Fails when running on machines that don't support Optix.

TypeError: bpy_struct: item.attr = val: enum "OPTIX" not found in ('NONE', 'CUDA', 'OPENCL')
Brecht Van Lommel (brecht) marked 4 inline comments as done.

Address comments.

Brecht Van Lommel (brecht) marked an inline comment as done. Jun 23 2021, 7:59 PM

Not really a Python person, but isn't it preferred to use typed classes instead of dicts (could be that Entry should become its own class)?

It was a dict to match JSON, but converting it to a dataclass is easy so I did that.
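The dict-to-dataclass conversion keeps the JSON mapping trivial via dataclasses.asdict. A minimal sketch with hypothetical field names (the real entry's fields may differ):

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class TestEntry:
    """One benchmark result; the field names here are illustrative only."""
    revision: str
    test: str
    device: str
    output: dict = field(default_factory=dict)

entry = TestEntry(revision="cycles-x", test="bmw27", device="OPTIX_0",
                  output={"time": 8.3284})

# asdict() restores the plain-dict shape that maps directly to JSON,
# so serialization stays as simple as with the original dict.
print(json.dumps(asdict(entry), sort_keys=True))
```

The round trip back is just `TestEntry(**loaded_dict)`, so reading existing JSON result files needs no extra parsing code.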

tests/performance/api/config.py
117

I like this syntax but I think .append() is more standard, will change it.

180

Using () aligns them even more. I now added a comment in the middle.

This revision was not accepted when it landed; it landed in state Needs Review. Jul 5 2021, 12:44 PM
This revision was automatically updated to reflect the committed changes.