
Cancelable long running operations (Prototype).
Needs Review, Public

Authored by Jacques Lucke (JacquesLucke) on Nov 13 2021, 4:05 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

One long-standing issue with Blender is that when the user accidentally starts a long-running operation, there is no way to stop it without force-closing Blender. We support cancellation in a few places already, most importantly rendering and baking. However, for everything else we don't really have a system to make that work. Typically, one would try to split up the ui thread from the processing thread to keep the ui responsive. Unfortunately, this approach is very hard to use in Blender, because generally all ui code expects everything to be evaluated. Data corruption would be very likely, even more so when some Python operator is running while we want to redraw the ui.

Fortunately, there seems to be another approach, which is not as good but solves the biggest problems as well: Instead of separating the ui from processing thread, we can try to separate the user-input from the blender-main thread. This would allow us to retrieve events from the user while Blender appears frozen. The user-input thread could request task-cancellation from the processing threads via a global variable.
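The idea can be sketched outside of Blender as follows. This is only a minimal illustration, assuming a `threading.Event` as the stand-in for the global variable. The names `cancel_requested`, `input_thread` and `long_running_operation` are hypothetical and not part of the patch, and the list of key names simulates events that would really come from the OS.

```python
import threading
import time

# Hypothetical stand-in for the global cancellation variable.
cancel_requested = threading.Event()


def input_thread(events):
    """Consume (simulated) user events while the main thread is busy."""
    for key in events:
        if key == "shift+esc":
            cancel_requested.set()
            return


def long_running_operation(max_steps):
    """Return how many steps ran before finishing or being cancelled."""
    steps_done = 0
    while steps_done < max_steps:
        # Cooperative check: the operation decides where it is safe to stop.
        if cancel_requested.is_set():
            break
        steps_done += 1
        time.sleep(0.001)
    return steps_done


def run_demo():
    """Run the operation on this thread while another thread reads input."""
    cancel_requested.clear()
    listener = threading.Thread(target=input_thread, args=(["a", "shift+esc"],))
    listener.start()
    steps = long_running_operation(max_steps=10_000)
    listener.join()
    return steps
```

Note that cancellation here is cooperative: the processing code is never interrupted forcibly, it only stops at points where it checks the flag itself.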

This patch is a prototype that implements this separation using a new ghost event consumer that runs on a separate thread. It gets all the events even when the main thread is still doing something else. I already found that the current implementation does not work on Windows. The reason seems to be that only the thread that created the window can also retrieve its events. So a better solution seems to be to move more of the ghost interaction to a separate thread, which still seems doable with a bit more work.

Overall, given that this patch already works on Linux, it feels very achievable to get it working on other platforms as well. I think the benefits of having cancelable operations are huge, even if we don't have the even-better ui thread separation.

To test the patch (on Linux) do the following:

  1. Create a new script in Blender like the one below.
  2. Start it and note that Blender is frozen.
  3. Hit shift+esc to stop the script.
  4. Blender is now in a "cancelled" state which allows the user to "fix" the scene.
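  5. Click the "Enable Processing" button in the topbar to make Blender behave like usual again.

import bpy

# Busy-loop until cancellation is requested (shift+esc in this prototype).
# `cancel_requested` is the context attribute added by this prototype patch.
while True:
    if bpy.context.cancel_requested:
        break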

Some feedback on the general approach would be appreciated. Maybe someone also has a different/better idea for how to get that working?

Discussion Points:

  • How to retrieve events from the OS in a separate thread in a way that works on every platform?
  • What to do when an operation has been cancelled? Operators should just be undone, but cancelling the depsgraph is harder because Blender expects there to be an evaluated depsgraph for drawing.

Diff Detail

Repository
rB Blender
Branch
temp-process-cancel (branched from master)
Build Status
Buildable 18644
Build 18644: arc lint + arc unit

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision. Nov 13 2021, 4:05 PM
Jacques Lucke (JacquesLucke) created this revision.

Why would this be preferred over our wmJob system (which is what most cancelable background jobs are currently using)? It is designed for background jobs that keep the UI unblocked. Of course this could be improved, and could be exposed to Python in some way. (Scripts could also show progress bars then.)
Bringing threads into the already complex event management sounds like asking for trouble. AFAIK we had plenty of issues in the past with multiple event queues interfering with each other or not syncing as needed. Brecht may be able to tell you more.

"Typically, one would try to split up the ui thread from the processing thread to keep the ui responsive. Unfortunately, this approach is very hard to use in Blender, because generally all ui code expects everything to be evaluated."

I'm not sure if this is actually much of a problem, at least not without concrete examples. As said, we do it in a bunch of places already.

Blender constructs UIs declaratively, meaning it always represents the state of the application at the time of drawing. Persistent state is minimized and mostly just internal UI state. This should make asynchronous processing and display much easier already. Many modern UI frameworks supporting high parallelism/decentralization were designed to be declarative (React, Flutter, SwiftUI, Android's new Jetpack Compose).
Of course designing/implementing such a threaded operation is still not trivial, but that would be the same with either approach.

Maybe I was not clear enough, or I am missing something. I'm mostly talking about things like modifier evaluation, so the operations that happen during depsgraph evaluation. It often happens that people e.g. accidentally generate too much geometry, which effectively freezes Blender. The goal is to allow the user to stop the freezing without having to kill Blender and losing work.

To my knowledge, separating the entire depsgraph evaluation from the ui thread is quite a bit more complex than just using wmJob.

[Edit:] Also, just to be clear, if there is a good way for us to separate the depsgraph evaluation from the ui thread, I'd totally go for it. That would be much better, but I haven't found a good approach to make that work yet.

I don't think wmJob is the right solution to this. That is designed for asynchronous jobs that the user also experiences as such. However most operations should be experienced synchronously by users, but still support cancelling and progress display if they happen to be slow.

I think we are concerned with the following operations being potentially slow when they work on scene data:

  • Operators
  • Depsgraph evaluation
  • Drawing

These operations in general are reading/writing the same scene data, and cannot be done in parallel to each other. For one, that would require additional memory and code complexity in many operations, more than is practical. But they also often really must be done in order, because e.g. drawing a button value might depend on a driver through depsgraph evaluation, or a selection operator depends on an object transform through depsgraph evaluation.

One design question then is, how should potentially long running operations report progress, be cancelled quickly, and then recover to a valid state again?

  • For operators, I think you need to manually add progress reporting and cancellation tests in various places, like loops over objects that the operator is working on. Then for recovery, sometimes the operator will be able to cancel and restore the previous state already (like a modal transform operator). I think we also need a new operator return code that makes it restore the previous state using the undo system, since otherwise it would be a lot of work to implement that for almost every operator.
  • For depsgraph evaluation, you again need progress reporting and cancellation tests in various places. Some in the generic depsgraph scheduling, some inside specific operations that are known to be slow. Safely cancelling a depsgraph operation is not always simple, since there may be a following operation or drawing code that depends on a certain state being initialized correctly. This can be approached from two sides: we can support cancelling only certain safe operations and continue executing others, or make cancelling safer in general. Alternatively, we can destroy a cancelled depsgraph and block most operators and drawing until it has been re-created and evaluated fully.
  • For drawing, probably not worth going into this soon.
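The operator case above could look roughly like the following sketch. It is only an illustration under loud assumptions: the scene is a plain dictionary, the "undo system" is a deep copy taken before the operator runs, and `RESULT_CANCELLED_RESTORE`, `scale_objects` and `run_operator` are hypothetical names, not existing Blender API.

```python
import copy

RESULT_FINISHED = "FINISHED"
# Hypothetical new return code: "I was cancelled, please restore the old state".
RESULT_CANCELLED_RESTORE = "CANCELLED_RESTORE"


def scale_objects(scene, factor, cancel_requested):
    """Scale every object in place, testing for cancellation once per object."""
    for obj in scene["objects"]:
        if cancel_requested():
            # The scene is half-modified here; let the caller roll it back.
            return RESULT_CANCELLED_RESTORE
        obj["scale"] *= factor
    return RESULT_FINISHED


def run_operator(scene, operator, *args):
    """Run an operator and roll back to a snapshot if it requests that."""
    snapshot = copy.deepcopy(scene)  # stand-in for an undo push
    result = operator(scene, *args)
    if result == RESULT_CANCELLED_RESTORE:
        scene.clear()
        scene.update(snapshot)
    return result
```

The point of the extra return code is that the operator itself does not need any restore logic; it only reports that it stopped early, and the generic caller restores the previous state.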

The other question is how to handle threading and synchronization. I would start with the assumption that we don't execute operators, depsgraph evaluation and drawing working with scene data in parallel to each other. What we can do in parallel to those things is handle events, draw progress bars and cached editor display buffers without accessing scene data, or redraw an individual button whose value is being edited.

I think you need two separate threads for that, one for potentially slow operations on scene data, and one for operations that are safe to do in parallel to them. Either could be the main thread or an additional thread. However keeping most operators on the main thread seems like the least invasive change, also for Python API compatibility.

The secondary thread could then perform cancel event handling as in this patch, and later also display a progress bar or other message indicating that Blender is working and not frozen. Handling events directly from GHOST level as in this patch is easiest as you don't need to worry about e.g. a file load operator destroying the window in parallel. If you want to display something in the window, you will need some mutex/synchronization between the two threads to avoid that problem. That gets more tricky, and my comment is already getting too long.
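As a rough illustration of the kind of synchronization meant here, the sketch below lets a slow operation publish its progress through a small mutex-protected object that a secondary thread polls. It deliberately avoids anything window related. `SharedStatus`, `status_thread` and `slow_operation` are hypothetical names, and appending to a list stands in for drawing a progress bar.

```python
import threading
import time


class SharedStatus:
    """The only state both threads touch, always accessed under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._progress = 0.0
        self._finished = False

    def update(self, progress, finished=False):
        with self._lock:
            self._progress = progress
            self._finished = finished

    def read(self):
        with self._lock:
            return self._progress, self._finished


def status_thread(status, drawn, poll_interval_s=0.005):
    """Secondary thread: poll the status and 'draw' it, never touch scene data."""
    while True:
        progress, finished = status.read()
        drawn.append(progress)  # stand-in for drawing a progress bar
        if finished:
            return
        time.sleep(poll_interval_s)


def slow_operation(status, steps):
    """Slow work on the calling thread that reports progress as it goes."""
    for i in range(steps):
        time.sleep(0.001)  # stand-in for real work on scene data
        status.update((i + 1) / steps)
    status.update(1.0, finished=True)
```

The secondary thread only ever sees a consistent snapshot of the status, which is the property that would also be needed, in a more involved form, before it could draw into a window that the other thread may destroy.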

Overall the idea behind this patch seems to align with my thinking, but I have not looked into the specifics of handling events in multiple threads or safe depsgraph eval cancellation.

Yuro (Yuro) added a subscriber: Yuro (Yuro).

Very interested in the outcome of this and how such a system might be tapped into by addon developers like myself to offer some form of progress during execution of operators and feedback / option to cancel or pause operations.

In my case this is particularly of great interest because of an addon I've started developing recently, that must perform an extremely long task, that in some cases can take hours to finish.

What I've been struggling with is the lack of options to offer a progress bar or cancel button. Right now the only option is to force close Blender.

(For my particular use case) I don't require the user to be able to still fully interact with the entire Blender interface, I only require the ability to present some kind of blocking modal over the window, that allows the user to see what is currently happening, the current state of progress with an option to pause or cancel the long running task.

Something like this basically, but obviously Blender UI theme in appearance:

@Brecht Van Lommel (brecht) Perhaps the option to cancel long running operations could be automatically presented based on the length of time an operation has been running. Whenever a long running task is executing, either an operator or depsgraph evaluation, a timer could monitor how long the task has been running, and if it exceeds some kind of threshold (for example 400ms, based on the "Doherty Threshold" from https://www.lawsofux.com), present the progress modal automatically over the UI with an option to cancel / undo the current running task and revert back a step. This would work nicely for situations where a user has accidentally typed a massive or tiny number into a modifier or node and immediately knows what mistake they've made when they hit enter, giving them the opportunity to revert back a step. It also has another benefit: no progress bar is shown for a task that finishes quickly enough to stay under the threshold.
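A threshold like this could be sketched with a one-shot timer that is cancelled when the task finishes in time. This is only an illustration of the timing logic: `run_with_delayed_progress` and `show_progress` are hypothetical names, and in Blender the callback would have to run on whatever thread is allowed to draw.

```python
import threading
import time

DOHERTY_THRESHOLD_S = 0.4  # the 400ms suggested above


def run_with_delayed_progress(task, show_progress, threshold_s=DOHERTY_THRESHOLD_S):
    """Run task(); call show_progress() only if it is still running after threshold_s."""
    timer = threading.Timer(threshold_s, show_progress)
    timer.start()
    try:
        return task()
    finally:
        # A task that finished in time never triggers the progress display.
        timer.cancel()
```

For example, `run_with_delayed_progress(lambda: time.sleep(1.0), lambda: print("working..."))` would print once after about 0.4 seconds, while a task that returns immediately prints nothing.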

+1 for this capability.

  • It could be enabled in isolation (operators that want to check for cancel could enable event consumer and thread instead of idling continuously).
  • It would be interesting to support cancelling script execution (console & text editor), making Escape trigger a SIGINT (as if Ctrl-C was pressed in the terminal).

Note that rB37b256e26feb454d9febd84dac1b1ce8b8d84d90 added support for threaded event handling but doesn't directly impact this patch as far as I can see.

One of the main issues I ran into was that on Windows only the thread that created the window can retrieve the events (I'm not really deep into this topic, so maybe I'm missing something). In this case, creating a separate thread in ghost to handle events while Blender is doing something else didn't work (I tried it in this patch). Instead we'd have to move at least window creation to that separate thread as well.

If I understand rB37b256e26feb (Fix T100855: Input while Blender is unresponsive exits under Wayland) correctly, it also creates a new thread in a way that would not work for Windows.