
getting GPU render crashes on all versions after (and including) 2.83
Closed, Archived, Public

Description

System Information
Operating system: Windows-10-10.0.19041-SP0 64 Bits
Graphics card: GeForce GTX 1070/PCIe/SSE2 NVIDIA Corporation 4.5.0 NVIDIA 456.55

Blender Version
Broken: version: 2.83.0, branch: master, commit date: 2020-06-03 14:38, hash: rB211b6c29f771
Worked: (newest version of Blender that worked as expected)

Short description of error
Blender usually crashes midway through a GPU render.

Exact steps for others to reproduce the error
This might be a denoising issue. I've already posted a separate report about render crashes in 2.9+, but I'm hoping to get help with this one: because of the crashes in 2.9x I've had to go back to 2.83, and I have client work to do, so it's costing me a lot of time.
Blender used to crash due to the issue that's somewhat fixable using the regedit method of setting the TdrDelay value to something like 16. That sorted it for a while but it's come back and I can confirm the TdrDelay value is still there.
One other symptom - sometimes Blender will get incredibly slow at GPU rendering, going from something like 20 minutes per render to several hours. When I look at Task Manager, it shows a huge draw on RAM and almost none on the GPU.
A computer restart will fix this. Just something else to add to the list.

Event Timeline

Robert Guetzkow (rjg) changed the task status from Needs Triage to Needs Information from User. Edited Nov 1 2020, 8:02 PM

Just like in your other issue, we will need the crash log and debug logs. Please follow the instructions provided in your other ticket. The issue with OptiX denoising was limited to 2.90.1 and is fixed in 2.91, so that is most likely not the reason.

been trying to get it to crash but it won't. But I do get this:

Usually this is resolved by restarting Blender. It's rendering again, but the big issue is that it's just not using the GPU. I can select it, but when it renders, the RAM is maxed out, GPU usage is around 0% and renders take 10x as long.

If you're using all of your RAM and it occasionally crashes, it seems that you're running out of memory. As for the GPU, the error you're seeing can have many causes (see Blender's manual). This may be caused by exceeding the Timeout Detection and Recovery Delay (TDR-Delay), after which the OS resets the state of the graphics driver because the device appears unresponsive.

It should also be noted that the Task Manager doesn't show CUDA usage by default. You will have to switch to the Performance tab, select the GPU, then change one of the graph panels to CUDA by clicking on its name.
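As an alternative to the Task Manager, GPU load and memory can also be read from the `nvidia-smi` tool that ships with the NVIDIA driver. The following is only a minimal sketch: the helper names `parse_gpu_stats` and `read_gpu_stats` are made up for illustration, and it assumes `nvidia-smi` is on the PATH and reports plain numbers for every queried field (some setups report `[N/A]` instead, which this sketch does not handle).

```python
import subprocess

# Query GPU utilization (%) plus used and total memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into one dict per GPU (hypothetical helper)."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append({"util_percent": util, "used_mib": used, "total_mib": total})
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed result (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` a few times while a render is running should show whether utilization and memory use actually rise on the GPU.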

@Andrew Walsh (andywalshart) Have you checked if you're running out of memory and whether GPU truly isn't used?

So I can confirm that the GPU is being used as per the check you mentioned in the Task manager. So that's cool.

But as for the TDR delay, I mean that's a hell of a lot of studying I'd have to do to figure that one out. All I know is, my registry looks like this after I applied a simple 'fix' I saw online somewhere:

But in all the blurb here: (https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys) it's just over my head. I mean, what can be done for the average Blender user who's seeing these crashes? Maybe a diagnostic utility would be useful. I do know that my scenes are always pushed to near breaking point. Never less than 5 million faces and there's a few 4k textures, lots of 2k textures etc.

Not sure how common it is for people's GPUs to be tested that much. But if it is common, it would seem necessary to me for Blender to have some kind of interface between the scene's 'heaviness' and the system's GPU, like a resource monitor or something that gave you a red warning because certain assets were weighing things down etc. Feels like I'm too in the dark as to how my scene is impacting my GPU.

@Andrew Walsh (andywalshart) Running out of memory and a TDR delay that is too short are two different issues. The former is what will cause a crash (the entire application terminates); the latter should usually just stop the rendering and give you an error message, e.g. one mentioning cuCtxSynchronize or cuCtxCreate.

When you are using a memory-intensive application, the OS can provide more memory than is physically available through a mechanism known as virtual memory and paging. Essentially, it uses the hard drive to temporarily store memory pages that don't fit into RAM. However, even virtual memory has its limits, since the page file has a maximum size. Windows will terminate any application that tries to allocate more virtual memory than it can supply.

Now, GPUs have their own memory. When rendering with CUDA, Blender can fall back to the system's RAM if the GPU runs out of memory. When you run out of physical RAM, the OS uses paging to disk. Each type of memory in these steps is slower to access than the one before. Additionally, it increases the need to synchronize large amounts of data between GPU, RAM and disk. Hence, if you have a scene that doesn't fit in the GPU's memory, you may still be able to render it provided you have enough RAM, but it will be slow. If you need more memory than the system can give you, the application is terminated.
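The fallback chain described above can be sketched as a small decision function. This is a deliberately simplified model, not how Blender or Windows actually account for memory: the function name `memory_tier` and its parameters are hypothetical, and it assumes the three pools can simply be added up.

```python
def memory_tier(required_gib, vram_gib, free_ram_gib, free_pagefile_gib):
    """Return the slowest memory tier a render of the given size would touch.

    Simplified model: the scene first fills GPU memory, then spills into
    system RAM, then into the page file; beyond that the allocation fails.
    """
    if required_gib <= vram_gib:
        return "gpu"            # fits entirely in GPU memory: fastest case
    if required_gib <= vram_gib + free_ram_gib:
        return "system_ram"     # spills into RAM: still renders, but slower
    if required_gib <= vram_gib + free_ram_gib + free_pagefile_gib:
        return "page_file"      # paging to disk: renders can take hours
    return "out_of_memory"      # the OS terminates the application


# Example with an 8 GiB card (such as a GTX 1070) and assumed 16 GiB of
# free RAM and 16 GiB of page file:
print(memory_tier(12, 8, 16, 16))  # system_ram
```

In this model, a render that drops from minutes to hours without crashing would correspond to the `system_ram` or `page_file` tier rather than `out_of_memory`.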

The TDR delay is a safety measure against an application hogging GPU resources, freezing and thus locking up the system. If the GPU does not respond within this time, the OS resets the graphics driver state.

Blender already shows you the used memory during rendering, and you can of course use the Task Manager on Windows. The TDR delay can be adjusted depending on your needs, but I agree this is not something the average user would know how to do. It's not something that Blender can fix though, since this is an OS configuration issue. The guide provided on this website should be quite helpful for this though.
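For reference, the TdrDelay value lives under the `GraphicsDrivers` registry key and is a DWORD measured in seconds. The sketch below only builds the text of a `.reg` file for a chosen delay; the function name `tdr_delay_reg_file` is made up for illustration, and the 16-second value is just the one mentioned in this report, not a recommendation. Importing such a file needs administrator rights, and the change typically takes effect only after a restart.

```python
# Registry key that holds the TDR settings on Windows.
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_delay_reg_file(seconds):
    """Build .reg file text that sets TdrDelay to `seconds` (hypothetical helper)."""
    if not 0 < seconds <= 0xFFFFFFFF:
        raise ValueError("TdrDelay must fit in a positive 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\n"
        "\n"
        f"[{TDR_KEY}]\n"
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{seconds:08x}\n'
    )


print(tdr_delay_reg_file(16))
```

The last line of the printed text is `"TdrDelay"=dword:00000010`, since 16 seconds is 0x10 in hexadecimal.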

Robert Guetzkow (rjg) closed this task as Archived. Edited Nov 19 2020, 10:49 AM
Robert Guetzkow (rjg) claimed this task.

I'm closing this ticket for now since this appears to be either a system configuration issue or the system is running out of memory.

For future bug reports we need instructions to reproduce the problem on our system. If this issue is limited to a particular project, we will need a minimal version of it as well.

ok, thanks for all the info. Sorry if I'm unable to devote more time to it, I just get really busy. But I think in future I'll try to keep texture sizes as small as possible and see if that helps.
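To get a rough feel for how much texture size matters, the arithmetic can be sketched as follows. It assumes uncompressed 8-bit RGBA textures with no mipmaps, which is an assumption made for illustration; the memory Blender actually uses per texture may differ. The function name `texture_bytes` is hypothetical.

```python
MIB = 1024 * 1024


def texture_bytes(width, height, channels=4, bytes_per_channel=1):
    """Uncompressed size of one texture in bytes (rough estimate only)."""
    return width * height * channels * bytes_per_channel


print(texture_bytes(4096, 4096) / MIB)  # 64.0 MiB for one 4k texture
print(texture_bytes(2048, 2048) / MIB)  # 16.0 MiB for one 2k texture
```

Under these assumptions, halving a texture's resolution cuts its memory footprint to a quarter.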