Page MenuHome

BGE - LibLoad/LibFree crash and leak (caching issue?)
AbandonedPublic

Authored by Jorge Bernal (lordloki) on Jan 16 2015, 10:56 PM.

Details

Summary

The crash bug of LibLoad/LibFree calls is caused by dangling pointers left in the material caching maps by the KX_BlenderSceneConverter::LinkBlendFile function.
The crash itself pops out in RAS_BucketManager but the bucket manager looks fine.
Disabling material caching in the ui solves the issue.

The pointed values are freed at KX_BlenderSceneConverter::FreeBlendFile (~1374 and ~1391)

At that point the two caching maps m_mat_cache and m_polymat_cache might still contain pointers to the same values but the maps are not cleaned up.

The pointer to the materials are created by RAS_MaterialBucket* material_from_mesh (BL_BlenderDataConversion.cpp ~911), starting from KX_BlenderSceneConverter::LinkBlendFile.

And there is the second issue. When KX_BlenderSceneConverter::LinkBlendFile links the scene, it creates a temporary scene, KX_Scene* other, whose content is merged with the active scene. The data pointed by other is then released and I didn't find any place where the value "other" might have been stored for later use.
The problem is that when the temporary scene is created, that KX_Scene* value looks to be used as the key for the cached materials (in the BL_BlenderDataConversion.material_from_mesh function). If I'm right (big if) and that "other" is not some way kept somewhere, there is no way for the cached materials to be accessed by key.
Initially I thought that the correct thing to do - in the diff below - was to remove the cached values associated to the temporary scene when the pointer was released.

But that also solves the crash problem of the dangling pointers. This oddity makes be believe that simply removing the "other" key from the caching maps is like disabling the entire caching system.

Another approach - and probably the correct one - could be to do as KX_BlenderSceneConverter does in its MergeScene function, that is re-linking also the cached materials.

I'd like to hear some thoughts on this stuff before to start looping through maps of maps of lists of tuples trying not to destroy everything.

The diff just marks the key points of the problem.

Diff Detail

Event Timeline

Pierluigi Grassi (pgi) retitled this revision from to BGE - LibLoad/LibFree crash and leak (caching issue?).
Pierluigi Grassi (pgi) updated this object.

This second diff contains the way I was thinking to deal with the pointers.
The diff works in that it prevents both the memory leak caused by materials and the sisgsegv but, for some reason, I have this feeling that it is a workaround and not a proper fix.

Update. I did more tests and I found that in diff 3179, if I remove the calls:

m_polymat_cache.erase(other);
m_mat_cache.erase(other);

and leave the calls:

removeCachedMaterial(...
removeCachedPolyMaterial(...

the engine still crashes with a SIGSEGV. That wasn't supposed to happen - with no "erase" calls the caching maps should go on leaking memory but there should be no dangling pointers in them, because those have been removed by the "remove" functions.

Nice, my remove functions in 3179 don't remove a thing from the map. That might be why they don't address the "you have to remove things" issue. I'll fix that, sorry.

Previous diff had a copy constructor issue (= creates copies, who would have guessed that).
This fixes the crash bug by removing stale pointers in the cache maps. I have purposedly avoided in place removal in the cleaning functions.
I'll do more tests to see if this patch also addresses the memory leak, even though I think that it should be handled in a different way.

3184 doesn't change the leaking problem. It was kinda illogical to think so, because the growth of the cache maps is due to "other", begin added over and over again. Some time its value is recycled but that doesn't cause any issue because the remove functions ensures that the corresponding sub-maps are emptied. Probably.
One workaround is to add to the remove function the code to also remove a map from the cache when that map contains no values.
I've checked it, it works.
Yet "other" remains as key of the cache maps and it is still stale. It cannot be used to access any cached material info - because there is none - but it is still an error: it's a dangling pointer after all.
I think that the correct way to address the leak issue is to ensure that MergeScene also merges the cached values, removing the stale key after it's done. I will test that.

Hi @Pierluigi Grassi (pgi) can you attach a sample file to help testing the patch?

Sure, here it is. Open mainfile.blend, wait for sigsegv (it happens spuriously but always happens). All bug reports about crashing during libload/libfree are - I believe - related to this so those are valid test cases too.

I'm going to close this revision as it was solved. (D1278)

Thanks pgi for your time and code to catch the issue.