Fix for incomplete retrieval of system/user resource paths due to a Blender installation path with non-ASCII chars.
The proposed changes are in line with the approved patch D595.
Differential D604
Fix for incomplete retrieval of system/user resource paths due to a Blender installation path with non-ASCII chars Authored by Tamito Kajiyama (kjym3) on Jun 18 2014, 7:10 AM. Tags None Subscribers None
Details
Fix for incomplete retrieval of system/user resource paths due to a Blender installation path with non-ASCII chars. The proposed changes are in line with the approved patch D595.
Diff Detail Event TimelineComment Actions Not sure about this patch, D595 made sense because file could contain characters from within a blend file text-block. If there are valid paths on the system which PyUnicode_DecodeFSDefault can't use, I think we should report the bug upstream to Python guys. Comment Actions BLI_get_folder() returns a retrieved path after checking if the path actually exists by using BLI_is_DIR(), which in turn relies on BLI_exists() that takes a UTF-8 path as its argument. So I assumed BLI_get_folder() was expected to return a UTF-8 path. Since the file system default encoding may or may not be UTF-8, PyC_UnicodeFromByte() seems the right function to use. Comment Actions Noticed that in BPy_init_modules() the return value of BLI_get_folder() is assumed to be UTF-8 encoded. const char * const modpath = BLI_get_folder(BLENDER_SYSTEM_SCRIPTS, "modules");
if (modpath) {
:
PyObject *py_modpath = PyUnicode_FromString(modpath);
:
}I think PyUnicode_FromString() should be used through out bpy.c rather than PyC_UnicodeFromByte(). As far as I can confirm, the only chance that a folder name may be a non-ASCII, MBCS-compatible string (and not a UTF-8 encoded string) is when environmental variables such as BLENDER_USER_SCRIPTS contain non-ASCII characters and the system locale is set to an MBCS code page. A possible fix of it is to check the return value of getenv() and immediately transcode to UTF-8 before the string is passed around in the code. I guess Blender environmental variables are merely used on Windows (I have never seen problem reports related to these environmental variables), so we can leave it open for future work. Comment Actions Replaced PyC_UnicodeFromByte() with PyUnicode_FromString(). All paths being passed around inside Blender are supposed to be UTF-8 encoded. The only exception is when non-ASCII, MBCS-compatible paths are specified through Blender environmental variables such as BLENDER_USER_SCRIPTS. This patch does not address this case. Comment Actions @Tamito Kajiyama (kjym3). this isnt correct, paths in blender dont have to be utf8, thats why PyC_UnicodeFromByte was added. Comment Actions The best practice is to convert incoming (outgoing) strings to (from) UTF-8 immediately. It's not a good idea at all to allow UTF-8 incompatible strings to circulate in the Blender code base. Such junk strings will be major sources of bug reports due to encoding errors here and there. As to the present patch, the only related issue I can confirm is that Windows getenv() (called within BLI_get_folder()) may return a string in MBCS (e.g., code page 932 on Japanese Windows). I would propose a MBCS to UTF-8 encoding conversion immediately after the call of getenv() rather than letting the MBCS string come into the code without an encoding check. Comment Actions @Tamito Kajiyama (kjym3) - issue is some paths aren't always utf8, some common latin1 directory names are common. PyC_UnicodeFromByte - I tested to work well on Linux with such paths, (importers, exporters), so I rather use same function here. Probably a correct solution here (for windows only), would be to follow what BLI_fopen, BLI_stat... etc, see ufopen, and run UTF16_ENCODE on byte strings, then convert the utf16 to a Python string. But Id rather not do this unless we have no alternative. |