Page MenuHome

Fix for incomplete retrieval of system/user resource paths due to a Blender installation path with non-ASCII chars
AbandonedPublic

Authored by Tamito Kajiyama (kjym3) on Jun 18 2014, 7:10 AM.

Details

Summary

Fix for incomplete retrieval of system/user resource paths due to a Blender installation path with non-ASCII chars.

The proposed changes are in line with the approved patch D595.

Diff Detail

Event Timeline

Campbell Barton (campbellbarton) requested changes to this revision.Jun 19 2014, 8:11 AM

Not sure about this patch, D595 made sense because file could contain characters from within a blend file text-block.
But the paths here are all system paths.

If there are valid paths on the system which PyUnicode_DecodeFSDefault can't use, I think we should report the bug upstream to Python guys.

BLI_get_folder() returns a retrieved path after checking if the path actually exists by using BLI_is_DIR(), which in turn relies on BLI_exists() that takes a UTF-8 path as its argument. So I assumed BLI_get_folder() was expected to return a UTF-8 path. Since the file system default encoding may or may not be UTF-8, PyC_UnicodeFromByte() seems the right function to use.

Noticed that in BPy_init_modules() the return value of BLI_get_folder() is assumed to be UTF-8 encoded.

const char * const modpath = BLI_get_folder(BLENDER_SYSTEM_SCRIPTS, "modules");
if (modpath) {
      :
    PyObject *py_modpath = PyUnicode_FromString(modpath);
      :
}

I think PyUnicode_FromString() should be used through out bpy.c rather than PyC_UnicodeFromByte().

As far as I can confirm, the only chance that a folder name may be a non-ASCII, MBCS-compatible string (and not a UTF-8 encoded string) is when environmental variables such as BLENDER_USER_SCRIPTS contain non-ASCII characters and the system locale is set to an MBCS code page. A possible fix of it is to check the return value of getenv() and immediately transcode to UTF-8 before the string is passed around in the code. I guess Blender environmental variables are merely used on Windows (I have never seen problem reports related to these environmental variables), so we can leave it open for future work.

Tamito Kajiyama (kjym3) updated this revision to Unknown Object (????).Jun 23 2014, 9:47 AM

Replaced PyC_UnicodeFromByte() with PyUnicode_FromString().

All paths being passed around inside Blender are supposed to be UTF-8 encoded.

The only exception is when non-ASCII, MBCS-compatible paths are specified through Blender environmental variables such as BLENDER_USER_SCRIPTS. This patch does not address this case.

@Tamito Kajiyama (kjym3). this isnt correct, paths in blender dont have to be utf8, thats why PyC_UnicodeFromByte was added.

The best practice is to convert incoming (outgoing) strings to (from) UTF-8 immediately.

It's not a good idea at all to allow UTF-8 incompatible strings to circulate in the Blender code base. Such junk strings will be major sources of bug reports due to encoding errors here and there.

As to the present patch, the only related issue I can confirm is that Windows getenv() (called within BLI_get_folder()) may return a string in MBCS (e.g., code page 932 on Japanese Windows). I would propose a MBCS to UTF-8 encoding conversion immediately after the call of getenv() rather than letting the MBCS string come into the code without an encoding check.

@Tamito Kajiyama (kjym3) - issue is some paths aren't always utf8, some common latin1 directory names are common.

PyC_UnicodeFromByte - I tested to work well on Linux with such paths, (importers, exporters), so I rather use same function here.

Probably a correct solution here (for windows only), would be to follow what BLI_fopen, BLI_stat... etc, see ufopen, and run UTF16_ENCODE on byte strings, then convert the utf16 to a Python string. But Id rather not do this unless we have no alternative.

Committed fix for T40766, use PyC_UnicodeFromByte instead