Page MenuHome

Proposed fix - T43221
Needs RevisionPublic

Authored by Pedro Silva (psilva) on Dec 9 2017, 1:25 AM.

Details

Summary

Hi!

As suggested in T43221, wrote string compare functions (BLI_strcasecmp_utf8, BLI_strcasestr_utf8) using Py_UNICODE_TOLOWER within a helper function (BLI_str_utf8_lower_as_unicode_and_size) in order to compare lowercase Unicode characters. Since the Py CAPI is used, these functions are only safely defined (and called) when Python is in the build environment, which meant having to add some ifdef guards and make some changes to the blenlib CMakeLists.txt.
Note that Py_UNICODE_TOLOWER has been deprecated since Python 3.3, but I found no equivalent function exposed by the API.

Diff Detail

Repository
rB Blender

Event Timeline

Pedro Silva (psilva) created this object with visibility "All Users".

Thanks for the fix, would really rather avoid adding deprecated functionality to new code.

Having BLI call into python is quite bad, discussed with @Sergey Sharybin (sergey) and we agreed this is the way to go:

  • Add an extern/ library that abstracts Python unicode API out.
  • When Python is used, the extern library can use Pyhon as-is.
  • When Python isn't used or the function is removed, the API will use its own definitions.

Checking Python's unicode files, this is fairly simple to extract unicode functions without pulling in the rest of Python.


An alternative to this is using an existing library - eg: https://stackoverflow.com/questions/313555

Campbell Barton (campbellbarton) requested changes to this revision.Dec 13 2017, 12:27 PM
This revision now requires changes to proceed.Dec 13 2017, 12:27 PM

It may be simplest & best to use tolower_l (versions of this are available on posix & win32) as long as this is only used for UI code, different results on each system depending on the locale & minor differences in implementation won't matter.

Campbell,

Thanks for the help!

On my system, tolower_l does not return the correct equivalent to uppercase Cyrillic characters (nor does tolower after setlocale) . It might for some systems, but I'd imagine this would be a hassle to maintain even if it did.
Regarding your previous comment, I'd imagine using a nice library like utf8proc (around 2M of source) would make for a cleaner solution than cherry-picking CPython code whenever needed.
Do you have any preference for a solution?

@Pedro Silva (psilva) - am not too keen on adding 2mb library to Blender for a single function that already exists (even if it's a bad level call).

Posted an issue to CPython since their intention isn't clear here: https://bugs.python.org/issue32354

Think it could be worth spending some time to investigate if tolower can be made to work as well.
Maybe some simple trick to set the right locale. Try ask on https://stackoverflow.com - with a simple test case for eg.

Pedro Silva (psilva) changed the visibility from "All Users" to "Public (No Login Required)".Dec 18 2017, 7:47 PM

Campbell,

I see your point. Thank you for filing the bug.
I'll see what I can do with tolower_l.