Page MenuHome

Win32 IME: Replace Usage of Language IDs
ClosedPublic

Authored by Harley Acheson (harley) on Aug 5 2021, 8:26 PM.

Details

Summary

Use of Language IDs on Windows is deprecated with capital letters and exclamation marks, for many reasons outlined below.

This patch replaces our (internal to Win32 IME only) usages of Language IDs with ISO-639 strings like "zh" for Chinese.

Microsoft's words of warning:

// ** DEPRECATED ** DEPRECATED ** DEPRECATED ** DEPRECATED ** DEPRECATED **
//
//  DEPRECATED: The Language ID  concept is deprecated, please use
//  Locale Names instead, eg: "en" instead of a LANGID like 0x09.
//  See the documentation for GetLocaleInfoEx.
//
//  Note that the named locale APIs (eg GetLocaleInfoEx) are preferred.
//
//  WARNING: Not all locales/languages have unique Language IDs
//
//  The following two combinations of primary language ID and
//  sublanguage ID have special semantics:
//
//    Primary Language ID   Sublanguage ID      Result
//    -------------------   ---------------     ------------------------
//    LANG_NEUTRAL          SUBLANG_NEUTRAL     Language neutral
//    LANG_NEUTRAL          SUBLANG_DEFAULT     User default language
//    LANG_NEUTRAL          SUBLANG_SYS_DEFAULT System default language
//    LANG_INVARIANT        SUBLANG_NEUTRAL     Invariant locale
//
//  This concept is deprecated.  It is strongly recommended that
//  applications test for locale names instead of Language IDs / LCIDs.
//
//  Primary language IDs.
//
//  WARNING: This pattern is broken and not followed for all languages.
//           Serbian, Bosnian & Croatian are a few examples.
//
//  WARNING: There are > 6000 human languages.  The PRIMARYLANGID construct
//           cannot support all languages your application may encounter.
//           Please use Language Names, such as "en".
//
//  WARNING: Some languages may have more than one PRIMARYLANGID.  Please
//           use Locale Names, such as "en-FJ".
//
//  WARNING: Some languages do not have assigned LANGIDs.  Please use
//           Locale Names, such as "tlh-Piqd".
//
//  It is recommended that applications test for locale names or actual LCIDs.
//
//  Note that the LANG, SUBLANG construction is not always consistent.
//  The named locale APIs (eg GetLocaleInfoEx) are recommended.
//
// ** DEPRECATED ** DEPRECATED ** DEPRECATED ** DEPRECATED ** DEPRECATED **

Diff Detail

Repository
rB Blender

Event Timeline

Harley Acheson (harley) requested review of this revision.Aug 5 2021, 8:26 PM
Harley Acheson (harley) created this revision.

Hi @Harley Acheson (harley)

The brackets of the ternary operator seem to be incorrectly positioned. This means (y + (IsLanguage("ja"))) ? ... in my environment.
As a result, the position of the third party IME composition window (Google Japanese Input) was not displayed correctly.

::SetCaretPos(x, y + (IsLanguage("ja")) ? caret_rect_.getHeight() : 0);

The ternary operator is evaluated first like this.

::SetCaretPos(x, y + (IsLanguage("ja") ? caret_rect_.getHeight() : 0));

I did a simple CJK input test and current patch was fine with me other than this.

Thank you for improving IME!

The (old) patch was trying to add some trailing whitespace.

@Yuki Hashimoto (hzuika) - The brackets of the ternary operator seem to be incorrectly positioned...

Interesting. I like that one-liner. Is this better?

Interesting. I like that one-liner. Is this better?

I like one-linear too, but original code seems to be better to aviod same mistake in the future.

By the way, I think getHeight always returns 0.
Because, I have never seen a number other than 0 in the argument h of wm_window_ime_begin.
Is the height of the caret updated anywhere?

Harley Acheson (harley) planned changes to this revision.Aug 5 2021, 11:41 PM

@Yuki Hashimoto (hzuika) - ...I think getHeight always returns 0... Is the height of the caret updated anywhere?

Yikes. Hang on...

@Yuki Hashimoto (hzuika) - original code seems to be better to avoid same mistake in the future.

I made it a bit simpler.

getHeight always returns 0.... never seen a number other than 0 in the argument h of wm_window_ime_begin. Is the height of the caret updated anywhere?

No it is not. We always seem to call wm_window_ime_begin() with zero for "h" and it is never updated.

Obviously if that height of caret_rect_ is causing some real issue, like the caret position is incorrect for Japanese, then we'd need to make a new bug report about it and we can then make a specific patch for that.

Similarly, when I try to input Chinese I would like to hit "Enter" to select the suggested character but it does not do so. Could you try this too. We might have to change the excluded characters or keypresses for that. But again, new issue that we can deal with separately.

@Yuki Hashimoto (hzuika) - Ignore my last comment about Chinese input. It seems to be working the same as other applications.

Obviously if that height of caret_rect_ is causing some real issue, like the caret position is incorrect for Japanese, then we'd need to make a new bug report about it and we can then make a specific patch for that.

I had no problems when using Japanese IME (Microsoft IME and Google Japanese Input).
In addition, printf showed that the height was always 0 while typing Japanese.

::SetCaretPos(x, y); should be fine.

@Yuki Hashimoto (hzuika) - I had no problems when using Japanese IME (Microsoft IME and Google Japanese Input).
In addition, printf showed that the height was always 0 while typing Japanese.

::SetCaretPos(x, y); should be fine.

Yes. That height has always been zero. And since we have a native Japanese speaker (you) saying that it works fine I think it is pretty important that we change that. Then we can always refactor it out later.

Will see if I should do that here or in a separate patch. Will try doing it here.

Removing one unused thing that could cause confusion.

In MoveImeWindow(), if setting the Caret position and in Japanese there was an adjustment to the y position by the amount of caret_rect_'s height. But this height is always zero, always has been zero. caret_rect_ are only set in IMEBegin and never adjusted again. Basically this all starts from wm_window_IME_begin() which always passes zero for height.

We also have a native Japanese speaker confirming that current behavior works great. So removing the Japanese height adjustment of zero from MoveImeWindow()

intern/ghost/intern/GHOST_ImeWin32.h
351

Please note the source from this comment, I wasn't overly concerned about the contents, but the upper limit of 9 characters I wanted to verify, and went to look at the ISO spec and came up empty (I mean why would they even bother with terminator chars, it was fooling to look at the specs from the start, but that's on me) , it wasn't until I went to look at the LOCALE_SISO639LANGNAME definition in MSDN I recognized where this text came from.

doesn't have to be much though

MSDN states in the documentation for LOCALE_SISO639LANGNAME the maximum number of characters allowed for this string is nine, including a terminating null character.

I would have advocated for a link, but i have been bitten quite a few times now with MS moving the documentation to a different url.

Hi @Harley Acheson (harley).

I checked below now:

  • UpdateInputLanguage()
    • When IME's Language was switched, is language property correct? : ok
  • cpu consuming
    • Compared a time typing about 1000 letters of Japanese ('あ'-'ん', '0'-'9') by automation tool between no patched and patched.
      • Patched version takes seconds 98% of no patched version. I don't know why the patched is faster. Thought it has no problem about cpu consuming. : ok

I haven't been a reviewer here before. But tomorrow I will try to include other points as well.

Improved comments clarifying length of LOCALE_SISO639LANGNAME string as requested by @Ray Molenkamp (LazyDodo)

intern/ghost/intern/GHOST_ImeWin32.cpp
106–110

You should use a constant or a symbol instead of a literal by hand.
It is like this:

const char GHOST_IMEWIN32_LANG_ISO639_ENGLISH[3] = {'e', 'n', 0};
const char GHOST_IMEWIN32_LANG_ISO639_CHINESE[3] = {'z', 'h', 0};
const char GHOST_IMEWIN32_LANG_ISO639_JAPANESE[3] = {'j', 'a', 0};
const char GHOST_IMEWIN32_LANG_ISO639_KOREAN[3] = {'k', 'o', 0};
if ((ascii >= 'A' && ascii <= 'Z') || (ascii >= 'a' && ascii <= 'z')) {
  return true;
}
if (IsLanguage(GHOST_IMEWIN32_LANG_ISO639_JAPANESE) && (ascii >= ' ' && ascii <= '~')) {
  return true;
}
else if (IsLanguage(GHOST_IMEWIN32_LANG_ISO639_CHINESE) && ascii &&
         strchr("!\"$'(),.:;<>?[\\]^_`", ascii)) {
  return true;
}

P.S. I don't know the char array's length of each LANGNAME is [3].

intern/ghost/intern/GHOST_ImeWin32.cpp
106–110

updated. Adding GHOST_IMEWIN32_LANG_ISO639_LENGTH = 9, and GHOST_IMEWIN32_LANG_ISO639_HAWAII ('haw').

const char GHOST_IMEWIN32_LANG_ISO639_ENGLISH[3] = {'e', 'n', 0};
const char GHOST_IMEWIN32_LANG_ISO639_CHINESE[3] = {'z', 'h', 0};
const char GHOST_IMEWIN32_LANG_ISO639_JAPANESE[3] = {'j', 'a', 0};
const char GHOST_IMEWIN32_LANG_ISO639_KOREAN[3] = {'k', 'o', 0};
static const int GHOST_IMEWIN32_LANG_ISO639_LENGTH = 9;
const char GHOST_IMEWIN32_LANG_ISO639_ENGLISH[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'e', 'n', 0};
const char GHOST_IMEWIN32_LANG_ISO639_CHINESE[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'z', 'h', 0};
const char GHOST_IMEWIN32_LANG_ISO639_JAPANESE[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'j', 'a', 0};
const char GHOST_IMEWIN32_LANG_ISO639_KOREAN[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'k', 'o', 0};
const char GHOST_IMEWIN32_LANG_ISO639_HAWAII[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'h', 'a', 'w', 0};

const char GHOST_IMEWIN32_LANG_ISO639_HAWAII[GHOST_IMEWIN32_LANG_ISO639_LENGTH] = {'h', 'a', 'w', 0};

I got {'h', 'a', 'w', 0} from Microsoft Hawaii IME.
When if LOCALE_SISO639LANGNAME's length is unknown, the max length of GHOST_ImeWin32::language property might be able to get from each Microsoft IME (When someone support a language of IME.) .

@Takahiro Shizuki (sntulix) - You should use a constant or a symbol instead of a literal by hand

Something like this version might be all we really need. A define for that obscure ISO-639 length of 9, Enforcement of argument lengths, better comments, and just a few local defines for the languages we need to refer to.

Code is seems a good shape,so on that basis i accept, however i can't test any of it, i just don't know how, so if the international people testing this say it works ok i'll take their word for it.

just change that one comment before committing

intern/ghost/intern/GHOST_ImeWin32.h
350

The length and source of the length is already documented near the W32_ISO639_LEN define, it be fine if this comment went away and we just keep the The abbreviated ISO 639-1 name of the input language, such as "en" for English. bit

This revision is now accepted and ready to land.Aug 10 2021, 4:26 AM

Comment changes only.

This revision was automatically updated to reflect the committed changes.