BLF: New Font Stack for Better Language Coverage
Closed, Public

Authored by Harley Acheson (harley) on Jun 17 2022, 9:08 PM.

Details

Summary

Replace our existing two fonts with a stack of new fonts to increase
and improve language coverage and to add many new symbols and icons.


This proposed change is to replace the two font files currently found in the datafiles/fonts folder with the contents of the following zip archive:

Because of this improved coverage, this patch also prints out a message to the console if a glyph is not found. Note that this will only happen the first time that character is asked for, not every time it is encountered.
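The "report only the first time" behavior can be illustrated with a small Python sketch. This is not the patch's C code; the names `lookup_glyph`, `_reported_missing`, and the dict-based `FONT` are hypothetical stand-ins used only to show the idea of remembering which characters have already been reported.

```python
import sys

# Hypothetical stand-in for a font: maps code points to glyph names.
FONT = {ord("A"): "glyph_A", ord("B"): "glyph_B"}

# Code points already reported as missing (assumption: one global set).
_reported_missing = set()


def lookup_glyph(codepoint, font=FONT, out=sys.stderr):
    """Return the glyph for codepoint, or None if the font lacks it.

    A missing code point is reported only the first time it is requested.
    """
    glyph = font.get(codepoint)
    if glyph is None and codepoint not in _reported_missing:
        _reported_missing.add(codepoint)
        print(f"Glyph not found: U+{codepoint:04X}", file=out)
    return glyph
```

Asking twice for the same missing character produces a single message, so a string that is redrawn every frame does not flood the console.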

Current Fonts:

The current fonts are "droidsans.ttf" (5.09 MB, containing 54017 glyphs), and "bmonofont-i18n.ttf" (5.31 MB, containing 51649 glyphs). The two cover differing sets of languages (the mono font does not contain Devanagari or Tamil, for example). They also contain only a subset of the Arabic presentation symbols (62 & 72 out of 611), which will be needed for complex shaping (different glyphs depending on whether the letter is at the beginning, middle, or end of a word). They also contain only a minimal number of CJK ideographs (528/996 for Phonetics & Symbols, 35/64 Punctuation, 2/6582 of Extension A).
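To make the point about positional glyphs concrete, the following sketch uses only Python's standard `unicodedata` module to list the four presentation forms of a single Arabic letter (BEH). It is an illustration of why a font needs these glyphs, not a description of how Blender shapes text.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH, the base letter as stored in text

# Its four positional presentation forms (Arabic Presentation Forms-B).
forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]

for ch in forms:
    # decomposition() returns a position tag plus the base letter,
    # for example "<initial> 0628".
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))
```

Each form decomposes back to the same base letter U+0628, so a font that lacks some of these forms cannot draw the letter correctly in every position.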

In summary the two fonts cover under 55,000 glyphs with two files totaling 10.4 MB.

Proposed Fonts:

The proposed 25 fonts total 14.7 MB, a 41% file size increase, but with substantially increased language and glyph coverage: enough to cover all of the top 44 languages by number of speakers. This represents about 1.5 billion more people who can view their language in Blender. The fonts also feature thousands of new symbols and icons.

File    Size (bytes)
DejaVuSans.woff2    257,564
DejaVuSansMono.woff2    145,192
lastresort.woff2    118,564
Noto Sans CJK Regular.woff2    11,672,912
NotoEmoji-VariableFont_wght.woff2    1,026,984
NotoSansArabic-VariableFont_wdth,wght.woff2    253,496
NotoSansArmenian-VariableFont_wdth,wght.woff2    47,492
NotoSansBengali-VariableFont_wdth,wght.woff2    226,740
NotoSansDevanagari-Regular.woff2    69,872
NotoSansEthiopic-Regular.woff2    92,608
NotoSansGeorgian-VariableFont_wdth,wght.woff2    101,524
NotoSansGujarati-Regular.woff2    58,668
NotoSansGurmukhi-VariableFont_wdth,wght.woff2    66,568
NotoSansHebrew-VariableFont_wdth,wght.woff2    17,544
NotoSansJavanese-Regular.woff2    34,144
NotoSansKannada-VariableFont_wdth,wght.woff2    156,260
NotoSansMalayalam-VariableFont_wdth,wght.woff2    159,848
NotoSansMath-Regular.woff2    226,460
NotoSansMyanmar-Regular.woff2    64,692
NotoSansSymbols-VariableFont_wght.woff2    152,244
NotoSansSymbols2-Regular.woff2    201,324
NotoSansTamil-VariableFont_wdth,wght.woff2    98,380
NotoSansTelugu-VariableFont_wdth,wght.woff2    209,708
NotoSansThai-VariableFont_wdth,wght.woff2    46,852

In the above list you will notice that the newly added languages represent a very small part of the total. The majority of the space is taken by the CJK font, which gives increased coverage of Chinese, Japanese, and Korean. This one file contains 65,535 glyphs.
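The size claims can be checked with a few lines of Python. The sizes below are copied from the file list above and are assumed to be in bytes; the 10.4 and 14.7 figures are the rounded totals quoted in the summary.

```python
# File sizes copied from the list above (assumed to be bytes).
sizes = {
    "DejaVuSans.woff2": 257_564,
    "DejaVuSansMono.woff2": 145_192,
    "lastresort.woff2": 118_564,
    "Noto Sans CJK Regular.woff2": 11_672_912,
    "NotoEmoji-VariableFont_wght.woff2": 1_026_984,
    "NotoSansArabic-VariableFont_wdth,wght.woff2": 253_496,
    "NotoSansArmenian-VariableFont_wdth,wght.woff2": 47_492,
    "NotoSansBengali-VariableFont_wdth,wght.woff2": 226_740,
    "NotoSansDevanagari-Regular.woff2": 69_872,
    "NotoSansEthiopic-Regular.woff2": 92_608,
    "NotoSansGeorgian-VariableFont_wdth,wght.woff2": 101_524,
    "NotoSansGujarati-Regular.woff2": 58_668,
    "NotoSansGurmukhi-VariableFont_wdth,wght.woff2": 66_568,
    "NotoSansHebrew-VariableFont_wdth,wght.woff2": 17_544,
    "NotoSansJavanese-Regular.woff2": 34_144,
    "NotoSansKannada-VariableFont_wdth,wght.woff2": 156_260,
    "NotoSansMalayalam-VariableFont_wdth,wght.woff2": 159_848,
    "NotoSansMath-Regular.woff2": 226_460,
    "NotoSansMyanmar-Regular.woff2": 64_692,
    "NotoSansSymbols-VariableFont_wght.woff2": 152_244,
    "NotoSansSymbols2-Regular.woff2": 201_324,
    "NotoSansTamil-VariableFont_wdth,wght.woff2": 98_380,
    "NotoSansTelugu-VariableFont_wdth,wght.woff2": 209_708,
    "NotoSansThai-VariableFont_wdth,wght.woff2": 46_852,
}

total = sum(sizes.values())
cjk = sizes["Noto Sans CJK Regular.woff2"]
print(f"listed files total: {total:,} bytes ({total / 2**20:.2f} MiB)")
print(f"CJK share of total: {cjk / total:.1%}")

# Size increase, using the rounded totals quoted in the summary.
old_mb, new_mb = 10.4, 14.7
increase = (new_mb - old_mb) / old_mb
print(f"size increase: {increase:.0%}")
```

The CJK file alone accounts for roughly three quarters of the listed bytes, which is why the other scripts add so little to the total.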

The two base fonts, "DejaVuSans.woff2" and "DejaVuSansMono.woff2" are the current replacements for our base fonts. Therefore you should see no difference from current fonts when viewing Latin characters.

These fonts all match well, being mostly from the "Noto" font family, whose members are designed to coexist coherently.

The "lastresort" font is a special type of font that will always return a symbol of some kind for characters that are otherwise not found.
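As a rough illustration of how a font stack with a last-resort entry behaves, here is a minimal Python sketch. The `FONT_STACK` table, its code point ranges, and the `font_for` function are all hypothetical; real coverage is read from the font files, and the Latin range here is only illustrative.

```python
# Hypothetical font stack in priority order: (name, covered code points).
FONT_STACK = [
    ("DejaVuSans", range(0x0020, 0x0250)),      # Latin (illustrative range)
    ("NotoSansHebrew", range(0x0590, 0x0600)),  # Hebrew block
]

# The last-resort font is assumed to answer for any code point.
LAST_RESORT = "lastresort"


def font_for(codepoint):
    """Return the name of the first font in the stack covering codepoint."""
    for name, coverage in FONT_STACK:
        if codepoint in coverage:
            return name
    # Nothing in the stack covers it: fall through to the last-resort font.
    return LAST_RESORT
```

Because the last-resort entry sits at the end of the stack, it is only consulted after every real font has been tried.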

Almost all of these fonts are "variable" type in that they each contain a range of design variations along multiple axes. Basically an infinite number of font variations in one file. Although we won't notice this immediately, we can add specific support for this in D12977: BLF: Add Support for Variable Fonts and hopefully take advantage of this when we use these fonts for 3D text.

There is almost perfect coverage here for math and technical symbols. I am just a bit anal about covering these things well.

The "emoji" font file could come in handy for all the related ideographs. Addon authors could use them for example.

Diff Detail

Repository
rB Blender

Event Timeline

Harley Acheson (harley) requested review of this revision. Jun 17 2022, 9:08 PM
Harley Acheson (harley) created this revision.
Harley Acheson (harley) edited the summary of this revision.
Brecht Van Lommel (brecht) requested changes to this revision. Jun 20 2022, 1:14 PM

Is there something we can do about the performance impact noted in D12622? My understanding is that freetype is able to load just the necessary data from the font file, so I'm not sure why it's so slow even. Maybe there is something we are doing to trigger a more expensive operation, that we could postpone until the font is actually needed? Or worst case, we could cache unicode ranges outside the font files?

I'm also hesitant about including material symbols and recommending them to be used by add-on authors, introducing a set of icons with a different style than what we have now. I don't think reserving this space is a concern at all, if an add-on does that kind of hack they can't expect that to keep working.

source/blender/blenfont/intern/blf_glyph.c
588

This could be printed if debug logging is enabled, but I wouldn't do it always.

This revision now requires changes to proceed. Jun 20 2022, 1:14 PM

@Brecht Van Lommel (brecht) - Is there something we can do about the performance impact noted in D12622?

Not that I have found yet. Removing everything we do doesn't seem to reduce the time in any noticeable way. Using memory-mapping helps slightly but probably not enough to be worth it. But still investigating.

As a test, mostly for you, I made D15258: BLF: Fonts with FT_Face Optional which loads the fonts fully initially to gather data like the coverage, but then immediately drops the Face. It is only added again when actually needed, which removes the overhead for fonts that are not used by users.

Or worst case, we could cache unicode ranges outside the font files?

We could have a font.config or similar for something like this if it helps. Could hold those coverage bits, allow ordering. Not sure if that is interesting.

I'm also hesitant about including material symbols...

No problem removing them from this. I just liked having everything in here for testing and showing off.

Not that I have found yet. Removing everything we do doesn't seem to reduce the time in any noticeable way. Using memory-mapping helps slightly but probably not enough to be worth it. But still investigating.

As a test, mostly for you, I made D15258: BLF: Fonts with FT_Face Optional which loads the fonts fully initially to gather data like the coverage, but then immediately drops the Face. It is only added again when actually needed, which removes the overhead for fonts that are not used by users.

It's better for memory usage, but does not address startup time.

Or worst case, we could cache unicode ranges outside the font files?

We could have a font.config or similar for something like this if it helps. Could hold those coverage bits, allow ordering. Not sure if that is interesting.

We could just put the full list of bundled fonts + their unicode ranges in the code. Probably that's the easiest solution unless we find some way to make font loading faster.

@Brecht Van Lommel (brecht) - We could just put the full list of bundled fonts + their unicode ranges in the code. Probably that's the easiest solution unless we find some way to make font loading faster.

Interesting. I think I see a solution there, but let's make sure we are talking about the same thing...

We'd need something like D15258: BLF: Fonts with FT_Face Optional, so we can have FontBLFs that do not have a loaded FT_Face.

But while loading the files we check their names. If found, we know the details so we don't bother loading at all, just leave the face NULL unless needed. If NOT found, then we load, read details, then drop. That would be superfast for the things we ship, not have any overhead for unused files, but still allow users to add their own.
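The scheme described here can be sketched in Python as follows. This is only an illustration of the control flow, not the code from D15258: `KNOWN_COVERAGE`, `Font`, `fake_loader`, and the ranges in them are hypothetical, and the loader stands in for an expensive FreeType load.

```python
# Hypothetical hardcoded table: bundled file name -> Unicode ranges covered.
KNOWN_COVERAGE = {
    "NotoSansHebrew-VariableFont_wdth,wght.woff2": [(0x0590, 0x05FF)],
}

load_calls = []  # records every (expensive) load, for demonstration


def fake_loader(filename):
    """Stand-in for an expensive font load; returns (face, ranges)."""
    load_calls.append(filename)
    return object(), [(0x0041, 0x005A)]  # pretend face, pretend coverage


class Font:
    def __init__(self, filename, loader):
        self.filename = filename
        self._loader = loader
        self.face = None  # stays None until the font is actually needed
        known = KNOWN_COVERAGE.get(filename)
        if known is not None:
            # Bundled font: details are known, so do not load at all.
            self.ranges = known
        else:
            # Unknown (user) font: load, read details, then drop the face.
            _face, self.ranges = loader(filename)

    def covers(self, codepoint):
        return any(lo <= codepoint <= hi for lo, hi in self.ranges)

    def ensure_face(self):
        """Load the face on first real use and keep it afterwards."""
        if self.face is None:
            self.face, _ranges = self._loader(self.filename)
        return self.face
```

With this shape, creating a `Font` for a bundled file costs only a dictionary lookup, while a user-supplied file pays for one load up front and another only if it is ever used for drawing.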

Yes, exactly.

@Ray Molenkamp (LazyDodo) was suggesting I could make a codegen to gather this information on our released fonts at build time. Does that sound interesting or messy?

Personally I would not bother, we're not changing fonts that often. Doesn't seem worth adding complexity and potential for failures to the build system.

Harley Acheson (harley) edited the summary of this revision. Jun 28 2022, 6:19 PM
Harley Acheson (harley) marked an inline comment as done. Jul 6 2022, 7:36 PM

only print report of missing characters if DEBUG

Updated to the current state of master; it incorporates a change requested in review.

This revision is now accepted and ready to land. Jul 11 2022, 6:27 PM

Closed with commit {e9bd6abde37c}