Page MenuHome

Fix T97417: support tab and other whitespace characters after obj/mtl keywords
ClosedPublic

Authored by Aras Pranckevicius (aras_p) on Apr 27 2022, 9:16 PM.

Details

Summary

Even if available OBJ/MTL format documentations don't explicitly specify which characters can possibly separate keywords & arguments, turns out some files out there in the wild use TAB character after the line keywords. Which is something the new 3.2 importer was not quite expecting (T97417).

Fix this by factoring out a utility function that checks if line starts with a keyword followed by any whitespace, and using that across the importer. Also fix some other "possible whitespace around name-like parts" of obj/mtl parser as pointed out by the repro files in T97417.

Together with this change, I'd land a change to subversions tests repository that changes several of already existing obj importer test files to have tabs in some of their lines. The change itself also adds a new mtl parsing test that checks a ton of different cases of LF vs CRLF line endings, different indentation etc.

Note that the .mtl files attached to the bug report still not import 100% correctly, but the remaining issues are not related to whitespace or line endings; but rather to different assignment of values to the shader/material; I'll fix that in a separate commit.

Diff Detail

Repository
rB Blender

Event Timeline

Aras Pranckevicius (aras_p) requested review of this revision.Apr 27 2022, 9:16 PM
Aras Pranckevicius (aras_p) created this revision.

I can at least confirm that the geometry in my files now load correctly. However, it's failing to load the materials at all it seems. Was this patch meant to address that part too?

It seems to have something to do with the \r\n's in the file and leads to console spew like the following (notice how the printing of the literal carriage return leads to the following weird ' placement):

'BJ element not recognised: '
'BJ element not recognised: '
<ditto spew>
'BJ element not recognised: '
'BJ element not recognised: '
'BJ import: cannot read from MTL file: 'G:\test\cbox\CornellBox-Original.mtl
' not found in material library.
' not found in material library.
<ditto spew>
' not found in material library.
' not found in material library.


nit: Recognized is spelled wrong in the output

However, it's failing to load the materials at all it seems. Was this patch meant to address that part too?

I'll take a look.

Aras Pranckevicius (aras_p) planned changes to this revision.Apr 28 2022, 5:49 AM
Aras Pranckevicius (aras_p) edited the summary of this revision. (Show Details)
  • Support indented lines while parsing .mtl files,
  • Improve handling of CRLF (Windows style) line endings, in some cases for name-like things (e.g. "usemtl foo") it was wrongly adding the CR character to the name itself.
  • Extend .mtl parsing test coverage wrt indentation & line endings.

This is fine to submit after you address the comment I made as you see fit.

source/blender/io/wavefront_obj/importer/obj_import_file_reader.cc
324

I guess this is just allowing any control character as "whitespace". This is OK but a little broader than what is typically thought of as whitespace so maybe a comment is in order. Did you not use isspace() because you wanted this to be very fast?

Also, what if line only contains the keyword an no more. Won't this index off the end of line?

What if there is more than one whitespace character following the keyword? Maybe all the code after skips whitespece before looking for whatever comes next, in which case this is fine. But otherwise maybe there should be a loop finding out where the whitespace ends?

This revision is now accepted and ready to land.Apr 30 2022, 4:37 PM
Aras Pranckevicius (aras_p) marked an inline comment as done.May 1 2022, 6:41 PM
Aras Pranckevicius (aras_p) added inline comments.
source/blender/io/wavefront_obj/importer/obj_import_file_reader.cc
324

I guess this is just allowing any control character as "whitespace". This is OK but a little broader than what is typically thought of as whitespace so maybe a comment is in order. Did you not use isspace() because you wanted this to be very fast?

Yes, isspace() has a massive overhead (checks locale, which involves at least a TLS access and possibly a mutex lock etc.). I'll add a comment clarifying this.

Also, what if line only contains the keyword an no more. Won't this index off the end of line?

No, the first thing the function does is check the length for "keyword length plus at least one more character".

What if there is more than one whitespace character following the keyword? Maybe all the code after skips whitespece before looking for whatever comes next, in which case this is fine. But otherwise maybe there should be a loop finding out where the whitespace ends?

The rest of parsing code generally whitespace before other things, so that's fine. This one needs to basically check "is there a keyword _and_ whitespace after it", since some keywords have the same prefixes, e.g. can't just check for v without checking what's after that, since it could be vn or vt.

Aras Pranckevicius (aras_p) marked an inline comment as done.May 1 2022, 6:42 PM