Page MenuHome

Property Search: Use fuzzy string matching
AbandonedPublic

Authored by Hans Goudey (HooglyBoogly) on Oct 7 2020, 9:31 PM.

Details

Summary

This allows some error in spelling and using word prefixes in property search.

  • combuned -> Combined
  • symnetry -> Symmetry
  • ssr -> Screen Space Reflections
  • tansmittence -> Transmittance
  • gruvity -> Gravity

Essentially it allows one misspelled character on shorter words, and around
two or three on longer matches.

This is based on P1642 by Jacques Lucke with some modifications.

The largest difference comes from the fact that the question "Does this string
have a match" (property search) is a pretty different problem than "sort these
strings based on how much they match" (menu search).

This is because false positives matter a lot more in this case. If the result
is just a sorted list, a false positive is just lower down and doesn't take
any attention. But in this case a false positive is highlighted and takes
attention away from actual results.

In order to reduce the fuzziness for this case, I've added a tweaked version of
score_query_against_words. It uses the same methods, but instead of building
up a score for the whole string, it returns early if any result is above the
given ratio of "fuzziness."

Diff Detail

Repository
rB Blender
Branch
property-search-fuzzy-matching (branched from master)
Build Status
Buildable 11875
Build 11875: arc lint + arc unit

Event Timeline

Hans Goudey (HooglyBoogly) requested review of this revision.Oct 7 2020, 9:31 PM
Hans Goudey (HooglyBoogly) created this revision.
source/blender/blenlib/intern/string_search.cc
344

@Jacques Lucke (JacquesLucke) I'm not totally happy with the changes in this file. I do believe this function needs to be fundamentally different than score_query_against_words, but I don't really like the "almost" code duplication that this adds. Maybe it's not a problem though, I'm curious if you have ideas.

Haven't checked the patch itself yet. I wonder, do you think the search works better when there is a little bit of fuzzy matching? Maybe we have to acknowledge the fact that fuzzy matching without sorting is a bad idea...

Haven't checked the patch itself yet. I wonder, do you think the search works better when there is a little bit of fuzzy matching? Maybe we have to acknowledge the fact that fuzzy matching without sorting is a bad idea...

I think it does, mainly because of the allowance for a bit of misspelling (I added some examples to the patch description). I've found correcting misspellings sort of annoying here, this might save a fair amount of time.
That said, I'm not super tied to it. It's a nice improvement but not essential here. I also wonder if performance with all tab search is a worthwhile consideration, since the search happens on every layout pass.

Generally, I think it is okay to have two separate functions for fuzzy matching, even if they look similar at the first glance. That is because they will have to change for different reasons.

source/blender/blenlib/intern/string_search.cc
346

Not sure if it is a good idea to expose fuzzy_ratio. Better define a value that works well in this function. I don't think we want two different values for this in different places of Blender.

357

That does not seem to make sense. Doing word_is_usable[word_index] = false; has no effect when you return immediately afterwards. Returning true when any of the input words might be a bit too weak, or not? The would mean that s ksdfhgikfgh would match symmetry.

361

Looks like you "restart" matching here (same a couple lines further down). Instead of reusing the variable, it might be good to declare+initialize it again.
To do that, you should put the code above into a separate scope.

Hans Goudey (HooglyBoogly) marked an inline comment as done.Oct 8 2020, 9:34 PM
Hans Goudey (HooglyBoogly) added inline comments.
source/blender/blenlib/intern/string_search.cc
357

Ugh, true, the way I have written this makes no sense. If the first word matches and the rest of the search is complete gibberish like you said, the words will match.

Actually I don't think this should be per-word at all, except for the acronym matching. I experimented with that earlier and it worked well. It may require some generalization of the "normalization" process in this file though. I'll look into rewriting it this way.

source/blender/blenlib/intern/string_search.cc
357

Sounds good. It's totally fine when this function uses a different matching algorithms from the other one. It's simply a different use case, that I haven't experimented with yet.

Hans Goudey (HooglyBoogly) planned changes to this revision.Oct 21 2020, 2:10 PM
  • Rewrite algorithm for "is_fuzzy_match"

The algorithm is much simpler this way. First is an initials check. If the number of words in
the string matches the number of letters in the query, it returns true if all the initials match.
Then it just uses the existing get_fuzzy_match_errors and checks whether the number of errors
is under a threshold of 0.8.

It still does not seem very predictable to me, but that's just difficult anyway for this kind of search. It seems better than having no fuzzy search at all.

source/blender/blenlib/BLI_string_search.h
54

This doesn't really need to be here, unless you add some unit tests in BLI_string_search_test.cc (which might be a good idea).

source/blender/blenlib/intern/string_search.cc
415

remove string_search

417

equals

Also note that this initials matching works differently then the one I implemented earlier. This is fine, just wanted to note it here.

This revision is now accepted and ready to land.Feb 8 2021, 5:08 PM

This patch has an approved status however it was not committed yet. So I'm assuming the other reviewers are blocking. Updating the reviewers list to reflect this. This way the patch status still show as Need Review.

This revision now requires review to proceed.Mar 26 2021, 5:47 PM
This revision is now accepted and ready to land.Mar 26 2021, 6:03 PM

I'm abandoning this because I was never really convinced this was an improvement, and I never saw user feedback about this.

For fuzzy searching in a list, false positives don't really matter, since the best results will be at the top.
For property search, false positives are a bit annoying.

If someone makes the case that this is important, I'd be fine updating and committing it, but I'm not certain enough myself to move forward with it.