feat(UI): korean and english fuzzy search #3757

Draft
wants to merge 4 commits into
base: main
from

Conversation

scarf005
Member

Purpose of change

improve ergonomics of filtering items.

Describe the solution

lcmatch now uses std::wregex to fuzzy-search items: the characters of the query are joined with lazy wildcards.
e.g. mgzn -> m.*?g.*?z.*?n
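
A minimal sketch of how such a pattern could be built, not the PR's actual code. The helper names `make_fuzzy_pattern` and `fuzzy_match` are hypothetical, and escaping regex metacharacters in the query is an assumption added here so that characters like `.` match literally.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: turns L"mgzn" into L"m.*?g.*?z.*?n".
// Regex metacharacters in the query are escaped so they match literally.
std::wstring make_fuzzy_pattern( const std::wstring &query )
{
    static const std::wstring special = L"\\^$.|?*+()[]{}";
    std::wstring pattern;
    for( std::size_t i = 0; i < query.size(); ++i ) {
        if( i > 0 ) {
            pattern += L".*?"; // lazy wildcard between consecutive query characters
        }
        if( special.find( query[i] ) != std::wstring::npos ) {
            pattern += L'\\';
        }
        pattern += query[i];
    }
    return pattern;
}

// Hypothetical wrapper: case-insensitive search anywhere inside `text`.
bool fuzzy_match( const std::wstring &text, const std::wstring &query )
{
    const std::wregex re( make_fuzzy_pattern( query ), std::regex_constants::icase );
    return std::regex_search( text, re );
}
```

With this sketch, `fuzzy_match( L"Magazine", L"mgzn" )` succeeds while `fuzzy_match( L"magic", L"mgzn" )` does not, because "magic" has no `z`.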

this isn't the best fuzzy search algorithm, but it's simple and it supports fuzzy searching in korean (e.g. matching '감' with either 'ㄱ' or '가').
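
One way the Korean matching could work is by mapping each query character to a range of precomposed Hangul syllables, relying on the arithmetic layout of the Unicode block U+AC00..U+D7A3 (19 initials x 21 medials x 28 finals). The sketch below is an illustration under that assumption, not the PR's implementation; `hangul_match_range` and `hangul_fuzzy_char_match` are hypothetical names.

```cpp
#include <utility>

constexpr char32_t hangul_base = 0xAC00;  // '가'
constexpr char32_t hangul_last = 0xD7A3;  // '힣'
constexpr char32_t per_initial = 21 * 28; // syllables sharing one initial consonant
constexpr char32_t per_medial = 28;       // syllables sharing initial + vowel

// Compatibility jamo for the 19 initial consonants, in syllable-block order:
// ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
constexpr char32_t initials[19] = {
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142, 0x3143, 0x3145,
    0x3146, 0x3147, 0x3148, 0x3149, 0x314A, 0x314B, 0x314C, 0x314D, 0x314E
};

// Returns the inclusive code point range that a single query character may match.
std::pair<char32_t, char32_t> hangul_match_range( char32_t q )
{
    // A bare initial consonant matches every syllable starting with it.
    for( char32_t i = 0; i < 19; ++i ) {
        if( initials[i] == q ) {
            const char32_t first = hangul_base + i * per_initial;
            const char32_t last = first + per_initial - 1;
            return { first, last };
        }
    }
    // A syllable without a final consonant matches the same syllable with any final.
    if( q >= hangul_base && q <= hangul_last && ( q - hangul_base ) % per_medial == 0 ) {
        const char32_t last = q + per_medial - 1;
        return { q, last };
    }
    // Anything else (including a syllable that already has a final) matches only itself.
    return { q, q };
}

bool hangul_fuzzy_char_match( char32_t query, char32_t target )
{
    const std::pair<char32_t, char32_t> range = hangul_match_range( query );
    return target >= range.first && target <= range.second;
}
```

Under these assumptions, both 'ㄱ' (U+3131) and '가' (U+AC00) match '감' (U+AC10), while 'ㄴ' (U+3134) does not. In a regex-based matcher the same ranges could be emitted as character classes instead of being checked directly.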

Describe alternatives you've considered

using std::locale on language.cpp

diff --git a/src/language.cpp b/src/language.cpp
index da7c57a00568..657252dd9541 100644
--- a/src/language.cpp
+++ b/src/language.cpp
@@ -2,6 +2,7 @@
 
 #include <algorithm>
 #include <fstream>
+#include <locale>
 
 #if defined(_WIN32)
 #  if 1 // Prevent IWYU reordering platform_win.h below mmsystem.h
@@ -158,23 +159,9 @@ static std::string getSystemUILang()
 // Linux / Android
 static std::string getSystemUILang()
 {
-    std::string ret;
+    const auto lang = std::setlocale( LC_ALL, std::locale( "" ).name().c_str() );
 
-    const char *language = getenv( "LANGUAGE" );
-    if( language && language[0] != '\0' ) {
-        ret = language;
-    } else {
-        const char *loc = setlocale( LC_MESSAGES, nullptr );
-        if( loc != nullptr ) {
-            ret = loc;
-        }
-    }
-
-    if( ret == "C" || string_starts_with( ret, "C." ) ) {
-        ret = "en";
-    }
-
-    return to_valid_language( ret );
+    return to_valid_language( lang );
 }
 #endif // _WIN32 / !MACOSX

filtering entries by regex relevance in uilist::filterlist

this would be a big change, thus deserves its own PR.

Testing

added string_fuzzy_search_test.cpp.

fuzzy search in english

english.mp4

fuzzy search in korean

korean.mp4

Additional context

reference: https://taegon.kim/archives/9919

@github-actions github-actions bot added src changes related to source code. tests changes related to tests labels Nov 26, 2023
@cataclysmbnteam cataclysmbnteam deleted a comment from autofix-ci bot Nov 26, 2023
Contributor

@Vollch Vollch left a comment

I can see this feature being useful, but it needs to be toggleable via a hotkey or something like that.
I suppose it might work better in Korean, but in English it's prone to being very misleading. Look at this, for example:
Before:
image
After:
image

Second pic is pretty much unusable.

@Vollch
Contributor

Vollch commented Dec 3, 2023

Or, thinking about it, maybe it could perform a fuzzy search only when the string starts with some special character, since adding toggles to every place that expects text would be pretty difficult. E.g. "can" is a regular search, "~can" is a fuzzy one.
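
A rough sketch of the prefix idea, assuming `~` as the marker character from the example above. The `search_query` struct and `parse_search_query` function are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical result of parsing a filter string.
struct search_query {
    bool fuzzy;       // true when the input started with '~'
    std::string text; // the query with the marker removed
};

// "~can" -> { true, "can" }, "can" -> { false, "can" }
search_query parse_search_query( const std::string &input )
{
    if( !input.empty() && input.front() == '~' ) {
        return { true, input.substr( 1 ) };
    }
    return { false, input };
}
```

The caller would then pick the strict or the fuzzy matcher based on `fuzzy`, so no per-dialog toggle is needed.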

@scarf005
Member Author

scarf005 commented Dec 4, 2023

or maybe we could improve the logic so that lcmatch returns a 'certainty score', and sort the results accordingly.

@Vollch
Contributor

Vollch commented Dec 4, 2023

Right, that would also work. In fact, you don't really need to bother with scoring; instead you can rely on the simple assumption that strict matches are the most accurate.
So, make two lists, strict and fuzzy, remove from the fuzzy list any items that are already present in the strict list, and merge the two lists. This way, when searching for "can", I'd see "canned" stuff on the first pages and "casing" somewhere deep down.
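
The two-list idea could be sketched roughly as follows. All function names are hypothetical, the strict match is assumed to be a case-insensitive substring test, and the fuzzy match is assumed to be a plain subsequence test standing in for the regex. An item is only placed in the fuzzy list when it fails the strict test, which has the same effect as removing duplicates afterwards.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

static std::string to_lower( std::string s )
{
    std::transform( s.begin(), s.end(), s.begin(), []( unsigned char c ) {
        return static_cast<char>( std::tolower( c ) );
    } );
    return s;
}

// Assumed strict match: case-insensitive substring.
static bool strict_match( const std::string &text, const std::string &query )
{
    return to_lower( text ).find( to_lower( query ) ) != std::string::npos;
}

// Assumed fuzzy match: query characters appear in order, with gaps allowed.
static bool subsequence_match( const std::string &text, const std::string &query )
{
    const std::string t = to_lower( text );
    const std::string q = to_lower( query );
    std::size_t pos = 0;
    for( const char c : q ) {
        pos = t.find( c, pos );
        if( pos == std::string::npos ) {
            return false;
        }
        ++pos;
    }
    return true;
}

// Strict matches first, then the fuzzy-only matches; input order is kept within each group.
std::vector<std::string> filter_strict_then_fuzzy( const std::vector<std::string> &items,
        const std::string &query )
{
    std::vector<std::string> strict;
    std::vector<std::string> fuzzy;
    for( const std::string &item : items ) {
        if( strict_match( item, query ) ) {
            strict.push_back( item );
        } else if( subsequence_match( item, query ) ) {
            fuzzy.push_back( item );
        }
    }
    strict.insert( strict.end(), fuzzy.begin(), fuzzy.end() );
    return strict;
}
```

Given alphabetically sorted input, each group stays alphabetical, so "canned beans" lands ahead of "casing" for the query "can".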

@scarf005
Member Author

scarf005 commented Dec 4, 2023

makes sense. as for the API design, what do you think of this approach?

  1. change the return type of lcmatch to int, representing a 'score'.
  2. a strict match returns INT_MAX.
  3. a loose match returns 1 (or an arbitrary score value once the algorithm gets better).
  4. sort entries by score.

This approach would let us incrementally implement a better fuzzy-searching algorithm.
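
Steps 1 to 3 above might look something like the sketch below. The function is named `lcmatch_score` here to make clear it is hypothetical and not the existing `lcmatch` signature; returning 0 for "no match" is an added assumption, and the loose match is a simple subsequence test standing in for the regex.

```cpp
#include <algorithm>
#include <cctype>
#include <climits>
#include <string>

static std::string lowercase( std::string s )
{
    std::transform( s.begin(), s.end(), s.begin(), []( unsigned char c ) {
        return static_cast<char>( std::tolower( c ) );
    } );
    return s;
}

// Hypothetical scoring variant of lcmatch:
//   INT_MAX -> strict (substring) match
//   1       -> loose (subsequence) match
//   0       -> no match (assumption, not stated in the discussion)
int lcmatch_score( const std::string &text, const std::string &query )
{
    const std::string t = lowercase( text );
    const std::string q = lowercase( query );
    if( t.find( q ) != std::string::npos ) {
        return INT_MAX;
    }
    std::size_t pos = 0;
    for( const char c : q ) {
        pos = t.find( c, pos );
        if( pos == std::string::npos ) {
            return 0;
        }
        ++pos;
    }
    return 1;
}
```

Callers that only need a yes/no answer could keep treating any non-zero score as a match, which would let the scoring improve later without touching them.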

@Vollch
Contributor

Vollch commented Dec 4, 2023

That'll break alphabetical sorting. I'd change step 4 to "sort by score, and then alphabetically within each score group".
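
A small sketch of that ordering: higher scores first, ties broken alphabetically. The `scored_entry` struct and `sort_by_score_then_name` function are hypothetical and only illustrate the comparator.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical pairing of a list entry with its match score.
struct scored_entry {
    std::string name;
    int score;
};

// Sorts by score (descending), then by name (ascending) within equal scores.
void sort_by_score_then_name( std::vector<scored_entry> &entries )
{
    std::sort( entries.begin(), entries.end(),
    []( const scored_entry &a, const scored_entry &b ) {
        if( a.score != b.score ) {
            return a.score > b.score;
        }
        return a.name < b.name;
    } );
}
```

If the list is already alphabetical, `std::stable_sort` on the score alone would give the same result.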

@scarf005
Member Author

scarf005 commented Dec 4, 2023

alright. that sounds good. will investigate further.

@Vollch
Contributor

Vollch commented Dec 4, 2023

And also, if it does scoring anyway, it might make sense to have multiple scores even for strict search, not just INT_MAX.
E.g. "exact entry/word match", "entry/word starts with the string", "contains the string somewhere else".
The chances that I'm searching for "american flag" when typing "can" are pretty low.
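
The three strict tiers could be sketched like this. The tier names and numeric values are hypothetical, words are assumed to be whitespace-separated, and both inputs are assumed to be lowercased already.

```cpp
#include <sstream>
#include <string>

// Hypothetical tiers for strict matches; a higher value is a better match.
enum match_tier : int {
    tier_none = 0,
    tier_contains = 1,    // query appears somewhere inside the entry
    tier_starts_with = 2, // the entry or one of its words starts with the query
    tier_exact = 3        // the entry or one of its words equals the query
};

int strict_match_tier( const std::string &entry, const std::string &query )
{
    if( query.empty() || entry.find( query ) == std::string::npos ) {
        return tier_none;
    }
    if( entry == query ) {
        return tier_exact;
    }
    int best = tier_contains;
    if( entry.compare( 0, query.size(), query ) == 0 ) {
        best = tier_starts_with;
    }
    // Check each whitespace-separated word of the entry.
    std::istringstream words( entry );
    std::string word;
    while( words >> word ) {
        if( word == query ) {
            return tier_exact;
        }
        if( word.compare( 0, query.size(), query ) == 0 ) {
            best = tier_starts_with;
        }
    }
    return best;
}
```

For the query "can", this ranks "tin can" (exact word) above "canned beans" (word prefix), and both above "american flag" (substring only).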

@scarf005 scarf005 marked this pull request as draft December 9, 2023 01:24
@scarf005
Member Author

scarf005 commented Dec 9, 2023

setting it to draft until mentioned changes are done.

Labels
src changes related to source code. tests changes related to tests