[BOT] Fuzzy String Matching #7

Open
derekantrican opened this issue May 6, 2019 · 3 comments

@derekantrican
Owner

For instance, in https://www.reddit.com/r/climbing/comments/715awf/red_rock_season_is_back_cruisin_up_yak_crack_511c/ the user mentioned that they are climbing "Yak Crack". While there are some results with that name, the route they actually mean is "Yaak Crack". We should implement fuzzy query matching so that "Yaak Crack" also comes up in the results list.

derekantrican changed the title from "[DBBuilder] Fuzzy String Matching" to "[BOT] Fuzzy String Matching" on May 6, 2019
derekantrican added this to the Stage 3 milestone on May 15, 2019
@derekantrican
Owner Author

We could solve this by upgrading the MountainProjectDataSearch.StringMatch function to something similar to:

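private static bool StringMatch(string inputString, string targetString, bool caseInsensitive = true)
{
    string input = inputString;
    string target = targetString;

    if (caseInsensitive)
    {
        input = input.ToLower();
        target = target.ToLower();
    }

    // Plain substring match.
    if (target.Contains(input))
        return true;

    // Fuzzy match: accept when at most 3 single-character edits separate the strings.
    // Levenshtein(...) is assumed to be an edit-distance helper defined elsewhere.
    return Levenshtein(target, input) <= 3;
}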

This checks whether 3 or fewer single-character edits would "fix" the string. We can adjust this limit as needed, but not by much: a large limit will start matching unrelated items.
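The snippet above calls a Levenshtein helper that it does not define. As a rough sketch of what such a helper could look like, here is the standard two-row dynamic-programming version. The class name FuzzyMatch is made up for illustration and is not part of MountainProjectDataSearch; a library implementation would likely be faster.

```csharp
using System;

public static class FuzzyMatch
{
    // Returns the minimum number of single-character insertions, deletions,
    // and substitutions needed to turn string a into string b.
    public static int Levenshtein(string a, string b)
    {
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // Only two rows of the DP table are needed at any time.
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Turning the empty prefix of a into the first j characters of b
        // takes j insertions.
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            // Turning the first i characters of a into the empty string
            // takes i deletions.
            current[0] = i;

            for (int j = 1; j <= b.Length; j++)
            {
                int substitutionCost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + substitutionCost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // The row just filled becomes the previous row for the next pass.
            int[] temp = previous;
            previous = current;
            current = temp;
        }

        return previous[b.Length];
    }
}
```

With this helper, FuzzyMatch.Levenshtein("yak crack", "yaak crack") is 1 (one inserted "a"), which is well inside the limit of 3.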

@derekantrican
Owner Author

https://github.com/Turnerj/Quickenshtein is a C# Levenshtein implementation that should be pretty quick

@derekantrican
Owner Author

This will be a bit more complicated than the snippet above. While that works for https://old.reddit.com/r/climbing/comments/uq7ej2/sent_my_first_v8_thin_lizzy_in_joshua_tree, it doesn't work for the "Yak Crack" example. With a Levenshtein distance of <= 3, "Yak Crack" can match just about any "___ Crack" route, since turning "Yak" into another three-letter word takes at most 3 substitutions. That means that in our cutoff of !searchResult.IsEmpty() && searchResult.AllResults.Count < 75, 75 is too low.
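To make the over-matching concrete, here is a small sketch that filters a list of names by edit distance. Apart from "Yaak Crack" and "Thin Lizzy", the route names in the usage note are invented for illustration, and ThresholdDemo, Distance, and WithinThreshold are hypothetical names rather than anything in the bot's code. Distance is a plain full-table Levenshtein standing in for whichever implementation ends up being used.

```csharp
using System;
using System.Collections.Generic;

public static class ThresholdDemo
{
    // Full-table Levenshtein distance (stand-in for a real implementation).
    public static int Distance(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];

        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }

        return d[a.Length, b.Length];
    }

    // Returns every name whose case-insensitive distance to the query
    // is at most maxDistance.
    public static List<string> WithinThreshold(string query, IEnumerable<string> names, int maxDistance)
    {
        var matches = new List<string>();
        string loweredQuery = query.ToLower();

        foreach (string name in names)
        {
            if (Distance(loweredQuery, name.ToLower()) <= maxDistance)
                matches.Add(name);
        }

        return matches;
    }
}
```

Calling WithinThreshold("Yak Crack", new[] { "Yaak Crack", "Bat Crack", "Owl Crack", "Thin Lizzy" }, 3) keeps the first three names (distances 1, 2, and 3) and drops only "Thin Lizzy". Lowering the limit to 1 keeps just "Yaak Crack", which suggests the right limit may depend on the query rather than being a fixed 3.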
