Incomplete Results for Strings that Include Numbers #255

Open
thekenshow opened this issue Oct 28, 2021 · 1 comment

Comments

@thekenshow

Moving this issue from trilbymedia/grav-plugin-tntsearch#122 to here after getting verification that it's not a Grav issue.

Here are the details:

A client site uses part numbers in page titles and content (e.g., "SPK1000", "7393 Horn Driver"), and TNTSearch isn't returning all matches when the minimum of three characters is entered.

Test case 1 is a search for "spk", which should return "spk1000" and "spk7457", but only the first appears:

[Screenshot (2021-10-25, 4:31:03 PM): search results for "spk"]

A search for "spk7" returns "spk7457", which should also have appeared in the previous search:

[Screenshot (2021-10-25, 4:31:13 PM): search results for "spk7"]

Test 2 is a search for "739", which should return three results (two instances of "7393 Horn Driver" and one with "739" in the body text) but instead returns only the latter:

[Screenshot (2021-10-26, 1:16:04 PM): search results for "739"]

A search for "7393" turns up the first two expected above (two instances of "7393 Horn Driver"):

[Screenshot (2021-10-26, 1:15:56 PM): search results for "7393"]

Using the Test 2 "739" search documented above, with index rebuild + cache clear between tests, I tried the following configuration changes with no luck:

  • fuzzy enabled and then disabled, no change
  • phrases enabled and then disabled, no change
  • search type of "auto" and then "basic", no change
  • stemmer enabled and then disabled, no change
@ViliusS
Contributor

ViliusS commented Oct 29, 2021

By default, TNTSearch operates on full words. With a partial search it returns only the first matching word, i.e. "spk1000" and "spk7457" are treated as completely different words, so only the first result is returned.

This applies to all words, whether or not they contain numbers. For "normal" words the stemmer alleviates this, but obviously it doesn't work for numbers. According to this comment, the only way to search partially within words that contain numbers is fuzzy search.
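To make the whole-word behaviour concrete, here is a minimal sketch of a plain inverted index. It is a simplified model, not TNTSearch's actual code: the whitespace tokenizer, the `build_index`/`exact_search` names and the sample documents (loosely modelled on Test 2) are all hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased, whitespace-separated word to the ids of the docs containing it.

    Hypothetical simplified tokenizer; TNTSearch's real tokenizer differs.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def exact_search(index, query):
    """Return ids of docs that contain the query as a complete word."""
    # dict.get does not insert missing keys, even on a defaultdict
    return index.get(query.lower(), set())


# Hypothetical sample pages loosely modelled on Test 2
docs = {
    1: "7393 Horn Driver",
    2: "7393 Horn Driver",
    3: "Fits the 739 housing",
}
index = build_index(docs)

print(sorted(exact_search(index, "739")))   # [3]
print(sorted(exact_search(index, "7393")))  # [1, 2]
```

Because `739` and `7393` are separate keys in the index, the two lookups never overlap, which is the same split the reporter saw in Test 2.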

I have looked at the fuzzy search code and, unfortunately, it is a complete mess :(.

  1. Fuzzy search is never called if at least one full match is found. We even have a pull request to fix this.
  2. The default Levenshtein distance of 2 is not enough in your case. Even if we change the default distance, there will still be cases where fuzzy search misses some results. That's because the Levenshtein algorithm wasn't created for partial search. It was created to find similar words, which in most cases are also full "normal" words.
  3. Some parts of the code are never reached (?) at all, for example:
    return $this->getAllDocumentsForFuzzyKeyword($word, $noLimit);
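Points 1 and 2 can be illustrated with a small model. This is a hypothetical sketch, not TNTSearch's code: `search`, its `always_fuzzy` flag and the sample documents are invented for illustration, and `always_fuzzy=True` is not a description of what the mentioned pull request actually changes. It uses the standard dynamic-programming Levenshtein distance.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance counting single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def build_index(docs):
    """Map each lowercased, whitespace-separated word to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query, max_distance=2, always_fuzzy=False):
    """Hypothetical flow: exact lookup, then a Levenshtein-based fuzzy pass.

    With always_fuzzy=False the fuzzy pass is skipped whenever the exact
    lookup finds anything, mirroring point 1 above.
    """
    query = query.lower()
    hits = set(index.get(query, set()))
    if hits and not always_fuzzy:
        return hits
    for word, doc_ids in index.items():
        if levenshtein(query, word) <= max_distance:
            hits |= doc_ids
    return hits


# Hypothetical sample pages modelled on the two test cases
docs = {
    1: "SPK1000",
    2: "SPK7457",
    3: "7393 Horn Driver",
    4: "7393 Horn Driver",
    5: "Fits the 739 housing",
}
index = build_index(docs)

print(levenshtein("739", "7393"))                       # 1
print(levenshtein("spk", "spk7457"))                    # 4
print(sorted(search(index, "739")))                     # [5]
print(sorted(search(index, "739", always_fuzzy=True)))  # [3, 4, 5]
```

In this model "7393" is within distance 1 of "739", yet it is lost because the exact hit on "739" short-circuits the fuzzy pass (point 1). "spk7457" is 4 edits away from "spk", beyond the default limit of 2, so fuzzy matching would not rescue it either way (point 2).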

I guess that to fix this properly, somebody needs to implement a partial search algorithm. This would also enable partial search for languages that do not have a stemmer.
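One possible shape for such a partial search is a prefix lookup over a sorted vocabulary. The sketch below is only an illustration of that general technique, not TNTSearch's design: it assumes the whole word list fits in memory as a sorted Python list, and `prefix_search` and the sample documents are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(docs):
    """Map each lowercased, whitespace-separated word to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def prefix_search(index, vocabulary, prefix):
    """Return ids of docs containing any word that starts with prefix.

    vocabulary must be the sorted list of index keys, so all words sharing
    a prefix form one contiguous run that binary search can locate.
    """
    prefix = prefix.lower()
    hits = set()
    i = bisect.bisect_left(vocabulary, prefix)
    while i < len(vocabulary) and vocabulary[i].startswith(prefix):
        hits |= index[vocabulary[i]]
        i += 1
    return hits


# Hypothetical sample pages modelled on the two test cases
docs = {
    1: "SPK1000",
    2: "SPK7457",
    3: "7393 Horn Driver",
    4: "7393 Horn Driver",
    5: "Fits the 739 housing",
}
index = build_index(docs)
vocabulary = sorted(index)

print(sorted(prefix_search(index, vocabulary, "spk")))   # [1, 2]
print(sorted(prefix_search(index, vocabulary, "739")))   # [3, 4, 5]
print(sorted(prefix_search(index, vocabulary, "7457")))  # []
```

A prefix lookup covers both test cases, but it still cannot find a fragment from the middle of a word (such as "7457" inside "spk7457"); that would need something like an n-gram index.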

You can also try the patch in the mentioned pull request to see if it changes things for you. Hopefully that helps a bit.
