Implement StopWords from TNTSearch & Add Config Option #136

Open
benlilley opened this issue Jun 6, 2024 · 0 comments

First, thanks for the great plugin, we've been battling Grav's search for a while and finding this has really helped.

The Problem

Currently, when searching larger data sets, the search can become slow, and the search field lags behind the user's typing. One reason appears to be that the plugin runs a search on every keystroke once the initial minimum length is reached. For example, searching for "eating an apple" will find every instance of "an" in your data set and search through them, even if you have min: 3 set. This is slow and also lowers the result quality.

Note: A configurable delay on the search input might help here too, so that the search runs once typing has paused rather than on every keystroke.

Potential Solution

This is traditionally solved using stop words, which are already supported in TNTSearch (see teamtnt/tntsearch#83) and can be seen in TNTIndexer.php:

class TNTIndexer
{
    protected $index              = null;
    protected $dbh                = null;
    protected $primaryKey         = null;
    protected $excludePrimaryKey  = true;
    public $stemmer               = null;
    public $tokenizer             = null;
    public $stopWords             = [];

    // ... rest of the class omitted
}

A common list as a starting point for English would be:

public $stopWords = ['a', 'an', 'and', 'are', 'as', 'at', 'be', 'but', 'by', 'for', 'if', 'in', 'into', 'is', 'it', 'no', 'not', 'of', 'on', 'or', 'such', 'that', 'the', 'their', 'then', 'there', 'these', 'they', 'this', 'to', 'was', 'will', 'with'];
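To illustrate the effect I'm after, here is a minimal sketch of dropping stop words from a query before it is searched. `removeStopWords` is a hypothetical helper I made up for this issue, not something TNTSearch or the plugin provides, and it is not meant to mirror how TNTSearch handles stop words internally.

```php
<?php
// Hypothetical helper: NOT part of TNTSearch or the Grav plugin.
// Splits a query on whitespace and drops any token found in the stop word list.
function removeStopWords(string $query, array $stopWords): array
{
    // strtolower() only handles ASCII; a real implementation would likely
    // want a multibyte-aware lowercase for non-English content.
    $tokens = preg_split('/\s+/', trim(strtolower($query)), -1, PREG_SPLIT_NO_EMPTY);

    // Flip the list so each lookup is a hash check instead of a linear scan.
    $lookup = array_flip($stopWords);

    // array_values() reindexes the result after array_filter() leaves gaps.
    return array_values(array_filter(
        $tokens,
        fn (string $token): bool => !isset($lookup[$token])
    ));
}

$stopWords = ['a', 'an', 'and', 'the', 'of'];

// "an" is dropped, leaving ['eating', 'apple'].
print_r(removeStopWords('eating an apple', $stopWords));
```

With something like this in place, "eating an apple" would only search for "eating" and "apple", so the short, very common word never hits the index.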

It would be great to have an option in tntsearch.yaml for passing a list of stop words to ignore. That would make the list easy to discover and manage, and it would not be lost during a plugin update, as edits to the current vendor file are. Ideally these words would also be skipped by the Highlighter functionality.
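As a rough sketch of how the config side could work, the snippet below reads a list from an array standing in for the parsed tntsearch.yaml and assigns it to the public `$stopWords` property shown above. The `stopwords` key, the `resolveStopWords` function, and the `IndexerStub` class are all assumptions for illustration; I have not checked how the plugin actually constructs its indexer.

```php
<?php
// Stand-in for the parsed contents of tntsearch.yaml.
// The `stopwords` key is hypothetical; `min` is the existing option.
$pluginConfig = [
    'min'       => 3,
    'stopwords' => ['a', 'an', 'and', 'the'],
];

// Hypothetical helper: normalizes the configured list so that casing,
// stray whitespace, empty entries, and duplicates do not matter.
function resolveStopWords(array $config): array
{
    // Fall back to an empty list when the key is missing or left blank.
    $words = $config['stopwords'] ?? [];

    $words = array_map(fn ($word): string => strtolower(trim((string) $word)), $words);
    $words = array_filter($words, fn (string $word): bool => $word !== '');

    return array_values(array_unique($words));
}

// Minimal stand-in exposing the same public property as TNTIndexer.
class IndexerStub
{
    public $stopWords = [];
}

$indexer = new IndexerStub();
$indexer->stopWords = resolveStopWords($pluginConfig);
```

Defaulting to an empty list would keep the current behaviour for anyone who does not set the option.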

Grav: 1.7.46
TNT Search: 3.4.0
PHP: 8.2
