running against a large corpus, especially with some settings, can result in a huge volume of results. many of them are "low-quality" in that the matching portion consists of superficially similar elements that don't carry much semantic weight.
adjusting the match length can help, but there might be other heuristics we can use to improve relevance. one possibility is TF-IDF.
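one way TF-IDF could work here is to score each match by the mean IDF of its terms, so matches built entirely from very common words score low and can be filtered out. a minimal sketch (the corpus, function names, and splitting on whitespace are all illustrative assumptions, not anything from the actual codebase):

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Compute inverse document frequency for each term across a corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc.split()))  # document frequency: count each term once per doc
    return {term: math.log(n_docs / count) for term, count in df.items()}

def match_score(match_text, idf):
    """Score a matched span by the mean IDF of its terms; low scores mean
    the match is built from common, low-information terms."""
    terms = match_text.split()
    if not terms:
        return 0.0
    return sum(idf.get(t, 0.0) for t in terms) / len(terms)

# toy corpus: "the" appears everywhere (IDF 0), rare terms score high
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "quantum chromodynamics of the vacuum",
]
idf = idf_weights(corpus)
# a match made of common words scores lower than one with rare terms
print(match_score("the sat on", idf) < match_score("quantum chromodynamics", idf))
```

with a score like this, filtering becomes a simple threshold on the match rather than a hard length cutoff.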
one possible quick n' dirty way to do this is to implement something like passim's --max-series, which for us would translate to dropping seed groups from the index if there are too many entries in the group (indicating a super common seed).
if we do TF-IDF, we can also implement that at the seed level to prune the graph early.