v0.3

@MaartenGr released this 30 Apr 06:19 · 17 commits to master since this release · a60dfc6

You can now set top_n to get the n best matches for each input string, which lets you choose among several candidates for a given input. The option is implemented only in polyfuzz.models.TFIDF and polyfuzz.models.Embeddings: returning several matches per string is computationally heavy, and these two models are the best suited to that calculation.

Usage:

from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Request up to three matches in to_list for every string in from_list
model = PolyFuzz("TF-IDF")
model.match(from_list, to_list, top_n=3)

Or, with custom models, pass top_n to each model's constructor:

from polyfuzz import PolyFuzz
from polyfuzz.models import TFIDF, Embeddings
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings('bert-base-multilingual-cased')
bert = Embeddings(embeddings, min_similarity=0, model_id="BERT", top_n=3)
tfidf = TFIDF(min_similarity=0, top_n=3)

string_models = [bert, tfidf]
model = PolyFuzz(string_models)
model.match(from_list, to_list)
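
To make the meaning of top_n concrete, here is a minimal standard-library sketch of "keep the n most similar targets per input string". It is not PolyFuzz's implementation: it scores strings with plain character-trigram cosine similarity, and the names char_ngrams, cosine, and top_n_matches are hypothetical helpers introduced only for this illustration.

```python
import heapq
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a string padded with one space per side."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_n_matches(from_list, to_list, top_n=1):
    """Map each source string to its top_n most similar targets, best first."""
    # Vectorize the targets once so they are not recomputed per source string.
    to_vectors = [(target, char_ngrams(target)) for target in to_list]
    results = {}
    for source in from_list:
        source_vector = char_ngrams(source)
        scored = [(cosine(source_vector, vec), target) for target, vec in to_vectors]
        # Keep only the top_n highest-scoring targets (hypothetical stand-in
        # for what a top_n option selects).
        best = heapq.nlargest(top_n, scored, key=lambda pair: pair[0])
        results[source] = [(target, round(score, 3)) for score, target in best]
    return results


if __name__ == "__main__":
    from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
    to_list = ["apple", "apples", "mouse"]
    for source, matches in top_n_matches(from_list, to_list, top_n=3).items():
        print(source, matches)
```

Every source string is scored against every target, so the work grows with len(from_list) × len(to_list). That cost is one reason a top_n option is naturally tied to models that can compute all pairwise similarities efficiently.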