Release v0.3 · MaartenGr/PolyFuzz

You can now specify top_n to get the top n matches for each input string instead of only the single best match. The option is implemented only in polyfuzz.models.TFIDF and polyfuzz.models.Embeddings: retrieving several matches per string is computationally heavy, and these models are best suited to those calculations.

Usage:

from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

model = PolyFuzz("TF-IDF")
model.match(from_list, to_list, top_n=3)

Or with custom models (reusing from_list and to_list from above):

from polyfuzz import PolyFuzz
from polyfuzz.models import TFIDF, Embeddings
from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings('bert-base-multilingual-cased')
bert = Embeddings(embeddings, min_similarity=0, model_id="BERT", top_n=3)
tfidf = TFIDF(min_similarity=0, top_n=3)

string_models = [bert, tfidf]
model = PolyFuzz(string_models)
model.match(from_list, to_list)
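
To make the idea behind top_n concrete, here is a minimal standalone sketch of "keep the n best candidates per input string". It does not use PolyFuzz and is not how the library computes matches: the similarity function is difflib.SequenceMatcher from the Python standard library, swapped in for TF-IDF or embeddings, and the helper name top_n_matches is hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def top_n_matches(from_list, to_list, top_n=3, min_similarity=0.0):
    """Hypothetical helper: map each source string to its best candidates.

    Returns a dict of source -> list of (candidate, score) pairs,
    sorted from most to least similar, with at most top_n entries.
    """
    results = {}
    for source in from_list:
        # Score every candidate; SequenceMatcher stands in for TF-IDF/embeddings.
        scored = [
            (candidate, SequenceMatcher(None, source, candidate).ratio())
            for candidate in to_list
        ]
        # Drop candidates below the similarity threshold.
        scored = [pair for pair in scored if pair[1] >= min_similarity]
        # Keep only the top_n highest-scoring candidates.
        results[source] = heapq.nlargest(top_n, scored, key=lambda pair: pair[1])
    return results


from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

matches = top_n_matches(from_list, to_list, top_n=2)
for source, candidates in matches.items():
    print(source, candidates)
```

Every source string is scored against every candidate, so the work grows with len(from_list) * len(to_list). That is the kind of cost the note above refers to when it limits top_n to the TF-IDF and embedding models.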
