Python library for fast fuzzy search over a big file written in Rust

Intsights/fastzy

About The Project

Fastzy is a Python library, written in Rust, that searches a file for lines within a given Levenshtein distance of a pattern. Distances are computed with the mbleven algorithm. When the requested distance exceeds 3, where mbleven becomes slower, Wagner-Fischer is used instead.

The library loads the whole file into memory and builds a lightweight index keyed by line length. Two strings whose lengths differ by more than the maximum distance cannot be within that distance of each other, so only lines of a compatible length are compared, rather than every line in the file.
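
The length-based index can be illustrated with a short pure-Python sketch. This is not Fastzy's Rust implementation: the names `levenshtein` and `LengthIndexedSearcher` are hypothetical, and the sketch uses Wagner-Fischer for every distance instead of switching to mbleven for small ones. It only shows why grouping lines by length shrinks the set of candidates.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Wagner-Fischer dynamic programming, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner loop over the shorter string
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


class LengthIndexedSearcher:
    """Hypothetical illustration: bucket lines by length, then scan
    only the buckets whose length can be within max_distance."""

    def __init__(self, lines):
        self.by_length = defaultdict(list)
        for line in lines:
            self.by_length[len(line)].append(line)

    def search(self, pattern, max_distance):
        results = []
        low = max(0, len(pattern) - max_distance)
        high = len(pattern) + max_distance
        for length in range(low, high + 1):
            for candidate in self.by_length.get(length, ()):
                if levenshtein(pattern, candidate) <= max_distance:
                    results.append(candidate)
        return results


searcher = LengthIndexedSearcher(['test', 'texts', 'next', 'context', 'a'])
print(searcher.search('text', 1))
```

With a pattern of length 4 and a maximum distance of 1, only the buckets for lengths 3, 4 and 5 are scanned, so `'context'` and `'a'` are never compared against the pattern.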

Performance

| Library | Function | Time |
| --- | --- | --- |
| polyleven | polyleven.levenshtein('text') | 8.48s |
| fastzy | fastzy.search('text') | 0.003s |

Installation

pip3 install fastzy

Usage

import fastzy

# open a file and index it in memory
searcher = fastzy.Searcher(
    file_path='input_text_file.txt',
    separator='',
)

# search for the input text 'text' with the distance of 1
searcher.search(
    pattern='text',
    max_distance=1,
)
['test', 'texts', 'next']

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - [email protected]

Project Link: https://github.com/Intsights/fastzy