Skip to content

kemaleren/bwt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 

Repository files navigation

BWT README

Author: Kemal Eren ([email protected])

ABOUT

bwt is a Python module for efficiently searching strings using the Burrows-Wheeler transform.

Sample usage:

import bwt
import urllib3

# get Pride and Prejudice, Chapters 1 through 5
url = 'http://www.gutenberg.org/cache/epub/1342/pg1342.txt'
http = urllib3.PoolManager()
text = http.urlopen("GET", url).data.decode()

chapters = text[675:30879]

# pre-compute data structures
bwt_data = bwt.make_all(chapters)

# find all occurances of the word 'married', with up to three
# mismatches.
bwt.find('married', chapters, mismatches=3, bwt_data=bwt_data)

Note: bwt.make_all() is not fast, because it uses a naive suffix array algorithm. You can compute the suffix array seperately with any efficient third-party module, and provide it to make_all().

REQUIREMENTS

  • Python (tested with 2.7.4 and 3.3.1)

About

String search using the Burrows-Wheeler transform

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages