Wordcorrector

Python module to find closest matching words to a given input based on a dictionary of words.

Note: The dictionary currently in use was generated from movie review data.

Features

  • A model can be generated from any list of words (English words, names, places, products, you name it)
  • Fast, simple and clean
  • Uses fuzzy matching method
  • Takes the relative usage frequency of words into account
  • Number of suggestions can be varied
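
To illustrate how fuzzy matching, word frequency, and a variable number of suggestions can fit together, here is a minimal sketch. It is not the code in word_corrector.py: it assumes the standard library's difflib as the fuzzy matcher, and the function name `suggest`, its parameters, and the sample frequencies are hypothetical.

```python
import difflib


def suggest(word, frequencies, n=3, cutoff=0.6):
    """Return up to n dictionary words that resemble `word`.

    `frequencies` maps each known word to its frequency/importance.
    Assumption: difflib stands in for the module's own fuzzy matcher.
    """
    # Collect a generous pool of candidates above the similarity cutoff.
    pool = difflib.get_close_matches(
        word, list(frequencies), n=max(n * 5, 10), cutoff=cutoff
    )

    def score(candidate):
        similarity = difflib.SequenceMatcher(None, word, candidate).ratio()
        # Rank by similarity first; frequency breaks ties between
        # equally similar candidates.
        return (similarity, frequencies[candidate])

    return sorted(pool, key=score, reverse=True)[:n]


# Hypothetical sample data in the same word -> frequency shape as the dictionary.
frequencies = {"movie": 120, "move": 45, "moved": 30, "film": 80}
print(suggest("movei", frequencies, n=2))
```

Ranking on a (similarity, frequency) pair is only one possible design; a weighted combination of the two would let very common words outrank slightly closer but rare ones.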

Usage

  1. Install the dependencies (given below)
  2. Run word_corrector.py
  3. Enter a word when prompted. The program will display a list of the top matches.

Building a new dataset

  1. Currently, the dictionary of words has been built using NLTK's movie review data.
  2. To use another list of words, create a JSON file in the process/source/ folder in the following format:
  {
    "word1" : 1,
    "word2" : 2
  }

i.e. a list of key-value pairs where each word is the key and its frequency/importance is the value. Order does not matter. See the movie_review_data.json file for reference.
  3. Open process/make_model.py, specify the name of the source file created above in the main function, and run the file.
  4. The word_corrector program will now use the new dataset.
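
A source file in the format above can be produced from a plain list of words. The following is a minimal sketch, assuming occurrence counts serve as the frequency value; the helper names `build_frequency_map` and `write_source_file` and the file name my_words.json are hypothetical and not part of this repository.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def build_frequency_map(words):
    """Count occurrences of each word, ignoring case and blank entries."""
    cleaned = (w.strip().lower() for w in words)
    return dict(Counter(w for w in cleaned if w))


def write_source_file(frequencies, path):
    """Write the word -> frequency mapping as JSON and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(frequencies, handle, indent=2)
    return path


# Hypothetical input: any list of words (names, places, products, ...).
words = ["Paris", "London", "paris", "Tokyo", "london", "paris"]
frequencies = build_frequency_map(words)

# Written to a temporary directory here; in the repository the file would
# go into the process/source/ folder under a name of your choice.
out_dir = Path(tempfile.mkdtemp())
out_path = write_source_file(frequencies, out_dir / "my_words.json")
print(out_path.read_text(encoding="utf-8"))
```

Since the value is described as frequency/importance, hand-assigned weights would work just as well as counts, as long as the file keeps the same key-value shape.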

Dependencies