ParallelPatternMatchingTrie (TriePDC)

About the project

This project is the J-component of the Parallel and Distributed Computing course at VIT Chennai. The main idea is to search for and match words efficiently across the text files in a directory. The files are indexed using multiprocessing, and the indexed words are then searched using a trie. In other words, this project implements word autocomplete using multiprocessing and a trie.

Report on the project

Click here to read the report.

Downloading the project

Execute the following command to download the project:

git clone https://github.com/krunalmk/TriePDC.git

Alternatively, download the repository from GitHub as a ZIP archive and extract it.

Executing the project

Open a terminal in the project folder (the cloned or extracted directory) and execute the following commands.

To index the text files, execute:

python3 reindexthefiles.py

To get parallel prefix matches for your input, execute:

python3 main.py <your word>

For example, to get autocomplete suggestions from the text files for the word "guten", execute:

python3 main.py guten

Algorithms used in the project

Algorithm for storing indexed data in JSON

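1. The text of each text file in the current directory is read.
2. Punctuation characters such as the period (.), apostrophe ('), comma (,) and semicolon (;) are removed.
3. The words in the cleaned text from step 2 are stored in a JSON file, together with the names of the files and the line numbers in which they occur. The structure of the JSON file (data.json) is as follows: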

{
    "word": {
        "File": {
            "filename1.txt": {
                "Line": [i1, i2, i3, ..., in]
            },
            "filename2.txt": {
                "Line": [j1, j2, j3, ..., jn]
            }
        }
    }
}
4. Multiprocessing is used to index the files in parallel, which makes indexing more efficient.
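The indexing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in reindexthefiles.py: the names clean_line, index_file, merge and build_index are hypothetical, words are assumed to be lowercased and stripped of every non-word character, line numbers are assumed to start at 1, and the work is assumed to be split into one multiprocessing.Pool task per file.

```python
import json
import os
import re
from multiprocessing import Pool

# Assumed cleaning rule (hypothetical): drop every character that is neither a
# word character nor whitespace, which removes '.', "'", ',', ';' and similar.
PUNCTUATION = re.compile(r"[^\w\s]")


def clean_line(line):
    """Strip punctuation and lowercase a line (lowercasing is an assumption)."""
    return PUNCTUATION.sub("", line).lower()


def index_file(path):
    """Index one file as {word: {"File": {filename: {"Line": [line numbers]}}}}."""
    filename = os.path.basename(path)
    index = {}
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line_no, line in enumerate(f, start=1):  # 1-based line numbers (assumed)
            for word in clean_line(line).split():
                files = index.setdefault(word, {"File": {}})["File"]
                lines = files.setdefault(filename, {"Line": []})["Line"]
                if not lines or lines[-1] != line_no:  # record each line only once
                    lines.append(line_no)
    return index


def merge(indexes):
    """Combine per-file indexes; each file name appears in exactly one of them."""
    merged = {}
    for index in indexes:
        for word, entry in index.items():
            merged.setdefault(word, {"File": {}})["File"].update(entry["File"])
    return merged


def build_index(directory=".", out_path="data.json", processes=None):
    """Index every .txt file in `directory` in parallel and write the merged index as JSON."""
    paths = [os.path.join(directory, name)
             for name in sorted(os.listdir(directory)) if name.endswith(".txt")]
    with Pool(processes) as pool:  # processes=None lets Pool pick one worker per CPU
        indexes = pool.map(index_file, paths)  # one task per file
    data = merge(indexes)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

Because multiprocessing may start workers by re-importing the main module, a script using this sketch should call build_index() only under an `if __name__ == "__main__":` guard. Splitting the work per file keeps the workers independent of each other; the only sequential step is the final merge in the parent process.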

Algorithm for searching the prefix

1. The data from the JSON file (data.json) is read and stored in a trie.
2. The trie makes prefix search efficient: locating the node for a prefix of length m takes at most m steps, regardless of how many words are indexed.
3. The query prefix (entered by the user on the command line) is then matched in the trie.
4. If a match is found, the matching words are returned together with the names of the files and the line numbers where they occur. You have got the results! Yay!
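The search steps above can be sketched as a small trie. This is again a hypothetical illustration, not the contents of Trie.py or main.py: each node is assumed to keep its children in a dict keyed by character, a node that ends an indexed word stores that word's {filename: [line numbers]} map, and a prefix query walks down one node per character and then collects every word below that node. The sketch runs the search sequentially and does not show how the project parallelizes it.

```python
import json


class TrieNode:
    """One node per character; `occurrences` is set only where an indexed word ends."""

    def __init__(self):
        self.children = {}
        self.occurrences = None  # {filename: [line numbers]} for a complete word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, occurrences):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.occurrences = occurrences

    def search_prefix(self, prefix):
        """Return {word: {filename: [lines]}} for every stored word starting with prefix."""
        node = self.root
        for ch in prefix:  # one step per character of the prefix
            node = node.children.get(ch)
            if node is None:
                return {}  # no indexed word has this prefix
        results = {}
        stack = [(node, prefix)]
        while stack:  # depth-first walk collecting every complete word below the prefix
            current, text = stack.pop()
            if current.occurrences is not None:
                results[text] = current.occurrences
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return results


def load_trie(data):
    """Build a trie from the data.json structure {word: {"File": {name: {"Line": [...]}}}}."""
    trie = Trie()
    for word, entry in data.items():
        trie.insert(word, {name: info["Line"] for name, info in entry["File"].items()})
    return trie


def load_trie_from_file(path="data.json"):
    with open(path, encoding="utf-8") as f:
        return load_trie(json.load(f))
```

For example, load_trie_from_file("data.json").search_prefix("guten") would return every indexed word beginning with "guten", each mapped to the files and line numbers where it occurs. Finding the prefix node costs O(m) for a prefix of length m; the remaining work depends only on the part of the trie below that node, not on the size of the whole index.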

Files in the repository

__pycache__
.gitignore
19BCE1210_PDC_Review2.pptx
Files.py
KingLear.txt
Othello.txt
README.md
SampleTextFile.txt
Trie.py
TriePDC:ParallelTextPatternMatching_Krunal.pdf
__init__.py
bruteforce.py
data.json
main.py
reindexthefiles.py
shakespeare.txt
triePDC_test.py
