There are two files for indexing and searching respectively:
For indexing, the file is called indexer.py. To run it, use:
python indexer.py $1 $2 $3
Here $1 is the path to the dump file (.xml), $2 is the location where the index files should be stored, and $3 is the name of the file that contains the stats.
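As a rough illustration of how the three positional arguments above could be read inside a script, here is a minimal sketch. It is not taken from indexer.py itself; the function name parse_args and the variable names dump_path, index_dir and stats_file are hypothetical and only mirror $1, $2 and $3.

```python
import sys


def parse_args(argv):
    """Return (dump_path, index_dir, stats_file) from an argv-style list.

    argv[0] is the script name; argv[1..3] correspond to $1, $2, $3.
    The names used here are illustrative assumptions, not the real code.
    """
    if len(argv) != 4:
        # Exit with a usage message when the argument count is wrong.
        raise SystemExit(
            "usage: python indexer.py <dump.xml> <index_dir> <stats_file>"
        )
    dump_path, index_dir, stats_file = argv[1:4]
    return dump_path, index_dir, stats_file


# Typical use inside a script would be: parse_args(sys.argv)
```

Calling parse_args with fewer or more than three arguments stops the program with a usage message instead of failing later with an index error.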
The code will print 2 lines to the terminal:
- The number of files in the dump
- The total time taken to create the index
In addition, the code will also write 2 files to the $2 location:
- tf.txt - This document contains the frequencies of words in the documents along with the of