
Information Retrieval and Extraction miniproject to build a Wikipedia-like search engine using a Wikipedia dump


tanvi2612/wiki-search


Code Specifications


There are two scripts, one for indexing and one for searching.

The indexing script is indexer.py. Run it as:

python indexer.py $1 $2 $3

where $1 is the path to the dump file (.xml), $2 is the directory where the index files should be stored, and $3 is the name of the file that will contain the stats.
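The three positional arguments can be read from `sys.argv`. A minimal sketch, assuming the script takes exactly these three arguments in this order; `parse_args` is a hypothetical helper for illustration, not the project's own code:

```python
import sys


def parse_args(argv):
    """Map indexer.py's positional arguments to (dump_path, index_dir, stats_file).

    Hypothetical helper: the repository's actual argument handling is not shown.
    """
    if len(argv) != 4:
        # argv[0] is the script name, followed by the three required arguments
        raise SystemExit("usage: python indexer.py <dump.xml> <index_dir> <stats_file>")
    dump_path, index_dir, stats_file = argv[1], argv[2], argv[3]
    return dump_path, index_dir, stats_file
```

Inside the script this would be called as `parse_args(sys.argv)`; exiting early with a usage message avoids a confusing `IndexError` when an argument is missing.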

The script prints 2 lines to the terminal:

  1. The number of files in the dump
  2. The total time taken to create the index
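Both numbers can be produced while streaming the dump. A minimal sketch, assuming "files in the dump" means the `<page>` elements of a standard MediaWiki XML export; `count_pages` and `timed_count` are hypothetical names, and this is not necessarily how indexer.py parses the dump:

```python
import time
import xml.etree.ElementTree as ET


def count_pages(source):
    """Stream a MediaWiki XML dump (path or file object) and count <page> elements."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Dump tags are namespaced, e.g. "{http://www.mediawiki.org/xml/export-0.10/}page"
        if elem.tag.rsplit("}", 1)[-1] == "page":
            count += 1
            elem.clear()  # drop the page's subtree once it has been counted
    return count


def timed_count(source):
    """Return (number of pages, elapsed seconds) for one pass over the dump."""
    start = time.perf_counter()
    n = count_pages(source)
    elapsed = time.perf_counter() - start
    return n, elapsed
```

`iterparse` is used instead of `ET.parse` so that a multi-gigabyte dump is never loaded into memory as a single tree.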

In addition, the script writes 2 files to the $2 directory:

  1. tf.txt - This document contains the frequencies of words in the documents along with the of
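A term-frequency file like tf.txt can be built by counting tokens per document and grouping the counts by word. A minimal sketch, assuming a simple lowercase alphanumeric tokenizer and a hypothetical line layout of `word doc_id:freq doc_id:freq ...`; the real tokenization and file format used by indexer.py are not described here:

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: runs of lowercase letters and digits
TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_tf(docs):
    """docs maps doc_id -> raw text; returns word -> {doc_id: frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        counts = Counter(TOKEN_RE.findall(text.lower()))
        for word, freq in counts.items():
            index[word][doc_id] = freq
    return index


def format_tf(index):
    """Serialize one line per word, words and doc ids in sorted order (hypothetical layout)."""
    lines = []
    for word in sorted(index):
        postings = " ".join(f"{d}:{f}" for d, f in sorted(index[word].items()))
        lines.append(f"{word} {postings}")
    return "\n".join(lines)
```

Sorting the words makes the output file easy to merge or binary-search later; a production indexer would also remove stop words and apply stemming, which this sketch omits.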
