stefanosh/NLP-Search-Engine

NLP-Search-Engine

Make sure Scrapy and its dependencies are installed as described in the Scrapy docs.

Install the additional dependencies:

pip install BeautifulSoup4
pip install pathlib
pip install nltk 
pip install numpy
pip install xmljson

Create the database and the 'ARTICLES' table:

cd NLP_search_engine
cd database
python create_DB.py
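As a rough illustration of what this step might do, here is a minimal sketch that creates an 'ARTICLES' table. It assumes SQLite as the storage engine, and the column names (`url`, `title`, `content`) and the function name `create_articles_table` are hypothetical. The actual schema lives in create_DB.py and may differ.

```python
import sqlite3


def create_articles_table(db_path=":memory:"):
    """Create a hypothetical ARTICLES table and return the open connection.

    Assumption: SQLite and these column names are illustrative only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ARTICLES (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT UNIQUE,      -- source page of the article
            title TEXT,
            content TEXT          -- raw article text filled in by the crawlers
        )
        """
    )
    conn.commit()
    return conn
```

Using `CREATE TABLE IF NOT EXISTS` keeps the step safe to re-run.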

Then crawl the websites and store the articles in the database:

scrapy crawl hackernews
scrapy crawl technews
scrapy crawl reuters

Then add part-of-speech (POS) tags to each word in the articles:

python morphoSyntactic_analysis.py

Once the POS tagging step is done, run the following to apply lemmatisation, compute the tf_idf weights and related values, and write the results to the inverted_index.xml file:

python vector_space_model.py
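To make the tf_idf and inverted-index idea concrete, the sketch below builds a term-to-document weight map in plain Python. It is not the code in vector_space_model.py: the function name `build_inverted_index`, the input shape, and the specific weighting (relative term frequency times log(N / document frequency)) are assumptions chosen for illustration, and the XML serialisation is left out.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: tf-idf weight}.

    docs: dict of doc_id -> list of tokens (assumed to be lemmatised already).
    """
    n_docs = len(docs)
    term_counts = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total                        # relative term frequency
            idf = math.log(n_docs / doc_freq[term])   # rarer terms weigh more
            index[term][doc_id] = tf * idf
    return dict(index)


# Toy corpus: two tiny "articles".
index = build_inverted_index({1: ["day", "value", "day"], 2: ["day", "compete"]})
```

With this weighting, a term that occurs in every document (here "day") gets a weight of zero, which is why very common words contribute little to ranking.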

Alternatively, do everything in one command:

cd NLP_search_engine/ && python ./database/create_DB.py && scrapy crawl hackernews && scrapy crawl technews && scrapy crawl reuters && python morphoSyntactic_analysis.py && python vector_space_model.py

Start a query! You can pass as many words as you wish and cap the number of results with the --limit argument (when no limit is given, all matching articles are returned):

python query_index.py day compete value --limit 10
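The command line above could be handled along the following lines. This is a hedged sketch, not the contents of query_index.py: the names `parse_args` and `rank_articles`, the in-memory index shape (term -> {article id: weight}), and the scoring rule (sum of the weights of the matched query words) are all assumptions.

```python
import argparse
from collections import defaultdict


def parse_args(argv):
    """Parse one or more query words plus an optional --limit."""
    parser = argparse.ArgumentParser(description="Query the inverted index.")
    parser.add_argument("words", nargs="+", help="query words")
    parser.add_argument("--limit", type=int, default=None,
                        help="maximum number of articles to return")
    return parser.parse_args(argv)


def rank_articles(index, query_words, limit=None):
    """Score each article by summing the weights of the query words it contains."""
    scores = defaultdict(float)
    for word in query_words:
        for doc_id, weight in index.get(word, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties broken by article id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked if limit is None else ranked[:limit]


# Toy index with made-up weights, standing in for inverted_index.xml.
toy_index = {"day": {1: 0.1, 2: 0.3}, "compete": {2: 0.2}, "value": {3: 0.4}}
args = parse_args(["day", "compete", "value", "--limit", "2"])
top = rank_articles(toy_index, args.words, args.limit)
```

Leaving `--limit` unset returns every article that matches at least one query word, mirroring the behaviour described above.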
