Make sure Scrapy and its dependencies are installed, as described in the Scrapy installation docs.
Install the additional dependencies:
pip install beautifulsoup4
pip install nltk
pip install numpy
pip install xmljson
(pathlib has been part of the standard library since Python 3.4, so a separate install is only needed on older interpreters.)
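The tagging and lemmatisation steps below also need NLTK model data that pip does not install. If they fail with a LookupError, downloading the standard resources first should fix it (these resource names assume the stock NLTK tokenizer, tagger, and WordNet lemmatizer):
python -c "import nltk; nltk.download('punkt'); nltk.download('averaged_perceptron_tagger'); nltk.download('wordnet')"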
Create the database and the 'ARTICLES' table:
cd NLP_search_engine/database
python create_DB.py
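For orientation, create_DB.py presumably does something along these lines. This is a minimal sketch assuming SQLite, with a hypothetical file name and column layout, not the project's actual schema:

    import sqlite3

    # Assumed database file name; the real one is whatever create_DB.py uses.
    conn = sqlite3.connect("articles.db")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ARTICLES (
               id    INTEGER PRIMARY KEY AUTOINCREMENT,
               url   TEXT UNIQUE,
               title TEXT,
               body  TEXT
           )"""
    )
    conn.commit()
    conn.close()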
Then return to the project root (scrapy crawl must be run from the directory containing scrapy.cfg) and crawl the websites to store the articles in the database:
cd ..
scrapy crawl hackernews
scrapy crawl technews
scrapy crawl reuters
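Each crawl command invokes the spider whose name attribute matches. The spiders in this repo have their own site-specific selectors and pipelines; their general shape, with an assumed start URL and placeholder parsing, is:

    import scrapy

    class HackernewsSpider(scrapy.Spider):
        name = "hackernews"  # what `scrapy crawl hackernews` looks up
        start_urls = ["https://news.ycombinator.com/"]  # assumed start page

        def parse(self, response):
            # Placeholder extraction; the real spiders use site-specific
            # selectors and pipelines that insert rows into the ARTICLES table.
            for href in response.css("a::attr(href)").getall():
                yield {"url": response.urljoin(href)}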
Then add POS tags to each word in the articles:
python morphoSyntactic_analysis.py
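morphoSyntactic_analysis.py presumably tokenises each article and tags it; with NLTK the core of that step looks like this (the script's actual pre-processing may differ):

    import nltk

    text = "Reuters reports that tech news sites compete for value."
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)  # [('Reuters', 'NNP'), ('reports', 'VBZ'), ...]
    print(tagged)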
Once the POS-tagging step has finished, run the following to lemmatise the words, compute tf-idf weights, and write the inverted index to inverted_index.xml:
python vector_space_model.py
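The exact weighting scheme in vector_space_model.py is not spelled out here; a standard WordNet lemmatisation plus tf-idf computation, which the script most likely resembles, is:

    import math
    from collections import Counter
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()

    def tf_idf(term, doc_tokens, corpus):
        # doc_tokens and corpus are assumed to be already-lemmatised token lists.
        term = lemmatizer.lemmatize(term)
        # tf: relative frequency of the term in this document.
        tf = Counter(doc_tokens)[term] / len(doc_tokens)
        # idf: log of inverse document frequency over the corpus.
        df = sum(1 for doc in corpus if term in doc)
        return tf * math.log(len(corpus) / df) if df else 0.0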
Alternatively, run the whole pipeline in one command:
cd NLP_search_engine/ && python ./database/create_DB.py && scrapy crawl hackernews && scrapy crawl technews && scrapy crawl reuters && python morphoSyntactic_analysis.py && python vector_space_model.py
Start a query! You can pass as many words as you wish and cap the results with the --limit argument (with no limit, all articles containing the words are returned):
python query_index.py day compete value --limit 10
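How query_index.py combines multiple terms isn't documented here. A plausible sketch, assuming the inverted index maps each term to (article_id, tf-idf weight) pairs and that per-article scores are simply summed, is:

    from collections import defaultdict

    def rank(terms, inverted_index, limit=None):
        # Sum each article's tf-idf weight over all query terms,
        # then sort articles by total score, best first.
        scores = defaultdict(float)
        for term in terms:
            for article_id, weight in inverted_index.get(term, []):
                scores[article_id] += weight
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked if limit is None else ranked[:limit]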