Similarity Computation

Run command:

python3 build_dict.py

authors-test:

Folder containing the .txt files for all authors. File names should be formatted as:

"{Author Name}.{info}.txt"
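As a rough illustration of this naming convention, the author name can be recovered by splitting the file name at the first dot. This is only a sketch: `parse_author_filename` is a hypothetical helper, not a function from build_dict.py, and it assumes that author names themselves contain no dots.

```python
from pathlib import Path


def parse_author_filename(path):
    """Split '{Author Name}.{info}.txt' into (author_name, info).

    Hypothetical helper; assumes the author name contains no dots.
    """
    name = Path(path).name
    if not name.endswith(".txt"):
        raise ValueError(f"expected a .txt file, got {name!r}")
    stem = name[: -len(".txt")]
    # Everything before the first dot is the author, the rest is the info part.
    author, sep, info = stem.partition(".")
    if not sep or not author or not info:
        raise ValueError(f"expected '{{Author Name}}.{{info}}.txt', got {name!r}")
    return author, info
```

A file name without an info part, such as "Jane Doe.txt", is rejected rather than guessed at.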

papers-test:

Folder containing the .txt files of all papers for which similarity will be computed.

The resulting matrix has size (number of reviewers) x (number of papers).
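To make the reviewers-by-papers layout concrete, here is a small sketch that fills such a matrix. The README does not say which similarity measure build_dict.py uses, so cosine similarity over whitespace-tokenized bag-of-words counts is used purely as a stand-in; `bag_of_words`, `cosine`, and `similarity_matrix` are hypothetical names.

```python
import math
from collections import Counter


def bag_of_words(text):
    # Lowercased whitespace tokenization; a deliberately simple assumption.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(reviewer_texts, paper_texts):
    """Return a list of rows: one row per reviewer, one column per paper."""
    papers = [bag_of_words(text) for text in paper_texts]
    return [
        [cosine(bag_of_words(reviewer), paper) for paper in papers]
        for reviewer in reviewer_texts
    ]
```

With 2 reviewer texts and 3 paper texts, the result has 2 rows and 3 columns, matching the size convention stated above.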

The matrix in 'similarity_result.txt' has 419 rows and 811 columns.
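To sanity-check these dimensions, the file can be loaded and its shape inspected. The sketch below assumes that `similarity_result.txt` stores one matrix row per line with whitespace-separated numeric values; the README does not specify the file format, so the parsing may need adjusting. `load_matrix` and `shape` are hypothetical helpers.

```python
def load_matrix(path):
    """Read a whitespace-separated numeric matrix (assumed file format)."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines
                rows.append([float(value) for value in line.split()])
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows have inconsistent lengths")
    return rows


def shape(matrix):
    """Return (number of rows, number of columns)."""
    return len(matrix), (len(matrix[0]) if matrix else 0)
```

If the format assumption holds, `shape(load_matrix("similarity_result.txt"))` would be expected to match the 419 rows and 811 columns stated above.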

With high likelihood, the algorithm extracts the 4 most relevant papers listed on an author's Semantic Scholar profile. Fewer than 4 papers may be downloaded when:

the author does not have a Semantic Scholar profile
a majority of the links are inaccessible or raise exceptions when sent a GET request
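The two failure cases above can be sketched as a download loop that skips links that fail and stops once 4 papers have been retrieved. This is a minimal illustration, not the crawler's actual code: `download_top_papers`, `fetch_with_requests`, the `links` list, and the 10-second timeout are all assumptions introduced here.

```python
MAX_PAPERS = 4  # the README states that up to 4 papers are extracted per author


def fetch_with_requests(url):
    """Fetch a URL with the requests package; raises on network or HTTP errors."""
    import requests  # imported lazily so the rest of the sketch runs without it

    response = requests.get(url, timeout=10)  # timeout value is an assumption
    response.raise_for_status()
    return response.content


def download_top_papers(links, fetch=fetch_with_requests, limit=MAX_PAPERS):
    """Return the contents of up to `limit` links, skipping any that fail.

    `links` is assumed to be ordered from most to least relevant. An empty
    list (for example, when no profile was found) simply yields no papers.
    """
    papers = []
    for url in links:
        if len(papers) >= limit:
            break
        try:
            papers.append(fetch(url))
        except Exception as error:  # broad on purpose: any failure skips the link
            print(f"skipping {url}: {error}")
    return papers
```

Passing `fetch` as a parameter keeps the loop testable without network access; when most links fail, the function returns fewer than 4 results instead of raising.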

dependencies:

Python packages:

requests, beautifulsoup4


Pr1yansh1/similarity_tmlr
