Research Paper PDFs

Overview

This project was built for a commission. The client wanted a program that scrapes volumes of research papers and extracts information from their PDFs, including authors, publication dates, abstracts, and institutions.

Features

Scrapes metadata (title, volume, issue, year) of research papers from JSTOR.
Extracts and stores abstracts from research papers.
Downloads PDF files associated with each paper.
Automatically switches between browser sessions (e.g., Chrome, Edge) to avoid bot detection.
Saves gathered papers and abstracts in a pickle file.
Allows for resuming and updating the scraping process.
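
The save-and-resume behaviour listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this repository: the function names (`load_papers`, `save_papers`, `add_new_papers`) and the dictionary layout keyed by title are assumptions made for the example, and the real `Paper` class may store different fields.

```python
import pickle
from pathlib import Path


def load_papers(path):
    """Return previously saved papers keyed by title, or an empty dict on a first run."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open("rb") as f:
        return pickle.load(f)


def save_papers(papers, path):
    """Write to a temporary file first so an interrupted run does not corrupt saved state."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("wb") as f:
        pickle.dump(papers, f)
    tmp.replace(path)  # swap the finished file into place


def add_new_papers(papers, scraped):
    """Add only papers that are not stored yet; return how many were new."""
    added = 0
    for paper in scraped:
        # Hypothetical layout: each paper is a dict with at least a "title" key.
        if paper["title"] not in papers:
            papers[paper["title"]] = paper
            added += 1
    return added
```

A resumed run would call `load_papers`, pass newly scraped entries to `add_new_papers`, and then call `save_papers`, so papers collected earlier are kept rather than scraped again. Pickle files should only be loaded when they come from a trusted source, since unpickling can execute arbitrary code.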

License

This project is open-source under the MIT License.

Author

Simon Gray

Folders and files

Paper.py
README.md
affiliationsGetter.py
fileMover.py
final.csv
finalTableMaker.py
finishedPaper.py
finishedSortedResearchPapers.pickle
folderMaker.py
main.py
pdfDownloader.py
researchPapers.pickle
sortedResearchPapers.pickle
titleAndColleges.py
world-universities.csv
