This was a commission in which the client wanted a program to scrape volumes of research papers and extract information from their PDFs, including authors, publication dates, abstracts, and institutions.
- Scrapes metadata (title, volume, issue, year) of research papers from JSTOR.
- Extracts and stores abstracts from research papers.
- Downloads PDF files associated with each paper.
- Automatically switches between browser sessions (e.g., Chrome, Edge) to avoid bot detection.
- Saves gathered papers and abstracts to a pickle file.
- Allows for resuming and updating the scraping process.
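
The pickle-based saving and resuming described in the list above could look roughly like the sketch below. It is not the project's actual code: the function names (`load_papers`, `save_papers`, `add_paper`), the record keys, and the idea of keying papers by an identifier are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def load_papers(path):
    """Return previously saved papers, or an empty dict on a first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_papers(papers, path):
    """Write the papers to a temporary file, then swap it into place.

    Replacing the file in one step means an interrupted run does not
    leave a half-written pickle behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(papers, f)
    os.replace(tmp_path, path)


def add_paper(papers, paper_id, record):
    """Store a record unless it is already present.

    Returns True if the paper was new, so a resumed run can skip
    papers that were scraped in an earlier session.
    """
    if paper_id in papers:
        return False
    papers[paper_id] = record
    return True
```

A resumed run would call `load_papers("papers.pkl")` (a hypothetical file name), skip any identifier for which `add_paper` returns `False`, and call `save_papers` periodically. Because `pickle.load` can execute arbitrary code, only files written by the scraper itself should be loaded.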
This project is open-source under the MIT License.
Simon Gray