Web Scraping: Brazilian Presidents' Speeches

Web scraping project that collects all speeches by Brazilian presidents from Biblioteca da Presidência (former presidents) and Discursos do Planalto (current president) for further data analysis.

Technologies:

Clone the project with git clone.
Install all dependencies with npm install.
Run the main script with node index.js to collect the speeches of past presidents (before Bolsonaro).
- The downloaded files follow this folder pattern: pdfs/fernandocollor/1990/01.pdf
Run node bolsonaro.js to collect all of Bolsonaro's speeches.
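
The folder pattern above can be illustrated with a short sketch. The helpers slugify and buildPdfPath are hypothetical names and are not taken from index.js; the slug rule (lowercase, no accents or spaces) and the two-digit zero padding are assumptions inferred from the single example path pdfs/fernandocollor/1990/01.pdf.

```javascript
// Hypothetical helpers for illustration only; index.js may build paths differently.

// Turn a president's name into a folder slug, e.g. "Fernando Collor" -> "fernandocollor".
// Assumption: accents are dropped and everything except a-z and 0-9 is removed.
function slugify(name) {
  return name
    .normalize("NFD")                 // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]/g, "");       // remove spaces and punctuation
}

// Build a path of the form pdfs/<president>/<year>/<NN>.pdf.
// Assumption: the file number is zero-padded to two digits, as in "01.pdf".
function buildPdfPath(president, year, index) {
  const fileName = String(index).padStart(2, "0") + ".pdf";
  return ["pdfs", slugify(president), String(year), fileName].join("/");
}

console.log(buildPdfPath("Fernando Collor", 1990, 1));
// pdfs/fernandocollor/1990/01.pdf
```

Joining with "/" keeps the output identical on every platform; a real script writing to disk would more likely use path.join from Node's path module.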

After running the main and bolsonaro scripts, approximately 80% of the collected data is in PDF format. Use pdf-to-txt.js to extract the text from the PDF files.
Use rename-files.js to rename the files to a consistent pattern (such as cafeFilho10.txt, which is the 11th Café Filho speech because numbering starts at 0).
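
As a rough sketch of the naming pattern described above, the following shows how a name like cafeFilho10.txt could be derived. The functions toCamelCase and speechFileName are hypothetical and are not taken from rename-files.js; the camelCase and accent-stripping rules are assumptions inferred from the cafeFilho10.txt example.

```javascript
// Hypothetical helpers for illustration only; rename-files.js may work differently.

// Convert a president's name to camelCase without accents,
// e.g. "Café Filho" -> "cafeFilho".
function toCamelCase(name) {
  const words = name
    .normalize("NFD")                 // split accented letters into base letter + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks
    .split(/\s+/)
    .filter(Boolean);                 // ignore empty pieces from extra whitespace
  return words
    .map((word, i) => {
      const lower = word.toLowerCase();
      // First word stays lowercase; later words get a capital first letter.
      return i === 0 ? lower : lower[0].toUpperCase() + lower.slice(1);
    })
    .join("");
}

// Build the target file name from a zero-based speech index.
function speechFileName(president, zeroBasedIndex) {
  return toCamelCase(president) + zeroBasedIndex + ".txt";
}

console.log(speechFileName("Café Filho", 10));
// cafeFilho10.txt (the 11th speech, since numbering starts at 0)
```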
