This project scrapes data from the annual Resumes of Congressional Activity, which are available in PDF format.
Source | Description | Link |
---|---|---|
US Senate Web Site | Resumes of Congressional Activity, Session Dates | US Senate Web Site |
Data was scraped from the PDFs using tabula, formatted in Excel using VBA, and tidied in JupyterLab using Python.
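
As a rough illustration of the final tidying step, the sketch below reshapes a wide table (one column per chamber) into one record per measure and chamber, using only the Python standard library. It is not the code from the project notebooks: the column names, the `parse_count` and `tidy` helpers, and the sample values are all hypothetical placeholders chosen for the example.

```python
import csv
import io

# Hypothetical sample of a scraped table. The column names and numbers are
# made-up placeholders, not figures from any actual Resume.
SAMPLE_CSV = """measure,Senate,House
Days in session,150,160
"Measures passed, total","1,200",900
Quorum calls,3,
"""


def parse_count(text):
    """Convert a scraped cell such as '1,200' to an int; blank cells become None."""
    cleaned = (text or "").strip().replace(",", "")
    return int(cleaned) if cleaned else None


def tidy(wide_csv, id_column="measure"):
    """Reshape wide rows into one record per (measure, chamber) pair."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    records = []
    for row in reader:
        for column, cell in row.items():
            if column == id_column:
                continue  # the identifier column is carried into each record instead
            records.append(
                {
                    "measure": row[id_column].strip(),
                    "chamber": column,
                    "value": parse_count(cell),
                }
            )
    return records


tidy_rows = tidy(SAMPLE_CSV)
for record in tidy_rows:
    print(record)
```

Keeping blank cells as `None` rather than zero is a deliberate choice in this sketch: it leaves missing values visible so they can be reviewed as possible data integrity issues instead of being silently counted.
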
Tool / Library | Version |
---|---|
Adobe Acrobat Pro | 2024.001.20629 |
JupyterLab | 4.1.2 |
Microsoft Office 365, Excel | 2403 |
Microsoft Visual Basic for Applications | 7.1 |
Python | 3.12.2 |
tabula | 1.2.1 |
Name | Description |
---|---|
data | Folder containing original data files and scrubbed output |
code | Folder containing Jupyter notebooks and VBA exports |
documentation | Folder containing test results and data integrity issues |
Data Scrape and Validation Presentation | PowerPoint recap of project and findings, saved as PDF |
Asset | License / Use Policy |
---|---|
Original Code | MIT License |
Congressional Activity | Federal Open Data Policy |