lukerosiak/dccampfin

Parse the PDFs of the District of Columbia's campaign finance disclosures into CSVs for expenditures and contributions.

The PDFs include more fields than the supplied CSVs: employment information, which helps detect bundling and conflicts of interest, and payment type (such as money order or cash).

To run, install Python, pdftotext and BeautifulSoup. Edit the path in dccampfin_settings.py where all the bulk PDFs and their extracted text files will live, set the years you want, and then run:

python download_pdfs.py

python create_csvs.py
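
For readers who have not used pdftotext, the text-extraction step can be sketched roughly as below. This is not the repo's code: the function names, the .txt naming convention and the use of the -layout flag are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path=None, keep_layout=True):
    """Build the argument list for one pdftotext call (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Assumption: the text file sits next to the PDF with a .txt suffix.
    txt_path = Path(txt_path) if txt_path else pdf_path.with_suffix(".txt")
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout keeps the physical column layout of each page. Whether
        # the repo's scripts pass this flag is an assumption.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path, txt_path=None):
    """Run pdftotext on one PDF; raises if the tool is missing or fails."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext is not installed or not on PATH")
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
```

Separating the command builder from the call that runs it makes it possible to inspect the command without having pdftotext installed.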

Your output will be two CSVs: output/detail_contribs.csv and output/detail_expends.csv
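
As one way to use the employment information, contributions could be grouped by committee and employer to surface possible bundling. The sketch below is a minimal illustration only: the column names committee, employer and amount are hypothetical, so check the header row of detail_contribs.csv and pass the real field names.

```python
import csv
import io
from collections import defaultdict


def employer_totals(rows, employer_field="employer", amount_field="amount",
                    committee_field="committee"):
    """Count and sum contributions per (committee, employer) pair.

    The default field names are assumptions, not the CSV's real headers.
    """
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for row in rows:
        # Normalize case and whitespace so "Acme Corp" and "acme corp " match.
        employer = (row.get(employer_field) or "").strip().upper()
        if not employer:
            continue  # skip rows with no employer listed
        raw = (row.get(amount_field) or "0").strip()
        amount = float(raw.replace("$", "").replace(",", ""))
        key = (row.get(committee_field, ""), employer)
        totals[key]["count"] += 1
        totals[key]["total"] += amount
    return dict(totals)
```

To apply it to the real output, open output/detail_contribs.csv, wrap the file in csv.DictReader, and pass the reader as rows. Many contributions from one employer to one committee are only a lead to investigate, not proof of bundling.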

If you don't want to run the scripts, copies of those files as generated on 4/4/2013 are in the repo.

By Luke Rosiak of The Washington Times. Please credit me if you use this data. No guarantees as to its accuracy--submit a pull request if you see any errors.
