Extract text from PDF documents containing Facebook ad info and build a structured database
A sample of the documents is included in the pdfs directory. The entire collection of 3,000+ PDF files is available at https://intelligence.house.gov/social-media-content/social-media-advertisements.htm
To run the extraction code on the entire collection, place all of the files found at the link above into the pdfs directory on your local machine. To keep the size of this repo on GitHub manageable, only the sample of about 100 files is included here.