


elishagreenwald/web-email-scraper


web-email-scraper

This Java program starts at a user-specified website and scrapes email addresses and links to other websites. Before adding a scraped email or URL, it checks whether it has already been scraped. It then follows the scraped links and checks those websites for new emails or URLs. It continues scraping until it reaches the specified maximum number of emails, at which point it uploads all the collected email addresses to a specified remote SQL server.
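The scraping loop described above amounts to a breadth-first crawl with two "already seen" sets, one for emails and one for URLs. The sketch below illustrates that idea under stated assumptions and is not the program's actual code: page fetching is passed in as a hypothetical `fetch` function (URL to HTML, or `null` if the page cannot be loaded) so the logic runs without network access, and emails and links are pulled out with simple regular expressions instead of jsoup. The class name `EmailCrawlSketch` is also hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical breadth-first crawl sketch; not the original implementation.
final class EmailCrawlSketch {
    // Simplified email pattern; not a full RFC 5322 validator.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Naive absolute-link extraction; an HTML parser such as jsoup is more robust.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"(https?://[^\"]+)\"");

    // fetch is an assumed URL -> HTML function (returns null on failure).
    static Set<String> crawl(String startUrl, int maxEmails,
                             Function<String, String> fetch) {
        Set<String> emails = new LinkedHashSet<>(); // keeps insertion order, rejects duplicates
        Set<String> seenUrls = new HashSet<>();     // every URL ever queued
        Deque<String> frontier = new ArrayDeque<>(); // URLs waiting to be visited
        frontier.add(startUrl);
        seenUrls.add(startUrl);

        while (!frontier.isEmpty() && emails.size() < maxEmails) {
            String url = frontier.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue; // unreachable page: skip it and move on
            }
            Matcher m = EMAIL.matcher(html);
            while (m.find() && emails.size() < maxEmails) {
                emails.add(m.group()); // Set.add silently ignores an already-scraped email
            }
            Matcher h = HREF.matcher(html);
            while (h.find()) {
                String link = h.group(1);
                if (seenUrls.add(link)) { // true only the first time a URL is seen
                    frontier.add(link);
                }
            }
        }
        return emails;
    }
}
```

For example, `EmailCrawlSketch.crawl("http://a.test/", 100, pages::get)` crawls an in-memory `Map<String, String>` of pages. Checking the sets before queueing, rather than after fetching, keeps each URL from being downloaded more than once.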

Settings to configure:

-Starting URL

-SQL server remote IP:port and database name

-SQL username and password
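These settings map onto a JDBC connection. The sketch below is a minimal illustration assuming the jTDS driver listed under the libraries, its `jdbc:jtds:sqlserver://host:port/database` URL form, and a hypothetical table `emails` with a single `address` column; the class and method names are also hypothetical and not taken from the program.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collection;

// Hypothetical upload sketch; table "emails" and column "address" are assumptions.
final class EmailUploadSketch {
    // hostPort is the "ip:port" setting, e.g. "192.0.2.10:1433".
    static String jtdsUrl(String hostPort, String database) {
        return "jdbc:jtds:sqlserver://" + hostPort + "/" + database;
    }

    static void upload(String hostPort, String database, String user,
                       String password, Collection<String> emails)
            throws SQLException, ClassNotFoundException {
        // Load the jTDS driver explicitly (jTDS jar must be on the classpath);
        // this is harmless if the driver has already registered itself.
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        String sql = "INSERT INTO emails (address) VALUES (?)";
        // try-with-resources closes the statement and connection even on error.
        try (Connection conn = DriverManager.getConnection(
                     jtdsUrl(hostPort, database), user, password);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String email : emails) {
                ps.setString(1, email); // parameter binding avoids SQL injection
                ps.addBatch();
            }
            ps.executeBatch(); // sends the queued inserts as one batch
        }
    }
}
```

Using a parameterized, batched insert keeps scraped text out of the SQL string and avoids one round trip per address.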

Please note the following:

-Some sites don't like to be scraped and may block your IP from accessing them, either temporarily or permanently.

-The default starting URL is cnn.com, but you can change it to any site.
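One common way to lower the risk of being blocked is to space out requests to the same host. The program is not described as doing this; the sketch below is a hypothetical `HostThrottle` helper, assuming a fixed minimum delay per host, that a crawl loop could call before each fetch.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host throttle: keeps requests to the same host at least
// minDelayMillis apart. Not part of the original program.
final class HostThrottle {
    private final long minDelayMillis;
    private final Map<String, Long> lastRequest = new HashMap<>(); // host -> scheduled start time

    HostThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns how many milliseconds to wait before requesting url at nowMillis,
    // and records the scheduled request time for that host.
    long reserve(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        Long last = lastRequest.get(host);
        long start = (last == null) ? nowMillis : Math.max(nowMillis, last + minDelayMillis);
        lastRequest.put(host, start);
        return start - nowMillis;
    }

    // Blocks the calling thread until it is polite to request url.
    void await(String url) throws InterruptedException {
        long wait = reserve(url, System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

Separating `reserve` (pure timing logic with an explicit clock) from `await` (the actual sleep) makes the delay calculation easy to check without waiting in real time.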

Libraries used (required for this program to work):

-jsoup-1.8.2 http://jsoup.org/

-jtds-1.2.5 http://sourceforge.net/projects/jtds/files/jtds/1.2.5/

-ApacheRoutines -commons-validator-1.4.1 http://www.apache.org/
