Crawler TripAdvisor

A focused crawler in Java for extracting reviews from TripAdvisor.

Final project for the Web Information Management course, June 2012.

Detailed project information and an evaluation can be found in the docs/ folder, in the PDF presentation eng_crawler_tripadvisor.pdf.
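What makes a crawler "focused" is that it only enqueues links likely to lead to relevant content (here, reviews) instead of crawling the whole site. The sketch below illustrates that idea with a URL filter. It is hypothetical: the class name FocusedUrlFilter and the URL patterns (Hotel_Review, ShowUserReviews) are assumptions for illustration, not the rules this project actually uses.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a focused-crawling URL filter.
// The URL patterns are ASSUMPTIONS, not this project's actual rules.
public class FocusedUrlFilter {

    // Assumed shape of TripAdvisor pages that contain reviews.
    private static final Pattern REVIEW_PAGE = Pattern.compile(
            "^https?://(www\\.)?tripadvisor\\.[a-z.]+/(Hotel_Review|ShowUserReviews)-.*");

    // Static resources that a review crawler never needs to fetch.
    private static final Pattern STATIC_RESOURCE = Pattern.compile(
            ".*\\.(css|js|png|jpe?g|gif|ico)(\\?.*)?$", Pattern.CASE_INSENSITIVE);

    /** Returns true if the crawler should enqueue this URL. */
    public static boolean shouldVisit(String url) {
        if (url == null || STATIC_RESOURCE.matcher(url).matches()) {
            return false; // skip missing URLs and images, scripts, stylesheets
        }
        return REVIEW_PAGE.matcher(url).matches(); // follow only review pages
    }

    public static void main(String[] args) {
        System.out.println(shouldVisit(
                "https://www.tripadvisor.com/Hotel_Review-g187791-d123456-Reviews-Example.html")); // true
        System.out.println(shouldVisit(
                "https://www.tripadvisor.com/Attractions-g187791-Activities-Rome.html")); // false
        System.out.println(shouldVisit("https://www.tripadvisor.com/img/logo.png")); // false
    }
}
```

Because the filter relies on URL and page structure, it is exactly the kind of rule that breaks when the site changes, as noted in the warning at the end of this README.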

Running the crawler

Compile and run it/thecrawlers/crawler/CrawlHandler with the following arguments:

  • numberOfCrawlers
  • rootFolder (folder that will hold intermediate crawl data), for example "data/crawl/"
  • timeDelay (time delay between requests in milliseconds)
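The following sketch shows how the three arguments could be validated before a crawl starts, assuming they are passed positionally in the order listed above. CrawlerArgs, CrawlerConfig and parse() are hypothetical names for illustration and are not classes from this project.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: validating CrawlHandler's three positional arguments.
// Class and method names are illustrative, not taken from this project.
public class CrawlerArgs {

    /** Immutable holder for the parsed command-line arguments. */
    public static final class CrawlerConfig {
        public final int numberOfCrawlers;  // number of concurrent crawler threads
        public final Path rootFolder;       // holds intermediate crawl data
        public final long timeDelayMillis;  // pause between consecutive requests

        CrawlerConfig(int numberOfCrawlers, Path rootFolder, long timeDelayMillis) {
            this.numberOfCrawlers = numberOfCrawlers;
            this.rootFolder = rootFolder;
            this.timeDelayMillis = timeDelayMillis;
        }
    }

    /**
     * Parses {numberOfCrawlers, rootFolder, timeDelay}.
     * Throws IllegalArgumentException (or its subclass NumberFormatException) on bad input.
     */
    public static CrawlerConfig parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: CrawlHandler <numberOfCrawlers> <rootFolder> <timeDelay>");
        }
        int crawlers = Integer.parseInt(args[0]);
        if (crawlers < 1) {
            throw new IllegalArgumentException("numberOfCrawlers must be at least 1");
        }
        long delay = Long.parseLong(args[2]);
        if (delay < 0) {
            throw new IllegalArgumentException("timeDelay must not be negative");
        }
        return new CrawlerConfig(crawlers, Paths.get(args[1]), delay);
    }

    public static void main(String[] args) {
        CrawlerConfig cfg = parse(new String[] {"4", "data/crawl/", "1000"});
        System.out.println(cfg.numberOfCrawlers + " crawlers, "
                + cfg.timeDelayMillis + " ms delay, data in " + cfg.rootFolder);
    }
}
```

Rejecting malformed values up front avoids starting several crawler threads only to fail later; a larger timeDelay is gentler on the target site but makes the crawl slower.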

Warning

This version supports crawling TripAdvisor as it was in June 2012. Because the crawler is focused and TripAdvisor's page structure keeps evolving, the project will eventually produce parsing errors and will need updates.