Source Code Project Structure

Andy Jackson edited this page Oct 2, 2018 · 1 revision

Core Components

Hadoop Support

  • warc-hadoop-recordreaders: The generic code that parses ARC and WARC files for map-reduce jobs.
  • warc-hadoop-indexer: The map-reduce version of warc-indexer, which combines the record readers and the indexer to run large-scale indexing jobs.

User Interface

The indexing tools do not come with a UI, but a number of different front ends exist.