Remove Injector topology from crawler run script
(integrated into main topology)
sebastian-nagel committed Jan 30, 2020
1 parent f976a67 commit 7281e85
Showing 1 changed file with 1 addition and 9 deletions.
10 changes: 1 addition & 9 deletions bin/run-crawler.sh
@@ -26,17 +26,9 @@ sleep 10
 STORMCRAWLER="storm jar $PWD/lib/crawler.jar"
-# inject seeds into Elasticsearch
-$STORMCRAWLER com.digitalpebble.stormcrawler.elasticsearch.ESSeedInjector \
-$PWD/seeds '*' -conf $PWD/conf/es-conf.yaml -conf $PWD/conf/crawler-conf.yaml
-# alternatively running the flux
-#$STORMCRAWLER org.apache.storm.flux.Flux --remote $PWD/conf/es-injector.flux
-# wait until seeds are in the status index
-sleep 20
 # run the crawler
 $STORMCRAWLER org.commoncrawl.stormcrawler.news.CrawlTopology \
--conf $PWD/conf/es-conf.yaml -conf $PWD/conf/crawler-conf.yaml
+$PWD/seeds '*' -conf $PWD/conf/es-conf.yaml -conf $PWD/conf/crawler-conf.yaml
 # alternatively running the flux
 #$STORMCRAWLER org.apache.storm.flux.Flux --remote $PWD/conf/crawler.flux
 # suppress warnings about malformed XML in sitemaps
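
For orientation, the launch step that remains after this commit can be sketched as below. This is a minimal illustration, not the script itself: the `run_crawler` function and the `DRY_RUN` switch are hypothetical additions that make the command inspectable without a Storm installation, and the indentation is assumed. The class name, seed arguments and config paths are taken from the diff.

```shell
#!/bin/bash
# Sketch of the launch step after the Injector topology was removed.
# run_crawler and DRY_RUN are hypothetical helpers for illustration only.
STORMCRAWLER="storm jar $PWD/lib/crawler.jar"

run_crawler() {
    # The seed directory and file pattern now go straight to the main
    # topology, so no separate ESSeedInjector call or "sleep 20" is needed.
    # With DRY_RUN set, the command is printed instead of executed.
    ${DRY_RUN:+echo} $STORMCRAWLER org.commoncrawl.stormcrawler.news.CrawlTopology \
        $PWD/seeds '*' -conf $PWD/conf/es-conf.yaml -conf $PWD/conf/crawler-conf.yaml
}
```

Setting `DRY_RUN=1` before calling `run_crawler` prints the assembled `storm jar` command line; leaving it unset submits the topology as the script does.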