Excluding Resources From Indexing
Andy Jackson edited this page Sep 4, 2015
·
1 revision
According to the code, exclusions are switched on by setting
warc.index.exclusions.enabled = true
and then supplying an exclusions file and a check interval:
warc.index.exclusions.file = /path/to/file.txt
warc.index.exclusions.check_interval = 600
Under the hood, this uses StaticMapExclusionFilterFactory.
It is currently not clear how best to handle this when running under Hadoop Map-Reduce.
Alternatively, the completed Solr index can be cleaned up using delete queries. This is probably a sensible final step anyway, to make sure the problematic content is not there.
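As a rough illustration of the delete-query clean-up, the sketch below builds a Solr delete-by-query request and posts it to a core's JSON update handler. The update URL, the `url` field name, and the helper names are assumptions made for the example, not taken from the indexer's code; adjust them to match the actual schema and deployment.

```python
import json
import urllib.request


def escape_term(value):
    """Wrap a value as a quoted Solr phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_delete_payload(field, values):
    """Build the JSON body for a Solr delete-by-query over the given values.

    `field` is assumed to be an indexed field holding the value to match
    (for example a hypothetical `url` field).
    """
    query = " OR ".join(f"{field}:{escape_term(v)}" for v in values)
    return json.dumps({"delete": {"query": query}}).encode("utf-8")


def delete_from_solr(update_url, field, values):
    """POST the delete query to a Solr update handler and commit.

    `update_url` is assumed to look like http://localhost:8983/solr/<core>/update
    """
    request = urllib.request.Request(
        update_url + "?commit=true",
        data=build_delete_payload(field, values),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

It could be called as `delete_from_solr("http://localhost:8983/solr/<core>/update", "url", ["http://example.com/bad-page"])`. Running the same query as a search first is a cheap way to confirm exactly which documents would be removed.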