warc-indexer has several properties that decide whether a WARC record is indexed or not: `record_type_include`, `response_include`, `protocol_include`, `exclusions` and `url_exclude`.
With some rewriting, this could be fully generalized to work on the content of any field of the generated SolrDocument (with optimizations for the cases where "no index" can be determined before analysis), making it possible to use whitelists and blacklists for MIME types, domains, etc. It could be folded into the `fields` section of the config or be a separate section.
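A minimal sketch of what such a generalized filter might look like, written in Java since warc-indexer is a Java project. It assumes, hypothetically, that a record's fields are available as a `Map<String, String>` standing in for the SolrDocument, and that field names such as `content_type` and `domain` exist; the `FieldFilter` class and its `include`/`exclude`/`accept` methods are illustrative only, not existing warc-indexer API. Rules whose field is not yet known are skipped, which is one way to get the "decide before analysis" optimization: run the filter on the fields known from the WARC headers first, then again on the finished document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a generic include/exclude filter over the fields of a
 * document about to be indexed. Class, method and field names are assumptions,
 * not part of warc-indexer.
 */
public class FieldFilter {

    /** One rule: a field name, a regex, and whether a match includes or excludes. */
    private static final class Rule {
        final String field;
        final Pattern pattern;
        final boolean include;

        Rule(String field, String regex, boolean include) {
            this.field = field;
            this.pattern = Pattern.compile(regex);
            this.include = include;
        }

        /** True if the field has a value and the regex is found somewhere in it. */
        boolean matches(Map<String, String> fields) {
            String value = fields.get(field);
            return value != null && pattern.matcher(value).find();
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Whitelist entry: if the field is present, some include rule for it must match. */
    public FieldFilter include(String field, String regex) {
        rules.add(new Rule(field, regex, true));
        return this;
    }

    /** Blacklist entry: if the field is present and matches, the record is rejected. */
    public FieldFilter exclude(String field, String regex) {
        rules.add(new Rule(field, regex, false));
        return this;
    }

    /**
     * Decides "index or no index" from the fields known so far. Rules whose field
     * is absent are skipped, so a partial map only rejects when the outcome is
     * already certain.
     */
    public boolean accept(Map<String, String> fields) {
        Map<String, Boolean> whitelistHit = new HashMap<>();
        for (Rule rule : rules) {
            if (!fields.containsKey(rule.field)) {
                continue; // field not known (yet): this rule cannot decide anything
            }
            boolean hit = rule.matches(fields);
            if (!rule.include && hit) {
                return false; // blacklisted
            }
            if (rule.include) {
                // Several include rules for the same field are OR-ed together.
                whitelistHit.merge(rule.field, hit, Boolean::logicalOr);
            }
        }
        // Reject if a present field had include rules but none of them matched.
        return !whitelistHit.containsValue(false);
    }
}
```

For example, `new FieldFilter().include("content_type", "^text/html").exclude("domain", "(^|\\.)example\\.org$")` would keep HTML pages while dropping anything from example.org; a record whose `content_type` has not been determined yet passes the first check and is decided once analysis has filled in that field.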