Siembol Enrichment is an enrichment engine used to add useful data to events to assist in detection and investigations.
The data that is used to enrich events is stored in JSON files in a file store in the following format:
{
"key" :
{
"column1":"value",
"column2":"value2",
...
}
}
When creating a rule you can specify the table to use, the column to join on, and the column to add to the event.
rule_name
- Rule name that uniquely identifies the rulerule_author
- The author of the rule, i.e., the user who last modified the rulerule_version
- The version of the rulerule_description
- This field contains a single text input that allows you set a description for the rule. This should be a short, helpful comment that allows anyone to identify the purpose of this rulesource_type
- This fields allows you to determine the type of data you want to match on. It is essentially a matcher for thesource_type
field. This field does not support regex - however, using*
as an input matches all source types. The source_type field is set during parsing and is equal to the name of the last parser which was used to parse the logmatchers
- Matchers allow you to further filter the events that the enrichment will be applied totable_mapping
- Mappings for enriching events
Matchers allow you to further filter the events that the enrichment will be applied to. You can add as many matchers as you want.
is_enabled
- The matcher is enableddescription
- The description of the matchermatcher_type
- Type of matcher, eitherREGEX_MATCH
orIS_IN_SET
is_negated
- The matcher is negatedfield
- The name of the field on which the matcher will be evaluated
There are two types of matchers:
-
REGEX_MATCH
- A regex_match allows you use a regex statement to match a specified field. There are two string inputs:data
- The regex statement in Java syntax https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html except allowing to use underscores in the names of captured groups Named capture groups in the regex are added as fields in the event. They are available from the next matcher onwards and are included in the output event
-
IS_IN_SET
- It compares the value of a field to a set of strings defined indata
. If the value is in the set then the matcher returns true.data
- A list of strings to compare the value to. New line delimited. Does not support regex - each line must be a literal match however, field substitution is supported in this field
The table mapping tab is where you configure the enrichment you want to perform.
-
table_name
- The name of the table which contains the data you want to enrich the event with -
joining_key
- The string used to join the event with the table (the key json field). This field supports substitution e.g.${field_name}
orhttp://${host_field_name}/${path_field_name}
. This is used to filter the key field of the table -
tags
- Tags are added into the event after successful joining the table with the joining key. You can add as many tags as you wanttag_name
- The name of the tagtag_value
- The value of the tag
-
enriching_fields
- Fields from the enriching table that are added after successful joining the table with the joining key. You can add as many enriching fields as you wanttable_field_name
- The column in the enrichment table that you want to addevent_field_name
- The name you want the field to have in event after enriching
Note: you can only enrich from one table per rule. If you want to enrich the same event from multiple table, you need to create multiple rules.
topology.name
- The name of storm topologykafka.spout.num.executors
- The number of executors for kafka spoutenriching.engine.bolt.num.executors
- The number of executors for enriching rule enginememory.enriching.bolt.num.executors
- The number of executors for memory enrichments from tablesmerging.bolt.num.executors
- The number of executors for merging enriched fieldskafka.writer.bolt.num.executors
- The number of executors for producing output messagesenriching.rules.zookeeper.attributes
- The zookeeper attributes for updating enrichment ruleszk.url
- Zookeeper servers url. Multiple servers are separated by commazk.path
- Path to a zookeeper node
enriching.tables.zookeeper.attributes
- The zookeeper attributes for notifying the update of enrichment tableszk.url
- Zookeeper servers url. Multiple servers are separated by commazk.path
- Path to a zookeeper node
kafka.batch.writer.attributes
- Kafka batch writer attributes for producing output messagesproducer.properties
- Defines kafka producer properties, see https://kafka.apache.org/0102/documentation.html#producerconfigs
storm.attributes
- Storm attributes for the enrichment topologybootstrap.servers
- Kafka brokers servers url. Multiple servers are separated by commafirst.pool.offset.strategy
- Defines how the kafka spout seeks the offset to be used in the first poll to kafkakafka.spout.properties
- Defines kafka consumer attributes for kafka spout such asgroup.id
,protocol
, see https://kafka.apache.org/0102/documentation.html#consumerconfigspoll.timeout.ms
- Kafka consumer parameterpoll.timeout.ms
used in kafka spoutoffset.commit.period.ms
- Specifies the period of time (in milliseconds) after which the spout commits to Kafka, see https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/storm-moving-data/content/tuning_kafkaspout_performance.htmlmax.uncommitted.offsets
- Defines the maximum number of polled offsets (records) that can be pending commit before another poll can take placestorm.config
- Defines storm attributes for a topology, see https://storm.apache.org/releases/current/Configuration.html
enriching.input.topics
- The list of kafka input topics for reading messagesenriching.output.topic
- Output kafka topic name for correctly processed messagesenriching.error.topic
- Output kafka topic name for error messagesenriching.tables.hdfs.uri
- The url for hdfs cluster where enriching tables are stored