Siembol provides a scalable, advanced security analytics framework based on open-source big data technologies. Siembol normalizes, enriches, and alerts on data from various sources, allowing security teams to respond to attacks before they become incidents.
Siembol was developed in-house at G-Research as a security data processing application and forms the core of the G-Research Security Data Platform. We needed a highly efficient, real-time event processing engine, and in our early years we used Splunk and Apache Metron. However, neither product met all of our needs; we wanted specific features that mattered to G-Research.
As early adopters of Metron, we believed in the product and tried hard to adapt it to our needs. Ultimately, we recognized its limitations and began adding the missing features and shoring up its instabilities. Sadly, by the time we were able to give back to the Metron community, Metron's time had passed. However, as we still believe in the core mission of Metron, we are releasing our work under the project name 'Siembol'. We hope this will provide the security community with an effective alternative, filling the void left by Metron's move to the Apache Attic.
Components for alert escalation. CSIRT teams can easily create a rule-based alert from a single data source, or they can create advanced correlation rules that combine various data sources. Siembol UI also supports translating rules written in the Sigma rule specification into Siembol alerting rules.
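To illustrate what a single-source, rule-based alert does, here is a minimal Python sketch. It is not Siembol's implementation or rule schema: the `AlertingRule` class, its `source_type` and `matchers` fields, and the event field names are hypothetical. A rule fires only when the event comes from the rule's source and every listed field matches its regular expression.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AlertingRule:
    """Hypothetical single-source rule: all matchers must match for an alert."""
    name: str
    source_type: str
    matchers: dict = field(default_factory=dict)  # field name -> regex

    def __post_init__(self):
        # Compile the patterns once rather than for every event.
        self._compiled = {f: re.compile(p) for f, p in self.matchers.items()}

    def matches(self, event: dict) -> bool:
        if event.get("source_type") != self.source_type:
            return False
        return all(
            f in event and pattern.search(str(event[f]))
            for f, pattern in self._compiled.items()
        )


def evaluate(rules, event):
    """Return one alert (the event plus the rule name) per matching rule."""
    return [{**event, "rule_name": r.name} for r in rules if r.matches(event)]


rule = AlertingRule(
    name="encoded_powershell",
    source_type="windows",
    matchers={"process": r"(?i)powershell\.exe$", "command_line": r"\s-enc\s"},
)
event = {"source_type": "windows", "process": "PowerShell.exe",
         "command_line": "powershell -enc SQBFAFgA"}
alerts = evaluate([rule], event)  # one alert with rule_name "encoded_powershell"
```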
Ability to integrate with other systems. While the core functionality of Metron was great, we always yearned for more integration with the growing ecosystem of SIEM-related projects. Currently, Siembol integrates with systems such as Jira, TheHive, Cortex, ELK, and LDAP. Beyond that, Siembol’s plugin interface allows custom integrations with other systems used in incident response.
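A minimal sketch of how a pluggable response pipeline might chain integrations follows. The `Evaluator` base class, the `MATCH`/`NO_MATCH`/`FILTERED` results, and the example evaluators are assumptions made for illustration, not Siembol's plugin API; `CreateTicket` only records alerts where a real plugin would call an external system such as Jira.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Result(Enum):
    MATCH = "match"        # continue with the next evaluator in the rule
    NO_MATCH = "no_match"  # this rule does not apply; try the next rule
    FILTERED = "filtered"  # drop the alert and stop processing it


class Evaluator(ABC):
    """Hypothetical plugin interface: each integration implements evaluate()."""

    @abstractmethod
    def evaluate(self, alert: dict) -> Result: ...


class FieldEquals(Evaluator):
    def __init__(self, name, value):
        self.name, self.value = name, value

    def evaluate(self, alert):
        return Result.MATCH if alert.get(self.name) == self.value else Result.NO_MATCH


class FilterIfAllowlisted(Evaluator):
    def __init__(self, name, allowlist):
        self.name, self.allowlist = name, set(allowlist)

    def evaluate(self, alert):
        return Result.FILTERED if alert.get(self.name) in self.allowlist else Result.MATCH


class CreateTicket(Evaluator):
    """Stand-in for an external integration such as Jira: only records alerts."""

    def __init__(self):
        self.tickets = []

    def evaluate(self, alert):
        self.tickets.append(alert["rule_name"])
        return Result.MATCH


def respond(rules, alert):
    """Try each rule (a list of evaluators) in order; the first rule whose
    evaluators all return MATCH handles the alert."""
    for rule in rules:
        for evaluator in rule:
            result = evaluator.evaluate(alert)
            if result is Result.FILTERED:
                return Result.FILTERED
            if result is Result.NO_MATCH:
                break  # move on to the next rule
        else:
            return Result.MATCH
    return Result.NO_MATCH


ticketing = CreateTicket()
rules = [[FieldEquals("severity", "high"),
          FilterIfAllowlisted("user", ["svc_backup"]),
          ticketing]]
outcome = respond(rules, {"rule_name": "r1", "severity": "high", "user": "alice"})
```

Placing the ticketing evaluator last in the rule means a ticket is created only after every earlier check has matched.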
Advanced parsing framework for building fault-tolerant parsers. Metron gave us a great way to introduce a powerful parsing framework into our Security Data Platform, but it was brittle and sensitive to even minor syntactic mistakes. Siembol provides a robust framework for normalizing and extracting fields from logs, supporting chaining of parsers, field extractors, and transformation functions.
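The chaining idea can be sketched as an extractor followed by an ordered list of transformations. The sketch below is hypothetical Python, not Siembol's parser configuration: the function names, the `ip_src_addr` field name, and the choice of epoch milliseconds in UTC are assumptions.

```python
from datetime import datetime, timezone


def key_value_extractor(message, pair_sep=" ", kv_sep="="):
    """Split 'k1=v1 k2=v2' into a flat dict; tokens without kv_sep are ignored."""
    fields = {}
    for token in message.split(pair_sep):
        key, sep, value = token.partition(kv_sep)
        if sep:
            fields[key] = value
    return fields


def timestamp_to_epoch(fmt, name="timestamp"):
    """Transformation: parse `name` with strptime(fmt), assume UTC, store epoch ms."""
    def transform(fields):
        if name not in fields:
            return fields
        parsed = datetime.strptime(fields[name], fmt).replace(tzinfo=timezone.utc)
        return {**fields, name: int(parsed.timestamp() * 1000)}
    return transform


def rename_fields(mapping):
    """Transformation: rename fields according to mapping, keeping the others."""
    return lambda fields: {mapping.get(k, k): v for k, v in fields.items()}


def drop_fields(names):
    """Transformation: remove the named fields."""
    return lambda fields: {k: v for k, v in fields.items() if k not in names}


def parse(message, extractor, transformations):
    """Run the extractor, then apply each transformation in order."""
    fields = extractor(message)
    for transform in transformations:
        fields = transform(fields)
    return fields


chain = [
    timestamp_to_epoch("%Y-%m-%dT%H:%M:%S"),
    rename_fields({"src": "ip_src_addr"}),
    drop_fields({"debug"}),
]
result = parse("timestamp=2021-01-01T00:00:00 src=10.0.0.1 debug=1 action=deny",
               key_value_extractor, chain)
# result == {"timestamp": 1609459200000, "ip_src_addr": "10.0.0.1", "action": "deny"}
```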
Advanced enrichment component. Siembol lets users define rules that select the enrichment logic, join enrichment tables, and specify how the processed log is enriched with information from user-defined tables.
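A minimal sketch of lookup-table enrichment, assuming the table is an in-memory dict keyed by the joined field. The `enrich` function, its `condition` selector, and the `<table>:<column>` naming of added fields are hypothetical, chosen only to show the select, join, and add steps.

```python
def enrich(event, table, key_field, table_name, columns=None, condition=None):
    """Hypothetical enrichment rule.

    condition: optional predicate selecting which events the rule applies to
    table:     in-memory lookup table, key -> row (dict of column -> value)
    columns:   columns to copy from the matched row (all columns if None)
    Added fields are named '<table_name>:<column>'; the input event is not modified.
    """
    if condition is not None and not condition(event):
        return event
    row = table.get(event.get(key_field))
    if row is None:
        return event  # no join match: leave the event unchanged
    selected = row if columns is None else {c: row[c] for c in columns if c in row}
    return {**event, **{f"{table_name}:{c}": v for c, v in selected.items()}}


users = {"alice": {"department": "research", "ad_group": "quants", "phone": "x123"}}
enriched = enrich(
    {"source_type": "auth", "user": "alice", "action": "login"},
    users, key_field="user", table_name="users",
    columns=["department", "ad_group"],
    condition=lambda e: e.get("source_type") == "auth",
)
# enriched adds "users:department": "research" and "users:ad_group": "quants"
```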
Configurations and rules are defined in a modern Angular web application, Siembol UI, and stored in Git repositories. All configurations are stored in JSON format and edited through web forms to speed up creation, shorten the learning curve, and avoid mistakes. Moreover, Siembol UI supports validation, testing, and creating and evaluating test cases to prevent configuration errors from reaching the production environment.
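Validation can be sketched so that each rule is checked independently: a broken rule is reported and skipped while the other rules still load. The sketch assumes a hypothetical JSON rule format with `rule_name` and `matchers` keys; it is not Siembol's configuration schema.

```python
import json
import re

REQUIRED_KEYS = ("rule_name", "matchers")  # hypothetical schema


def load_rules(config_json):
    """Validate each rule on its own; return (valid_rules, errors_by_rule_name)."""
    valid, errors = [], {}
    for i, rule in enumerate(json.loads(config_json)):
        name = rule.get("rule_name", f"<rule #{i}>")
        missing = [k for k in REQUIRED_KEYS if k not in rule]
        if missing:
            errors[name] = f"missing keys: {', '.join(missing)}"
            continue
        try:
            compiled = {f: re.compile(p) for f, p in rule["matchers"].items()}
        except re.error as exc:
            errors[name] = f"invalid regex: {exc}"
            continue  # only this rule is rejected
        valid.append({"rule_name": name, "matchers": compiled})
    return valid, errors


config = json.dumps([
    {"rule_name": "ok_rule", "matchers": {"user": "^admin$"}},
    {"rule_name": "bad_regex", "matchers": {"user": "(unclosed"}},
    {"matchers": {"host": ".*"}},
])
rules, errors = load_rules(config)
```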
Supports OAUTH2/OIDC for authentication and authorization in the Siembol UI. All Siembol services can have multiple instances with authorization based on OIDC group membership. This allows for multi-tenancy usage without the need to deploy multiple instances of Siembol.
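Group-based authorization can be sketched as a mapping from each service instance to the OIDC groups allowed to manage it. This assumes the ID token has already been verified and that the identity provider puts group membership in a `groups` claim (common, but provider-specific); the service and group names are hypothetical.

```python
def authorized_services(claims, service_groups, groups_claim="groups"):
    """Return the service instances a user may manage.

    claims:         payload of an already-verified OIDC ID token
    service_groups: service instance name -> set of OIDC groups allowed to manage it
    groups_claim:   claim carrying group membership (assumed to be 'groups')
    """
    user_groups = set(claims.get(groups_claim, []))
    return {svc for svc, allowed in service_groups.items() if user_groups & allowed}


service_groups = {
    "alerts-csirt": {"csirt"},
    "alerts-bigdata": {"bigdata-admins"},
    "parsers-shared": {"csirt", "bigdata-admins"},
}
claims = {"sub": "alice", "groups": ["csirt", "staff"]}
allowed = authorized_services(claims, service_groups)
# allowed == {"alerts-csirt", "parsers-shared"}
```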
Easy installation using prepared Docker images and Helm charts. Metron’s installation process was arduous: its flexible architecture offered a multitude of ways to set up and configure Metron, which could overwhelm a first-time user. While Siembol maintains Metron's flexibility for advanced users, it simplifies installation for those new to the project.
- Security teams can easily create a rule-based alert from a single data source, or they can create advanced correlation rules that combine various data sources.
- Siembol UI supports translating rules written in the Sigma rule specification (a generic and open signature format for SIEM alerting, https://github.com/SigmaHQ/sigma) into Siembol alerting rules.
- Easy way to integrate Siembol with other systems such as Jira, Cortex, ELK, and LDAP.
- Functionality to provide additional enrichments about an alert, such as ELK searches or LDAP searches, with the option to filter the alert as part of an automatic incident response.
- Plugin interface allowing for custom integration with other systems used in incident response.
- We are planning to publish a collection of plugins that we are using internally at G-Research, while providing space for collecting plugins from the Siembol community.
- Custom-built framework for normalizing logs (parsing), including chaining of extractors and transformations, which allows the user to:
  - extract JSON and CSV structures, key-value pairs, and timestamps.
  - parse timestamps into epoch time using standard formatters.
  - transform messages by renaming or filtering fields, or even filtering out the whole message.
- Supporting use cases for advanced log ingestion using multiple parsers and routing logic.
- Supporting a generic text parser, syslog and BSD syslog parsers, and a NetFlow v9 binary parser.
- Defining rules for selecting enrichment logic, joining enrichment tables, and defining how to enrich the processed log.
- All configurations are stored in JSON format and edited by web forms in order to avoid mistakes and speed up creation and learning time.
- Configurations are stored in Git repositories.
- Supporting high-integrity use cases with protected GitHub main branches for deploying configurations.
- Supporting validation and testing of configurations. Moreover, Siembol UI supports creating and evaluating test cases.
- A configuration error in a rule affects only that rule; unlike in Metron, errors no longer bring down the entire system.
- Siembol uses a declarative JSON language rather than a scripting language such as Stellar, Metron's domain-specific language. We consider a declarative language, combined with testing and validation, to be less error-prone and simpler to understand.
- Supporting OAUTH2/OIDC for authentication and authorization in Siembol UI.
- All Siembol services can have multiple instances with authorization based on OIDC group membership. This allows multi-tenancy usage without the need to deploy multiple instances of Siembol.
- We are planning to test and tune OAUTH2/OIDC integration with popular identity providers.
- For high performance, Siembol supports deployment on external Hadoop clusters. We also provide k8s Helm charts for all deployment dependencies so that Siembol can be tested in development environments.
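A simplified sketch of translating a Sigma `selection` into field-to-regex matchers follows. It handles only plain values and the `contains`, `startswith`, and `endswith` modifiers, treats list values as alternatives, and matches case-insensitively, as Sigma does by default for strings. The selection is assumed to be already parsed from YAML into a dict; the output format is hypothetical and this is not the translator built into Siembol UI.

```python
import re

# Regex builders for the Sigma value modifiers this sketch supports.
MODIFIERS = {
    None: lambda v: f"^{re.escape(v)}$",         # plain value: exact match
    "contains": lambda v: re.escape(v),
    "startswith": lambda v: f"^{re.escape(v)}",
    "endswith": lambda v: f"{re.escape(v)}$",
}


def sigma_selection_to_matchers(selection):
    """Translate one parsed Sigma selection (a dict) into {field: regex} matchers.

    Keys are 'Field' or 'Field|modifier'; a list value means any of the values
    may match. The '(?i)' flag gives case-insensitive matching.
    """
    matchers = {}
    for key, values in selection.items():
        name, _, modifier = key.partition("|")
        build = MODIFIERS.get(modifier or None)
        if build is None:
            raise ValueError(f"unsupported modifier: {modifier!r}")
        values = values if isinstance(values, list) else [values]
        alternatives = "|".join(build(str(v)) for v in values)
        matchers[name] = f"(?i)(?:{alternatives})"
    return matchers


selection = {
    "Image|endswith": "\\powershell.exe",
    "CommandLine|contains": ["-enc", "-EncodedCommand"],
}
matchers = sigma_selection_to_matchers(selection)
```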
Siembol can be used to centralize both security data collection and the monitoring of logs from different sources. Logs collected from third-party tools vary in format, so it is important for Siembol to support normalizing them into a standardized format with common fields, such as a timestamp. It is also often useful to enrich a log with metadata from a CMDB or other internal systems, since this metadata is important for building detections.
For example, data repositories can be enriched with their data classification, network devices with their network zone, usernames with their Active Directory groups, etc. Using the Siembol alerting services, CSIRT teams can add detections on top of normalized logs. Alerts triggered by these detections are integrated into incident response workflows, which are defined and evaluated by the Siembol response service. This allows Siembol to integrate with systems such as Jira, TheHive, or Cortex, and to provide additional enrichments by searching ELK or running LDAP queries.
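The network-zone example can be sketched as a longest-prefix lookup against a table of CIDR ranges. The zone table, the `ip_src_addr` field name, and the `:zone` suffix are hypothetical; in practice such a table might be loaded from a CMDB.

```python
import ipaddress

# Hypothetical zone table; in a real deployment it might be exported from a CMDB.
ZONES = [
    (ipaddress.ip_network("10.0.0.0/8"), "internal"),
    (ipaddress.ip_network("10.1.0.0/16"), "production"),
    (ipaddress.ip_network("10.2.0.0/16"), "development"),
]


def network_zone(ip, zones=ZONES, default="external"):
    """Return the zone of the most specific (longest-prefix) network containing ip."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, zone) for net, zone in zones if addr in net]
    if not matches:
        return default
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def enrich_with_zone(event, ip_field="ip_src_addr"):
    """Add '<ip_field>:zone' to a copy of the event when the IP field is present."""
    ip = event.get(ip_field)
    if ip is None:
        return event
    return {**event, f"{ip_field}:zone": network_zone(ip)}


print(enrich_with_zone({"ip_src_addr": "10.1.4.20", "action": "deny"}))
# {'ip_src_addr': '10.1.4.20', 'action': 'deny', 'ip_src_addr:zone': 'production'}
```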
At G-Research, we use Siembol to parse, normalize, enrich, and run detections on approximately 150k events per second. Per day, this adds up to approximately 15 TB of raw data, or 13 billion events.
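As a quick consistency check on these figures (assuming a constant rate over a full day and decimal units, 1 TB = 10^12 bytes), the event rate matches the daily event count and implies an average raw event size of roughly 1.2 kB:

$$
150\,000~\frac{\text{events}}{\text{s}} \times 86\,400~\frac{\text{s}}{\text{day}} = 1.296 \times 10^{10}~\frac{\text{events}}{\text{day}} \approx 13~\text{billion},
\qquad
\frac{15 \times 10^{12}~\text{B}}{1.296 \times 10^{10}~\text{events}} \approx 1.16~\text{kB per event}.
$$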
- Teams responsible for a system platform can use Siembol to detect attacks on, or leaks from, that platform. For example, the Big Data team at G-Research uses Siembol to detect leaks and attacks on the Hadoop platform. These detections are then used as another data source within the Siembol SIEM log collection for the CSIRT team handling these incidents.
- Parsing - normalizing logs into messages with a single layer of key/value pairs.
- Enrichment - adding useful data to events to assist in detection and investigations.
- Alerting - filtering matching events from an incoming data stream of events based on a configurable rule set. The correlation alerting allows users to group several detections together before raising an alert.
- Response - flexible incident response workflows can be built and triggered in real-time via the highly modular and pluggable framework.
- Kafka - message broker for data pipelines.
- Storm - stream processing framework for all services except Siembol response, which is built on Kafka Streams.
- GitHub - store of service configurations used in Siembol UI.
- ZooKeeper - synchronization cache for updating service configurations from Git to services.
- Kubernetes cluster - environment to deploy Siembol UI and related microservices for management and orchestration of Siembol services configurations.
- Identity provider - OAUTH2/OIDC identity provider used by Siembol UI, allowing OIDC groups to be used for managing authorization to services.
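The correlation alerting described in the component list can be sketched as a sliding-window count per group key. This is a hypothetical Python sketch, not Siembol's correlation engine: it assumes alerts carry `rule_name` and an epoch-millisecond `timestamp`, that they arrive in timestamp order, and that the group key (for example a host name) is a field on each alert.

```python
from collections import defaultdict, deque


class CorrelationRule:
    """Hypothetical correlation rule: for each group key (e.g. a host), raise an
    alert once every listed detection has fired at least its threshold number
    of times within the last window_ms milliseconds.

    Assumes alerts arrive in timestamp order (epoch milliseconds).
    """

    def __init__(self, thresholds, window_ms, group_field):
        self.thresholds = thresholds      # detection rule_name -> required count
        self.window_ms = window_ms
        self.group_field = group_field
        # group key -> rule_name -> timestamps still inside the window, oldest first
        self.history = defaultdict(lambda: defaultdict(deque))

    def process(self, alert):
        name = alert["rule_name"]
        if name not in self.thresholds:
            return None  # not part of this correlation
        key, now = alert[self.group_field], alert["timestamp"]
        seen = self.history[key]
        seen[name].append(now)
        # Drop timestamps that have slid out of the window.
        for times in seen.values():
            while times and now - times[0] > self.window_ms:
                times.popleft()
        if all(len(seen[n]) >= count for n, count in self.thresholds.items()):
            seen.clear()  # reset so the same detections do not re-trigger
            return {"correlation_alert": True, self.group_field: key, "timestamp": now}
        return None


rule = CorrelationRule({"failed_login": 3, "new_admin": 1},
                       window_ms=60_000, group_field="host")
```

Keeping only timestamps inside the window bounds the memory used per group key; a production engine would also need to expire idle keys and cope with out-of-order events.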