diff --git a/CHANGELOG.md b/CHANGELOG.md index c4a0e9fb62..496b8fb372 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -248,7 +248,7 @@ questions. * Add an `unflatten()` function that turns fields with dot-separated names into fields of nested records (#2277) * Fix an issue where querying an index in a Zed lake did not return all matched records (#2273) * Accept type definition names and aliases in shaper functions (#2289) -* Add a reference [shaper for Zeek data](zeek/Shaping-Zeek-NDJSON.md) (#2300, #2368, #2448, #2489, #2601) +* Add a reference [shaper for Zeek data](docs/integrations/zeek/shaping-zeek-ndjson.md) (#2300, #2368, #2448, #2489, #2601) * Fix an issue where accessing a `null` array element in a `by` grouping caused a panic (#2310) * Add support for parsing timestamps with offset format `±[hh][mm]` (#2297) * Remove cropping from `shape()` (#2309) @@ -326,7 +326,7 @@ questions. * Fix an issue where `len()` of a `null` array was evaluating to something greater than zero (#2761) * Fix an issue where `sort` with no fields was ignoring alias types and nested fields when picking a sort field (#2762) * Fix an issue where unexpected `cut: no record found` warnings were returned by `zed lake query` but not when the same data was queried via `zq` (#2764) -* Move and extend the [Zeek interoperability docs](zeek/README.md) (#2770, #2782, #2830) +* Move and extend the [Zeek interoperability docs](docs/integrations/zeek/README.md) (#2770, #2782, #2830) * Create endpoints in the Zed lake service API that correspond to underlying Zed lake operations, and expose them via `zapi` commands (#2741, #2774, #2786, #2775, #2794, #2795, #2796, #2920, #2925, #2928) * Fix an issue where `zq` would surface a syntax error when reading ZSON it had sent as output (#2792) * Add an `/events` endpoint to the API, which can be used by clients such as the Brim app to be notified of pool updates (#2791) @@ -365,7 +365,7 @@ questions. 
* Fix an issue where temporary spill-to-disk directories were not being deleted upon exit (#3009, #3010) * Fix a ZSON issue with `union` types with alias decorators (#3015, #3016) * The ZSON format has been changed such that integer type IDs are no longer output (#3017) -* Update the reference Zed shaper for Zeek ([shaper](zeek/shaper.zed), [docs](zeek/Shaping-Zeek-NDJSON.md)) to reflect changes in Zeek release v4.1.0 (#3021) +* Update the reference Zed shaper for Zeek ([docs](docs/integrations/zeek/shaping-zeek-ndjson.md)) to reflect changes in Zeek release v4.1.0 (#3021) * Fix an issue where backslash escapes in Zed regular expressions were not accepted (#3040) * The ZST format has been updated to work for typedef'd outer records (#3047) * Fix an issue where an empty string could not be output as a JSON field name (#3054) @@ -416,7 +416,7 @@ questions. * zqd: Update Zeek pointer to [v3.2.1-brim9](https://github.com/brimdata/zeek/releases/tag/v3.2.1-brim9) which provides the latest [geolocation](https://github.com/brimdata/brim/wiki/Geolocation) data (#2010) * zqd: Update Suricata pointer to [v5.0.3-brim1](https://github.com/brimdata/build-suricata/releases/tag/v5.0.3-brim1) which disables checksum checks, allowing for alert creation on more types of pcaps (#1975) -* ZSON: Update [Zeek Interoperability doc](zeek/Data-Type-Compatibility.md) to include current ZSON syntax (#1956) +* ZSON: Update [Zeek Interoperability doc](docs/integrations/zeek/data-type-compatibility.md) to include current ZSON syntax (#1956) * zq: Ensure the output from the [`fuse`](docs/language/operators/fuse.md) operator is deterministic (#1958) * zq: Fix an issue where the presence of the Greek µ character caused a ZSON read parsing error (#1967) * zqd: Fix an issue where Zeek events generated during pcap import and written to an archivestore were only visible after ingest completion (#1973) @@ -500,7 +500,7 @@ questions. 
## v0.23.0 * zql: Add `week` as a unit for [time grouping with `every`](docs/language/functions/every.md) (#1374) -* zq: Fix an issue where a `null` value in a [JSON type definition](zeek/README.md) caused a failure without an error message (#1377) +* zq: Fix an issue where a `null` value in a [JSON type definition](docs/integrations/zeek/README.md) caused a failure without an error message (#1377) * zq: Add [`zst` format](docs/formats/vng.md) to `-i` and `-f` command-line help (#1384) * zq: ZNG spec and `zq` updates to introduce the beta ZNG storage format (#1375, #1415, #1394, #1457, #1512, #1523, #1529), also addressing the following: * New data type `bytes` for storing sequences of bytes encoded as base64 (#1315) @@ -516,11 +516,11 @@ questions. * zqd: Check and convert alpha ZNG filestores to beta ZNG (#1574, #1576) * zq: Fix an issue where spill-to-disk file names could collide (#1391) * zq: Allow the [`fuse` operator](docs/language/operators/fuse.md) to spill-to-disk to avoid memory limitations (#1355, #1402) -* zq: No longer require `_path` as a first column in a [JSON type definition](zeek/README.md) (#1370) +* zq: No longer require `_path` as a first column in a [JSON type definition](docs/integrations/zeek/README.md) (#1370) * zql: Improve ZQL docs for [aggregate functions](docs/language/operators/summarize.md) and grouping (#1385) * zql: Point links for developer docs at [pkg.go.dev](https://pkg.go.dev/) instead of [godoc.org](https://godoc.org/) (#1401) * zq: Add support for timestamps with signed timezone offsets (#1389) -* zq: Add a [JSON type definition](zeek/README.md) for alert events in [Suricata EVE logs](https://suricata.readthedocs.io/en/suricata-5.0.2/output/eve/eve-json-output.html) (#1400) +* zq: Add a [JSON type definition](docs/integrations/zeek/README.md) for alert events in [Suricata EVE logs](https://suricata.readthedocs.io/en/suricata-5.0.2/output/eve/eve-json-output.html) (#1400) * zq: Update the [ZNG over JSON 
(ZJSON)](docs/formats/zjson.md) spec and implementation (#1299) * zar: Use buffered streaming for archive import (#1397) * zq: Add an `ast` command that prints parsed ZQL as its underlying JSON object (#1416) @@ -652,7 +652,7 @@ questions. * zql: Group-by no longer emits records in "deterministic but undefined" order (#914) * zqd: Revise constraints on Space names (#853, #926, #944, #945) * zqd: Fix an issue where a file replacement race could cause an "access is denied" error in Brim during pcap import (#925) -* zng: Revise [Zeek compatibility](zeek/Data-Type-Compatibility.md) doc (#919) +* zng: Revise [Zeek compatibility](docs/integrations/zeek/data-type-compatibility.md) doc (#919) * zql: Clarify [`cut` operator documentation](docs/language/operators/cut.md) (#924) * zqd: Fix an issue where an invalid 1970 Space start time could be created in Brim during pcap import (#938) diff --git a/zeek/README.md b/docs/integrations/zeek/README.md similarity index 63% rename from zeek/README.md rename to docs/integrations/zeek/README.md index 3fcb52b72f..7d87eceb29 100644 --- a/zeek/README.md +++ b/docs/integrations/zeek/README.md @@ -5,6 +5,6 @@ with logs from the [Zeek](https://zeek.org/) open source network security monitoring tool. Depending on how you use Zeek, one or more of the following docs may be of interest to you. 
-* [Reading Zeek Log Formats](Reading-Zeek-Log-Formats.md) -* [Zed/Zeek Data Type Compatibility](Data-Type-Compatibility.md) -* [Shaping Zeek NDJSON](Shaping-Zeek-NDJSON.md) +* [Reading Zeek Log Formats](reading-zeek-log-formats.md) +* [Zed/Zeek Data Type Compatibility](data-type-compatibility.md) +* [Shaping Zeek NDJSON](shaping-zeek-ndjson.md) diff --git a/docs/integrations/zeek/_category_.yaml b/docs/integrations/zeek/_category_.yaml new file mode 100644 index 0000000000..a0a8034a0e --- /dev/null +++ b/docs/integrations/zeek/_category_.yaml @@ -0,0 +1,2 @@ +position: 3 +label: Zeek diff --git a/zeek/Data-Type-Compatibility.md b/docs/integrations/zeek/data-type-compatibility.md similarity index 81% rename from zeek/Data-Type-Compatibility.md rename to docs/integrations/zeek/data-type-compatibility.md index 7063a9703d..249131259f 100644 --- a/zeek/Data-Type-Compatibility.md +++ b/docs/integrations/zeek/data-type-compatibility.md @@ -1,37 +1,29 @@ -# Zed/Zeek Data Type Compatibility - -- [Introduction](#introduction) -- [Equivalent Types](#equivalent-types) -- [Example](#example) -- [Type-Specific Details](#type-specific-details) - * [`double`](#double) - * [`enum`](#enum) - * [`port`](#port) - * [`set`](#set) - * [`string`](#string) - * [`record`](#record) +--- +sidebar_position: 2 +sidebar_label: Zed/Zeek Data Type Compatibility +--- -## Introduction +# Zed/Zeek Data Type Compatibility As the Zed data model was in many ways inspired by the [Zeek TSV log format](https://docs.zeek.org/en/master/log-formats.html#zeek-tsv-format-logs), -the rich Zed storage formats ([ZSON](../docs/formats/zson.md), -[ZNG](../docs/formats/zng.md), etc.) maintain comprehensive interoperability +the rich Zed storage formats ([ZSON](../../formats/zson.md), +[ZNG](../../formats/zng.md), etc.) maintain comprehensive interoperability with Zeek. 
When Zeek is configured to output its logs in NDJSON format, much of the rich type information is lost in translation, but -this can be restored by following the guidance for [shaping Zeek NDJSON](Shaping-Zeek-NDJSON.md). +this can be restored by following the guidance for [shaping Zeek NDJSON](shaping-zeek-ndjson.md). On the other hand, Zeek TSV can be converted to Zed storage formats and back to Zeek TSV without any loss of information. This document describes how the Zed type system is able to represent each of the types that may appear in Zeek logs. -Tools like [`zq`](https://github.com/brimdata/zed) and -[Zui](https://github.com/brimdata/zui) maintain an internal Zed-typed +Tools like [`zq`](../../commands/zq.md) and +[Zui](https://zui.brimdata.io/) maintain an internal Zed-typed representation of any Zeek data that is read or imported. Therefore, knowing the equivalent types will prove useful when performing operations in the -[Zed language](../docs/language/README.md) such as -[type casting](../docs/language/README.md#data-types) or looking at the data +[Zed language](../../language/README.md) such as +[type casting](../../language/data-types.md) or looking at the data when output as ZSON. ## Equivalent Types @@ -45,20 +37,20 @@ applicable to handling certain types. 
| Zeek Type | Zed Type | Additional Detail | |------------|------------|-------------------| -| [`bool`](https://docs.zeek.org/en/current/script-reference/types.html#type-bool) | [`bool`](../docs/formats/zson.md#33-primitive-values) | | -| [`count`](https://docs.zeek.org/en/current/script-reference/types.html#type-count) | [`uint64`](../docs/formats/zson.md#33-primitive-values) | | -| [`int`](https://docs.zeek.org/en/current/script-reference/types.html#type-int) | [`int64`](../docs/formats/zson.md#33-primitive-values) | | -| [`double`](https://docs.zeek.org/en/current/script-reference/types.html#type-double) | [`float64`](../docs/formats/zson.md#33-primitive-values) | See [`double` details](#double) | -| [`time`](https://docs.zeek.org/en/current/script-reference/types.html#type-time) | [`time`](../docs/formats/zson.md#33-primitive-values) | | -| [`interval`](https://docs.zeek.org/en/current/script-reference/types.html#type-interval) | [`duration`](../docs/formats/zson.md#33-primitive-values) | | -| [`string`](https://docs.zeek.org/en/current/script-reference/types.html#type-string) | [`string`](../docs/formats/zson.md#33-primitive-values) | See [`string` details about escaping](#string) | -| [`port`](https://docs.zeek.org/en/current/script-reference/types.html#type-port) | [`uint16`](../docs/formats/zson.md#33-primitive-values) | See [`port` details](#port) | -| [`addr`](https://docs.zeek.org/en/current/script-reference/types.html#type-addr) | [`ip`](../docs/formats/zson.md#33-primitive-values) | | -| [`subnet`](https://docs.zeek.org/en/current/script-reference/types.html#type-subnet) | [`net`](../docs/formats/zson.md#33-primitive-values) | | -| [`enum`](https://docs.zeek.org/en/current/script-reference/types.html#type-enum) | [`string`](../docs/formats/zson.md#33-primitive-values) | See [`enum` details](#enum) | -| [`set`](https://docs.zeek.org/en/current/script-reference/types.html#type-set) | [`set`](../docs/formats/zson.md#343-set-value) | See [`set` 
details](#set) | -| [`vector`](https://docs.zeek.org/en/current/script-reference/types.html#type-vector) | [`array`](../docs/formats/zson.md#342-array-value) | | -| [`record`](https://docs.zeek.org/en/current/script-reference/types.html#type-record) | [`record`](../docs/formats/zson.md#341-record-value) | See [`record` details](#record) | +| [`bool`](https://docs.zeek.org/en/current/script-reference/types.html#type-bool) | [`bool`](../../formats/zson.md#23-primitive-values) | | +| [`count`](https://docs.zeek.org/en/current/script-reference/types.html#type-count) | [`uint64`](../../formats/zson.md#23-primitive-values) | | +| [`int`](https://docs.zeek.org/en/current/script-reference/types.html#type-int) | [`int64`](../../formats/zson.md#23-primitive-values) | | +| [`double`](https://docs.zeek.org/en/current/script-reference/types.html#type-double) | [`float64`](../../formats/zson.md#23-primitive-values) | See [`double` details](#double) | +| [`time`](https://docs.zeek.org/en/current/script-reference/types.html#type-time) | [`time`](../../formats/zson.md#23-primitive-values) | | +| [`interval`](https://docs.zeek.org/en/current/script-reference/types.html#type-interval) | [`duration`](../../formats/zson.md#23-primitive-values) | | +| [`string`](https://docs.zeek.org/en/current/script-reference/types.html#type-string) | [`string`](../../formats/zson.md#23-primitive-values) | See [`string` details about escaping](#string) | +| [`port`](https://docs.zeek.org/en/current/script-reference/types.html#type-port) | [`uint16`](../../formats/zson.md#23-primitive-values) | See [`port` details](#port) | +| [`addr`](https://docs.zeek.org/en/current/script-reference/types.html#type-addr) | [`ip`](../../formats/zson.md#23-primitive-values) | | +| [`subnet`](https://docs.zeek.org/en/current/script-reference/types.html#type-subnet) | [`net`](../../formats/zson.md#23-primitive-values) | | +| [`enum`](https://docs.zeek.org/en/current/script-reference/types.html#type-enum) | 
[`string`](../../formats/zson.md#23-primitive-values) | See [`enum` details](#enum) | +| [`set`](https://docs.zeek.org/en/current/script-reference/types.html#type-set) | [`set`](../../formats/zson.md#243-set-value) | See [`set` details](#set) | +| [`vector`](https://docs.zeek.org/en/current/script-reference/types.html#type-vector) | [`array`](../../formats/zson.md#242-array-value) | | +| [`record`](https://docs.zeek.org/en/current/script-reference/types.html#type-record) | [`record`](../../formats/zson.md#241-record-value) | See [`record` details](#record) | > **Note:** The [Zeek data type](https://docs.zeek.org/en/current/script-reference/types.html) > page describes the types in the context of the @@ -159,8 +151,8 @@ out again in the Zeek TSV log format. Other implementations of the Zed storage formats (should they exist) may handle these differently. Multiple Zeek types discussed below are represented via a -[type definition](../docs/formats/zson.md#25-type-definitions) to one of Zed's -[primitive types](../docs/formats/zson.md#33-primitive-values). The Zed type +[type definition](../../formats/zson.md#22-type-decorators) to one of Zed's +[primitive types](../../formats/zson.md#23-primitive-values). The Zed type definitions maintain the history of the field's original Zeek type name such that `zq` may restore it if the field is later output in Zeek format. Knowledge of its original Zeek type may also enable special @@ -186,7 +178,6 @@ these values are represented with a ZSON type name bound to the Zed `string` type. See the text above regarding [type definitions](#type-specific-details) for more details. - ### `port` The numeric values that appear in Zeek logs under this type are represented @@ -214,7 +205,7 @@ _not_ intended to be read or presented as such. Meanwhile, another Zeek UTF-8. These details are currently only captured within the Zeek source code itself that defines how these values are generated. 
-Zed includes a [primitive type](../docs/formats/zson.md#33-primitive-values) +Zed includes a [primitive type](../../formats/zson.md#23-primitive-values) called `bytes` that's suited to storing the former "always binary" case and a `string` type for the latter "always printable" case. However, Zeek logs do not currently communicate details that would allow an implementation to know @@ -258,7 +249,7 @@ Zed that refer to the record at a higher level but affect all values lower down in the record hierarchy. Revisiting the data from our example, we can output all fields within -`my_record` via a Zed [`cut`](../docs/language/operators/cut.md) operation. +`my_record` via a Zed [`cut`](../../language/operators/cut.md) operation. #### Command: diff --git a/zeek/Reading-Zeek-Log-Formats.md b/docs/integrations/zeek/reading-zeek-log-formats.md similarity index 89% rename from zeek/Reading-Zeek-Log-Formats.md rename to docs/integrations/zeek/reading-zeek-log-formats.md index 17bfa87138..fa721b9fb6 100644 --- a/zeek/Reading-Zeek-Log-Formats.md +++ b/docs/integrations/zeek/reading-zeek-log-formats.md @@ -1,17 +1,15 @@ -# Reading Zeek Log Formats - -- [Summary](#summary) -- [Zeek TSV](#zeek-tsv) -- [Zeek NDJSON](#zeek-ndjson) -- [The Role of `_path`](#the-role-of-_path) +--- +sidebar_position: 1 +sidebar_label: Reading Zeek Log Formats +--- -# Summary +# Reading Zeek Log Formats Zed is capable of reading both common Zeek log formats. This document provides guidance for what to expect when reading logs of these formats using the Zed tools such as `zq`. -# Zeek TSV +## Zeek TSV [Zeek TSV](https://docs.zeek.org/en/master/log-formats.html#zeek-tsv-format-logs) is Zeek's default output format for logs. This format can be read automatically @@ -19,9 +17,9 @@ is Zeek's default output format for logs. This format can be read automatically with the Zed tools such as `zq`. The following example shows a TSV `conn.log` being read via `zq` and -output as [ZSON](../docs/formats/zson.md). 
+output as [ZSON](../../formats/zson.md). -#### conn.log: +#### conn.log ```mdtest-input conn.log #separator \x09 @@ -35,13 +33,13 @@ output as [ZSON](../docs/formats/zson.md). 1521911721.255387 C8Tful1TvM3Zf5x8fl 10.164.94.120 39681 10.47.3.155 3389 tcp - 0.004266 97 19 RSTR - - 0 ShADTdtr 10 730 6 342 - ``` -#### Example: +#### Example ```mdtest-command zq -Z 'head 1' conn.log ``` -#### Output: +#### Output ```mdtest-output { _path: "conn", @@ -74,11 +72,11 @@ zq -Z 'head 1' conn.log Other than Zed, Zeek provides one of the richest data typing systems available and therefore such records typically need no adjustment to their data types once they've been read in as is. The -[Zed/Zeek Data Type Compatibility](Data-Type-Compatibility.md) document +[Zed/Zeek Data Type Compatibility](data-type-compatibility.md) document provides further detail on how the rich data types in Zeek TSV map to the -equivalent [rich types in Zed](../docs/formats/zson.md#33-primitive-values). +equivalent [rich types in Zed](../../formats/zson.md#23-primitive-values). -# Zeek NDJSON +## Zeek NDJSON As an alternative to the default TSV format, there are two common ways that Zeek may instead generate logs in [NDJSON](http://ndjson.org/) format. @@ -94,19 +92,19 @@ as is, but with caveats. Let's revisit the same `conn` record we just examined from the Zeek TSV log, but now as NDJSON generated using the JSON Streaming Logs package. 
-#### conn.ndjson: +#### conn.ndjson ```mdtest-input conn.ndjson {"_path":"conn","_write_ts":"2018-03-24T17:15:21.400275Z","ts":"2018-03-24T17:15:21.255387Z","uid":"C8Tful1TvM3Zf5x8fl","id.orig_h":"10.164.94.120","id.orig_p":39681,"id.resp_h":"10.47.3.155","id.resp_p":3389,"proto":"tcp","duration":0.004266023635864258,"orig_bytes":97,"resp_bytes":19,"conn_state":"RSTR","missed_bytes":0,"history":"ShADTdtr","orig_pkts":10,"orig_ip_bytes":730,"resp_pkts":6,"resp_ip_bytes":342} ``` -#### Example: +#### Example ```mdtest-command zq -Z 'head 1' conn.ndjson ``` -#### Output: +#### Output ```mdtest-output { _path: "conn", @@ -148,18 +146,18 @@ that Zeek chose to output these values in NDJSON as it did. Furthermore, if you were just seeking to do quick searches on the string values or simple math on the numbers, these limitations may be acceptable. However, if you intended to perform operations like -[aggregations with time-based grouping](../docs/language/functions/bucket.md) -or [CIDR matches](../docs/language/functions/network_of.md) +[aggregations with time-based grouping](../../language/functions/bucket.md) +or [CIDR matches](../../language/functions/network_of.md) on IP addresses, you would likely want to restore the rich Zed data types as -the records are being read. The document on [Shaping Zeek NDJSON](Shaping-Zeek-NDJSON.md) +the records are being read. The document on [Shaping Zeek NDJSON](shaping-zeek-ndjson.md) provides details on how this can be done. -# The Role of `_path` +## The Role of `_path` Zeek's `_path` field plays an important role in differentiating between its different [log types](https://docs.zeek.org/en/master/script-reference/log-files.html) (`conn`, `dns`, etc.) For instance, -[shaping Zeek NDJSON](Shaping-Zeek-NDJSON.md) relies on the value of +[shaping Zeek NDJSON](shaping-zeek-ndjson.md) relies on the value of the `_path` field to know which Zed type to apply to an input NDJSON record. 
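To make the type loss described in the NDJSON section concrete, here is a minimal Python sketch. It is not part of the Zed tooling, and the one-line record is an abbreviated copy of the `conn.ndjson` example. It only illustrates that a plain JSON parse leaves timestamps and IP addresses as strings, and that CIDR matching or time bucketing needs richer types first. The `10.164.0.0/16` network and the hourly bucket are arbitrary choices for illustration.

```python
import json
from datetime import datetime
from ipaddress import ip_address, ip_network

# Abbreviated copy of the conn.ndjson record shown in the doc
line = (
    '{"_path":"conn","ts":"2018-03-24T17:15:21.255387Z",'
    '"id.orig_h":"10.164.94.120","id.orig_p":39681,'
    '"duration":0.004266023635864258}'
)
record = json.loads(line)

# After a plain JSON parse the rich types are gone: these are just strings
ts_is_string = isinstance(record["ts"], str)
addr_is_string = isinstance(record["id.orig_h"], str)

# Restoring richer types by hand makes CIDR and time operations possible
orig_h = ip_address(record["id.orig_h"])
in_network = orig_h in ip_network("10.164.0.0/16")  # arbitrary example network

# fromisoformat() on older Pythons rejects a trailing "Z", so swap in an offset
ts = datetime.fromisoformat(record["ts"].replace("Z", "+00:00"))
hour_bucket = ts.replace(minute=0, second=0, microsecond=0)

print(ts_is_string, addr_is_string, in_network, hour_bucket.isoformat())
```

A Zed shaper does this restoration for every field at once, driven by the value of `_path`, rather than field by field as in this sketch.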
diff --git a/zeek/shaper.zed b/docs/integrations/zeek/shaping-zeek-ndjson.md similarity index 57% rename from zeek/shaper.zed rename to docs/integrations/zeek/shaping-zeek-ndjson.md index 4fe5154a98..2b62cecdb7 100644 --- a/zeek/shaper.zed +++ b/docs/integrations/zeek/shaping-zeek-ndjson.md @@ -1,3 +1,56 @@ +--- +sidebar_position: 3 +sidebar_label: Shaping Zeek NDJSON +--- + +# Shaping Zeek NDJSON + +As described in [Reading Zeek Log Formats](reading-zeek-log-formats.md), +logs output by Zeek in NDJSON format lose much of their rich data typing that +was originally present inside Zeek. This detail can be restored using a Zed +shaper, such as the reference `shaper.zed` described below. + +A full description of all that's possible with shapers is beyond the scope of +this doc. However, this example for shaping Zeek NDJSON is quite simple and +is described below. + +## Zeek Version/Configuration + +The fields and data types in the reference `shaper.zed` reflect the default +NDJSON-format logs output by Zeek releases up to the version number referenced +in the comments at the top of that file. They have been revisited periodically +as new Zeek versions have been released. + +Most changes we've observed in Zeek logs between versions have involved only the +addition of new fields. Because of this, we expect the shaper should be usable +as is for Zeek releases older than the one most recently tested, since fields +in the shaper not present in your environment would just be filled in with +`null` values. + +[Zeek v4.1.0](https://github.com/zeek/zeek/releases/tag/v4.1.0) is the first +release we've seen since starting to maintain this reference shaper where +field names for the same log type have _changed_ between releases. Because +of this, as shown below, the shaper includes `switch` logic that applies +different type definitions based on the observed field names that are known +to be specific to newer Zeek releases. 
+ +All attempts will be made to update this reference shaper in a timely manner +as new Zeek versions are released. However, if you have modified your Zeek +installation with [packages](https://packages.zeek.org/) +or other customizations, or if you are using a [Corelight Sensor](https://corelight.com/products/appliance-sensors/) +that produces Zeek logs with many fields and logs beyond those found in open +source Zeek, the reference shaper will not cover all the fields in your logs. +[As described below](#zed-pipeline), the reference shaper will assign +inferred types to such additional fields. By exploring your data, you can then +iteratively enhance your shaper to match your environment. If you need +assistance, please speak up on our [public Slack](https://www.brimdata.io/join-slack/). + +## Reference Shaper Contents + +The following reference `shaper.zed` may seem large, but ultimately it follows a +fairly simple pattern that repeats across the many [Zeek log types](https://docs.zeek.org/en/master/script-reference/log-files.html). + +```mdtest-input shaper.zed // This reference Zed shaper for Zeek NDJSON logs was most recently tested with // Zeek v4.1.0. The fields and data types reflect the default NDJSON // logs output by that Zeek version when using the JSON Streaming Logs package. @@ -137,3 +190,223 @@ yield nest_dotted(this) | switch ( case _path=="x509" and has(fingerprint) => yield shape() default => yield shape(schemas[_path]) ) +``` + +### Leading Type Definitions + +The top three lines define types that are referenced further below in the main +portion of the Zed shaper. + +``` +type port=uint16; +type zenum=string; +type conn_id={orig_h:ip,orig_p:port,resp_h:ip,resp_p:port}; +``` +The `port` and `zenum` types are described further in the [Zed/Zeek Data Type Compatibility](data-type-compatibility.md) +doc. 
The `conn_id` type will just save us from having to repeat these fields +individually in the many Zeek record types that contain an embedded `id` +record. + +### Default Type Definitions Per Zeek Log `_path` + +The bulk of this Zed shaper consists of detailed per-field data type +definitions for each record in the default set of NDJSON logs output by Zeek. +These type definitions reference the types we defined above, such as `port` +and `conn_id`. The syntax for defining primitive and complex types follows the +relevant sections of the [ZSON Format](../../formats/zson.md#2-the-zson-format) +specification. + +``` +... +type conn={_path:string,ts:time,uid:string,id:conn_id,proto:zenum,service:string,duration:duration,orig_bytes:uint64,resp_bytes:uint64,conn_state:string,local_orig:bool,local_resp:bool,missed_bytes:uint64,history:string,orig_pkts:uint64,orig_ip_bytes:uint64,resp_pkts:uint64,resp_ip_bytes:uint64,tunnel_parents:|[string]|,_write_ts:time}; +type dce_rpc={_path:string,ts:time,uid:string,id:conn_id,rtt:duration,named_pipe:string,endpoint:string,operation:string,_write_ts:time}; +... +``` + +> **Note:** See [the role of `_path`](reading-zeek-log-formats.md#the-role-of-_path) +> for important details if you're using Zeek's built-in [ASCII logger](https://docs.zeek.org/en/current/scripts/base/frameworks/logging/writers/ascii.zeek.html) +> to generate NDJSON rather than the [JSON Streaming Logs](https://github.com/corelight/json-streaming-logs) package. + +### Version-Specific Type Definitions + +The next block of type definitions are exceptions for Zeek v4.1.0 where the +names of fields for certain log types have changed from prior releases. 
+ +``` +type ssl_4_1_0={_path:string,ts:time,uid:string,id:conn_id,version:string,cipher:string,curve:string,server_name:string,resumed:bool,last_alert:string,next_protocol:string,established:bool,ssl_history:string,cert_chain_fps:[string],client_cert_chain_fps:[string],subject:string,issuer:string,client_subject:string,client_issuer:string,sni_matches_cert:bool,validation_status:string,_write_ts:time}; +type x509_4_1_0={_path:string,ts:time,fingerprint:string,certificate:{version:uint64,serial:string,subject:string,issuer:string,not_valid_before:time,not_valid_after:time,key_alg:string,sig_alg:string,key_type:string,key_length:uint64,exponent:string,curve:string},san:{dns:[string],uri:[string],email:[string],ip:[ip]},basic_constraints:{ca:bool,path_len:uint64},host_cert:bool,client_cert:bool,_write_ts:time}; +``` + +### Mapping From `_path` Values to Types + +The next section is just a simple mapping from the string values typically found +in the Zeek `_path` field to the name of one of the types we defined above. + +``` +const schemas = |{ + "broker": broker, + "capture_loss": capture_loss, + "cluster": cluster, + "config": config, + "conn": conn, + "dce_rpc": dce_rpc, +... +``` + +### Zed Pipeline + +The Zed shaper ends with a pipeline that stitches together everything we've defined +so far. + +``` +put this := unflatten(this) | switch ( + _path=="ssl" has(ssl_history) => put this := shape(ssl_4_1_0); + _path=="x509" has(fingerprint) => put this := shape(x509_4_1_0); + default => put this := shape(schemas[_path]); +) +``` + +Picking this apart, it transforms each record as it's being read, in three +steps: + +1. `unflatten()` reverses the Zeek NDJSON logger's "flattening" of nested + records, e.g., how it populates a field named `id.orig_h` rather than + creating a field `id` with sub-field `orig_h` inside it. 
Restoring the + original nesting now gives us the option to reference the record named `id` + in the Zed language and access the entire 4-tuple of values, but still + access the individual values using the same dotted syntax like `id.orig_h` + when needed. + +2. The `switch()` detects if fields specific to Zeek v4.1.0 are present for the + two log types for which the [version-specific type definitions](#version-specific-type-definitions) + should be applied. For all log lines and types other than these exceptions, + the [default type definitions](#default-type-definitions-per-zeek-log-_path) + are applied. + +3. Each `shape()` call applies an appropriate type definition based on the + nature of the incoming record. The logic of `shape()` includes: + + * For any fields referenced in the type definition that aren't present in + the input record, the field is added with a `null` value. (Note: This + could be performed separately via the `fill()` function.) + + * The data type of each field in the type definition is applied to the + field of that name in the input record. (Note: This could be performed + separately via the `cast()` function.) + + * The fields in the input record are ordered to match the order in which + they appear in the type definition. (Note: This could be performed + separately via the `order()` function.) + + Any fields that appear in the input record that are not present in the + type definition are kept and assigned an inferred data type. If you would + prefer to have such additional fields dropped (i.e., to maintain strict + adherence to the shape), append a call to the `crop()` function to the + Zed pipeline, e.g.: + + ``` + ... | put this := shape(schemas[_path]) | put this := crop(schemas[_path]) + ``` + + Open issues [zed/2585](https://github.com/brimdata/zed/issues/2585) and + [zed/2776](https://github.com/brimdata/zed/issues/2776) both track planned + future improvements to this part of Zed shapers. 
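The three steps above can be sketched in Python. This is only an illustration of the described behavior, not how Zed implements `unflatten()` or `shape()`. The `WEIRD_SCHEMA` dictionary, the `parse_time` helper, and the way nested records are handled are assumptions made for this sketch. The input is the `weird` record used elsewhere in this doc.

```python
from datetime import datetime
from ipaddress import ip_address


def unflatten(record):
    """Turn dot-separated field names into nested dicts (step 1)."""
    nested = {}
    for name, value in record.items():
        parts = name.split(".")
        target = nested
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


def parse_time(text):
    # Older Pythons reject a trailing "Z" in fromisoformat(), so use an offset
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


# Hypothetical stand-in for a Zed type definition: field name -> caster,
# or a nested dict for an embedded record such as `id`
WEIRD_SCHEMA = {
    "_path": str,
    "ts": parse_time,
    "uid": str,
    "id": {"orig_h": ip_address, "orig_p": int,
           "resp_h": ip_address, "resp_p": int},
    "name": str,
    "addl": str,
    "notice": bool,
    "peer": str,
    "source": str,
    "_write_ts": parse_time,
}


def shape(record, schema):
    """Fill missing fields with None, cast present ones, order by schema."""
    shaped = {}
    for name, spec in schema.items():
        value = record.get(name)
        if value is None:
            shaped[name] = None                 # "fill" behavior
        elif isinstance(spec, dict):
            shaped[name] = shape(value, spec)   # recurse into nested record
        else:
            shaped[name] = spec(value)          # "cast" behavior
    for name, value in record.items():
        shaped.setdefault(name, value)          # extra fields are kept
    return shaped


raw = {
    "_path": "weird",
    "_write_ts": "2018-03-24T17:15:20.600843Z",
    "ts": "2018-03-24T17:15:20.600843Z",
    "uid": "C1zOivgBT6dBmknqk",
    "id.orig_h": "10.47.1.152",
    "id.orig_p": 49562,
    "id.resp_h": "23.217.103.245",
    "id.resp_p": 80,
    "name": "TCP_ack_underflow_or_misorder",
    "notice": False,
    "peer": "zeek",
}
shaped = shape(unflatten(raw), WEIRD_SCHEMA)
print(list(shaped))
```

In this sketch the field order comes from the schema, with any unknown fields appended at the end. That mirrors the description of extra fields being kept, though the inferred typing that Zed applies to them is not modeled here.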
+ +## Invoking the Shaper From `zq` + +A shaper is typically invoked via the `-I` option of `zq`. + +For example, if we assume this input file `weird.ndjson` + +```mdtest-input weird.ndjson +{ + "_path": "weird", + "_write_ts": "2018-03-24T17:15:20.600843Z", + "ts": "2018-03-24T17:15:20.600843Z", + "uid": "C1zOivgBT6dBmknqk", + "id.orig_h": "10.47.1.152", + "id.orig_p": 49562, + "id.resp_h": "23.217.103.245", + "id.resp_p": 80, + "name": "TCP_ack_underflow_or_misorder", + "notice": false, + "peer": "zeek" +} +``` + +applying the reference shaper via + +```mdtest-command +zq -Z -I shaper.zed weird.ndjson +``` + +produces + +```mdtest-output +{ + _path: "weird", + ts: 2018-03-24T17:15:20.600843Z, + uid: "C1zOivgBT6dBmknqk", + id: { + orig_h: 10.47.1.152, + orig_p: 49562 (port=uint16), + resp_h: 23.217.103.245, + resp_p: 80 (port) + } (=conn_id), + name: "TCP_ack_underflow_or_misorder", + addl: null (string), + notice: false, + peer: "zeek", + source: null (string), + _write_ts: 2018-03-24T17:15:20.600843Z +} (=weird) +``` + +If working in a directory containing many NDJSON logs, the +reference shaper can be applied to all the records they contain and +output them all in a single binary [ZNG](../../formats/zng.md) file as +follows: + +``` +zq -I shaper.zed *.log > /tmp/all.zng +``` + +If you wish to apply the shaper and then perform additional +operations on the richly-typed records, the Zed query on the command line +should begin with a `|`, as this appends it to the pipeline at the bottom of +the shaper from the included file. + +For example, to count Zeek `conn` records into CIDR-based buckets based on +originating IP address: + +``` +zq -I shaper.zed -f table '| count() by network_of(id.orig_h) | sort -r' conn.log +``` + +[zed/2584](https://github.com/brimdata/zed/issues/2584) tracks a planned +improvement for this use of `zq -I`. 
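As a rough picture of what counting records into CIDR-based buckets means, here is a small Python sketch. It assumes a fixed `/24` prefix, which may differ from the network that Zed's `network_of()` picks by default, and the list of addresses is hypothetical sample data rather than output from any real log.

```python
from collections import Counter
from ipaddress import ip_network

# Hypothetical originating addresses, standing in for id.orig_h values
orig_hosts = ["10.47.1.152", "10.47.1.154", "10.164.94.120"]

# Assumption for this sketch: bucket every address into its /24 network
counts = Counter(
    ip_network(f"{host}/24", strict=False) for host in orig_hosts
)

# Largest buckets first, similar in spirit to a reverse sort on the count
for network, count in counts.most_common():
    print(network, count)
```

This kind of grouping only works once the address values are real IP types, which is what the shaper restores.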
+ +If you intend to frequently shape the same NDJSON data, you may want to create +an alias in your +shell to always invoke `zq` with the necessary `-I` flag pointing to the path +of your finalized shaper. [zed/1059](https://github.com/brimdata/zed/issues/1059) +tracks a planned enhancement to persist such settings within Zed itself rather +than relying on external mechanisms such as shell aliases. + +## Importing Shaped Data Into Zui + +If you wish to browse your shaped data with [Zui](https://zui.brimdata.io/), +the best way to accomplish this at the moment would be to use `zq` to convert +it to ZNG [as shown above](#invoking-the-shaper-from-zq), then drag the ZNG +into Zui as you would any other log. An enhancement [zed/2695](https://github.com/brimdata/zed/issues/2695) +is planned that will soon make it possible to attach your shaper to a +Pool. This will allow you to drag the original NDJSON logs directly into the +Pool in Zui and have the shaping applied as the records are being committed to +the Pool. + +## Contact us! + +If you're having difficulty, interested in shaping other data sources, or +just have feedback, please join our [public Slack](https://www.brimdata.io/join-slack/) +and speak up or [open an issue](https://github.com/brimdata/zed/issues/new/choose). +Thanks! 
diff --git a/zeek/Shaping-Zeek-NDJSON.md b/zeek/Shaping-Zeek-NDJSON.md deleted file mode 100644 index 056b57414b..0000000000 --- a/zeek/Shaping-Zeek-NDJSON.md +++ /dev/null @@ -1,234 +0,0 @@ -# Shaping Zeek NDJSON - -- [Summary](#summary) -- [Zeek Version/Configuration](#zeek-versionconfiguration) -- [Reference Shaper Contents](#reference-shaper-contents) - * [Leading Type Definitions](#leading-type-definitions) - * [Default Type Definitions Per Zeek Log `_path`](#default-type-definitions-per-zeek-log-_path) - * [Version-Specific Type Definitions](#version-specific-type-definitions) - * [Mapping From `_path` Values to Types](#mapping-from-_path-values-to-types) - * [Zed Pipeline](#zed-pipeline) -- [Invoking the Shaper From `zq`](#invoking-the-shaper-from-zq) -- [Importing Shaped Data Into Zui](#importing-shaped-data-into-zui) -- [Contact us!](#contact-us) - -# Summary - -As described in [Reading Zeek Log Formats](Reading-Zeek-Log-Formats.md), -logs output by Zeek in NDJSON format lose much of their rich data typing that -was originally present inside Zeek. This detail can be restored using a Zed -shaper, such as the reference [`shaper.zed`](shaper.zed) -that can be found in this directory of the repository. - -A full description of all that's possible with shapers is beyond the scope of -this doc. However, this example for shaping Zeek NDJSON is quite simple and -is described below. - -# Zeek Version/Configuration - -The fields and data types in the reference `shaper.zed` reflect the default -NDJSON-format logs output by Zeek releases up to the version number referenced -in the comments at the top of that file. They have been revisited periodically -as new Zeek versions have been released. - -Most changes we've observed in Zeek logs between versions have involved only the -addition of new fields. 
Because of this, we expect the shaper should be usable -as is for Zeek releases older than the one most recently tested, since fields -in the shaper not present in your environment would just be filled in with -`null` values. - -[Zeek v4.1.0](https://github.com/zeek/zeek/releases/tag/v4.1.0) is the first -release we've seen since starting to maintain this reference shaper where -field names for the same log type have _changed_ between releases. Because -of this, as shown below, the shaper includes `switch` logic that applies -different type definitions based on the observed field names that are known -to be specific to newer Zeek releases. - -All attempts will be made to update this reference shaper in a timely manner -as new Zeek versions are released. However, if you have modified your Zeek -installation with [packages](https://packages.zeek.org/) -or other customizations, or if you are using a [Corelight Sensor](https://corelight.com/products/appliance-sensors/) -that produces Zeek logs with many fields and logs beyond those found in open -source Zeek, the reference shaper will not cover all the fields in your logs. -[As described below](#zed-pipeline), the reference shaper will assign -inferred types to such additional fields. By exploring your data, you can then -iteratively enhance your shaper to match your environment. If you need -assistance, please speak up on our [public Slack](https://www.brimdata.io/join-slack/). - -# Reference Shaper Contents - -The reference `shaper.zed` may seem large, but ultimately it follows a -fairly simple pattern that repeats across the many [Zeek log types](https://docs.zeek.org/en/master/script-reference/log-files.html). - -## Leading Type Definitions - -The top three lines define types that are referenced further below in the main -portion of the Zed shaper. 
- -``` -type port=uint16; -type zenum=string; -type conn_id={orig_h:ip,orig_p:port,resp_h:ip,resp_p:port}; -``` -The `port` and `zenum` types are described further in the [Zed/Zeek Data Type Compatibility](Data-Type-Compatibility.md) -doc. The `conn_id` type will just save us from having to repeat these fields -individually in the many Zeek record types that contain an embedded `id` -record. - -## Default Type Definitions Per Zeek Log `_path` - -The bulk of this Zed shaper consists of detailed per-field data type -definitions for each record in the default set of NDJSON logs output by Zeek. -These type definitions reference the types we defined above, such as `port` -and `conn_id`. The syntax for defining primitive and complex types follows the -relevant sections of the [ZSON Format](../docs/formats/zson.md#3-the-zson-format) -specification. - -``` -... -type conn={_path:string,ts:time,uid:string,id:conn_id,proto:zenum,service:string,duration:duration,orig_bytes:uint64,resp_bytes:uint64,conn_state:string,local_orig:bool,local_resp:bool,missed_bytes:uint64,history:string,orig_pkts:uint64,orig_ip_bytes:uint64,resp_pkts:uint64,resp_ip_bytes:uint64,tunnel_parents:|[string]|,_write_ts:time}; -type dce_rpc={_path:string,ts:time,uid:string,id:conn_id,rtt:duration,named_pipe:string,endpoint:string,operation:string,_write_ts:time}; -... -``` - -> **Note:** See [the role of `_path` ](Reading-Zeek-Log-Formats.md#the-role-of-_path) -> for important details if you're using Zeek's built-in [ASCII logger](https://docs.zeek.org/en/current/scripts/base/frameworks/logging/writers/ascii.zeek.html) -> to generate NDJSON rather than the [JSON Streaming Logs](https://github.com/corelight/json-streaming-logs) package. - -## Version-Specific Type Definitions - -The next block of type definitions are exceptions for Zeek v4.1.0 where the -names of fields for certain log types have changed from prior releases. 
- -``` -type ssl_4_1_0={_path:string,ts:time,uid:string,id:conn_id,version:string,cipher:string,curve:string,server_name:string,resumed:bool,last_alert:string,next_protocol:string,established:bool,ssl_history:string,cert_chain_fps:[string],client_cert_chain_fps:[string],subject:string,issuer:string,client_subject:string,client_issuer:string,sni_matches_cert:bool,validation_status:string,_write_ts:time}; -type x509_4_1_0={_path:string,ts:time,fingerprint:string,certificate:{version:uint64,serial:string,subject:string,issuer:string,not_valid_before:time,not_valid_after:time,key_alg:string,sig_alg:string,key_type:string,key_length:uint64,exponent:string,curve:string},san:{dns:[string],uri:[string],email:[string],ip:[ip]},basic_constraints:{ca:bool,path_len:uint64},host_cert:bool,client_cert:bool,_write_ts:time}; -``` - -## Mapping From `_path` Values to Types - -The next section is just a simple mapping from the string values typically found -in the Zeek `_path` field to the name of one of the types we defined above. - -``` -const schemas = |{ - "broker": broker, - "capture_loss": capture_loss, - "cluster": cluster, - "config": config, - "conn": conn, - "dce_rpc": dce_rpc, -... -``` - -## Zed Pipeline - -The Zed shaper ends with a pipeline that stitches together everything we've defined -so far. - -``` -put this := unflatten(this) | switch ( - _path=="ssl" has(ssl_history) => put this := shape(ssl_4_1_0); - _path=="x509" has(fingerprint) => put this := shape(x509_4_1_0); - default => put this := shape(schemas[_path]); -) -``` - -Picking this apart, it transforms each record as it's being read, in three -steps: - -1. `unflatten()` reverses the Zeek NDJSON logger's "flattening" of nested - records, e.g., how it populates a field named `id.orig_h` rather than - creating a field `id` with sub-field `orig_h` inside it.
Restoring the - original nesting now gives us the option to reference the record named `id` - in the Zed language and access the entire 4-tuple of values, but still - access the individual values using the same dotted syntax like `id.orig_h` - when needed. - -2. The `switch()` detects if fields specific to Zeek v4.1.0 are present for the - two log types for which the [version-specific type definitions](#version-specific-type-definitions) - should be applied. For all log lines and types other than these exceptions, - the [default type definitions](#default-type-definitions-per-zeek-log-_path) - are applied. - -3. Each `shape()` call applies an appropriate type definition based on the - nature of the incoming record. The logic of `shape()` includes: - - * For any fields referenced in the type definition that aren't present in - the input record, the field is added with a `null` value. (Note: This - could be performed separately via the `fill()` function.) - - * The data type of each field in the type definition is applied to the - field of that name in the input record. (Note: This could be performed - separately via the `cast()` function.) - - * The fields in the input record are ordered to match the order in which - they appear in the type definition. (Note: This could be performed - separately via the `order()` function.) - - Any fields that appear in the input record that are not present in the - type definition are kept and assigned an inferred data type. If you would - prefer to have such additional fields dropped (i.e., to maintain strict - adherence to the shape), append a call to the `crop()` function to the - Zed pipeline, e.g.: - - ``` - ... | put this := shape(schemas[_path]) | put this := crop(schemas[_path]) - ``` - - Open issues [zed/2585](https://github.com/brimdata/zed/issues/2585) and - [zed/2776](https://github.com/brimdata/zed/issues/2776) both track planned - future improvements to this part of Zed shapers. 
- -# Invoking the Shaper From `zq` - -A shaper is typically invoked via the `-I` option of `zq`. - -For example, if working in a directory containing many NDJSON logs, the -reference shaper can be applied to all the records they contain and -output them all in a single binary [ZNG](../docs/formats/zng.md) file as -follows: - -``` -zq -I shaper.zed *.log > /tmp/all.zng -``` - -If you wish to apply the shaper and then perform additional -operations on the richly-typed records, the Zed query on the command line -should begin with a `|`, as this appends it to the pipeline at the bottom of -the shaper from the included file. - -For example, to count Zeek `conn` records into CIDR-based buckets based on -originating IP address: - -``` -zq -I shaper.zed -f table '| count() by network_of(id.orig_h) | sort -r' conn.log -``` - -[zed/2584](https://github.com/brimdata/zed/issues/2584) tracks a planned -improvement for this use of `zq -I`. - -If you intend to frequently shape the same NDJSON data, you may want to create -an alias in your -shell to always invoke `zq` with the necessary `-I` flag pointing to the path -of your finalized shaper. [zed/1059](https://github.com/brimdata/zed/issues/1059) -tracks a planned enhancement to persist such settings within Zed itself rather -than relying on external mechanisms such as shell aliases. - -# Importing Shaped Data Into Zui - -If you wish to browse your shaped data with [Zui](https://github.com/brimdata/zui), -the best way to accomplish this at the moment would be to use `zq` to convert -it to ZNG [as shown above](#invoking-the-shaper-from-zq), then drag the ZNG -into Zui as you would any other log. An enhancement [zed/2695](https://github.com/brimdata/zed/issues/2695) -is planned that will soon make it possible to attach your shaper to a -Pool. This will allow you to drag the original NDJSON logs directly into the -Pool in Zui and have the shaping applied as the records are being committed to -the Pool. - -# Contact us! 
- -If you're having difficulty, interested in shaping other data sources, or -just have feedback, please join our [public Slack](https://www.brimdata.io/join-slack/) -and speak up or [open an issue](https://github.com/brimdata/zed/issues/new/choose). -Thanks! diff --git a/zeek/ztests/shape-zeek-ndjson.yaml b/zeek/ztests/shape-zeek-ndjson.yaml deleted file mode 100644 index 0c74d4e05a..0000000000 --- a/zeek/ztests/shape-zeek-ndjson.yaml +++ /dev/null @@ -1,30 +0,0 @@ -script: | - zq -Z -I shaper.zed - - -inputs: - - name: shaper.zed - source: ../shaper.zed - - name: stdin - data: | - {"_path":"weird","_write_ts":"2018-03-24T17:15:20.600843Z","ts":"2018-03-24T17:15:20.600843Z","uid":"C1zOivgBT6dBmknqk","id.orig_h":"10.47.1.152","id.orig_p":49562,"id.resp_h":"23.217.103.245","id.resp_p":80,"name":"TCP_ack_underflow_or_misorder","notice":false,"peer":"zeek"} - -outputs: - - name: stdout - data: | - { - _path: "weird", - ts: 2018-03-24T17:15:20.600843Z, - uid: "C1zOivgBT6dBmknqk", - id: { - orig_h: 10.47.1.152, - orig_p: 49562 (port=uint16), - resp_h: 23.217.103.245, - resp_p: 80 (port) - } (=conn_id), - name: "TCP_ack_underflow_or_misorder", - addl: null (string), - notice: false, - peer: "zeek", - source: null (string), - _write_ts: 2018-03-24T17:15:20.600843Z - } (=weird)