diff --git a/CHANGELOG.md b/CHANGELOG.md index ac5da4e2a8..120fc2e3a4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,7 +7,7 @@ * Update the [`grok` function docs](docs/language/functions/grok.md) with additional examples and guidance (#5243) * Update the [Lateral Subquery docs](docs/language/lateral-subqueries.md) with an emphasis on when primitive values or arrays are returned by [Lateral Expressions](docs/language/lateral-subqueries.md#lateral-expressions) (#5264) * The terms "pipeline" and "branch" are now used throughout the [Zed docs](docs/README.md) instead of "dataflow" and "leg" (#5272) -* Add docs for [`lake` output format](docs/commands/super.md#superdb-data-lake-metadata-output) and [`zed ls`](docs/commands/zed.md#ls) (#5187) +* Add docs for [`lake` output format](docs/commands/super.md#superdb-data-lake-metadata-output) and [`zed ls`](docs/commands/super-db.md#ls) (#5187) * Add docs for the [`top` operator](docs/language/operators/top.md) (#5276) * Add [`fluentd` integration docs](docs/integrations/fluentd.md) (#5190, #5195) * Add a [`strftime` function](docs/language/functions/strftime.md) to format `time` values into strings (#5197, #5204) @@ -19,7 +19,7 @@ ## v1.17.0 * Improve the performance of multi-pool searches (e.g., `from * | "MyFilter"`) (#5174) * Reduce the amount of memory consumed by the [`merge` operator](docs/language/operators/merge.md) and merge-dependent operations, such as compaction (#5171) -* Add the `-pool` flag to [`zed manage`](docs/commands/zed.md#manage) (#5164) +* Add the `-pool` flag to [`zed manage`](docs/commands/super-db.md#manage) (#5164) * Fix an issue where the lake API was not providing query descriptions for Zed programs that contain scopes (#5152) * Fix an issue where attempts to use the [`load` operator](docs/language/operators/load.md) in `zq` caused a panic (#5162) * Fix a parser issue with collisions between the names of [user-defined operators](docs/language/statements.md#operator-statements) and 
[functions](docs/language/statements.md#func-statements) and some built-in [operators](docs/language/operators/README.md) (#5161) @@ -52,20 +52,20 @@ * Fix an issue where math and [`join`](docs/language/operators/join.md) matches involving `float16` and `float32` types could yield incorrect results (#5086) ## v1.14.0 -* Add the `-manage` flag to [`zed serve`](docs/commands/zed.md#serve) to have the Zed service process initiate [maintenance tasks](docs/commands/zed.md#manage) on a regular interval (#5017) +* Add the `-manage` flag to [`zed serve`](docs/commands/super-db.md#serve) to have the Zed service process initiate [maintenance tasks](docs/commands/super-db.md#manage) on a regular interval (#5017) * Fix an issue where the Python client would not allow loading to a pool with `/` in its name (#5020) * Fix an issue where pools with KSUID-like names could not be accessed by name (#5019) * Fix a reference counting issue that could cause a Zed service panic (#5029, #5030) ## v1.13.0 -* Improve the error message when [`zed manage -monitor`](docs/commands/zed.md#manage) is attempted on a local lake (#4979) -* The [`zed serve`](docs/commands/zed.md#serve) log now includes version, storage root, and auth info at startup (#4988) -* Add [docs for the `zed manage` command](docs/commands/zed.md#manage) to compact data for improved performance (#4961) +* Improve the error message when [`zed manage -monitor`](docs/commands/super-db.md#manage) is attempted on a local lake (#4979) +* The [`zed serve`](docs/commands/super-db.md#serve) log now includes version, storage root, and auth info at startup (#4988) +* Add [docs for the `zed manage` command](docs/commands/super-db.md#manage) to compact data for improved performance (#4961) * Add the ability to [cast](docs/language/expressions.md#casts) to Zed's `type` type (#4980, #4985) * Add the ability to [`yield`](docs/language/operators/yield.md) a Zed `error` literal (#4998) * Fix an issue with accessing values inside complex 
literals (#4953) * Fix an issue where [cast](docs/language/expressions.md#casts) of an empty string to a `duration` value incorrectly yielded `0s` (#4965) -* Fix an issue where a [`zed vacuum`](docs/commands/zed.md#vacuum) on a large amount of data could crash the Zed service (#4974) +* Fix an issue where a [`zed vacuum`](docs/commands/super-db.md#vacuum) on a large amount of data could crash the Zed service (#4974) * Fix an issue where some IPv6 values of Zed's `net` type were not parsed correctly in Zed queries (#4992) * Fix an issue where output of certain union-typed values was not consistent (#4995) * Fix an issue where parsing of `type` literals inside of `type` literals was incorrectly permitted (#4996) @@ -89,7 +89,7 @@ * Fix an issue where loading and querying certain data caused a panic (#4877) ## v1.11.0 -* Introduce new logic for the `zed` CLI command to [locate the lake](docs/commands/zed.md#locating-the-lake) (#4758, #4787, #4811) +* Introduce new logic for the `zed` CLI command to [locate the lake](docs/commands/super-db.md#locating-the-lake) (#4758, #4787, #4811) * [Cast expressions](docs/language/expressions.md#casts) now behave more like function calls (#4805) * Reduce the amount of memory needed to store a Zed value (#4812) * Add support for unicode in keywords and identifiers (#4799, #4796) @@ -106,14 +106,14 @@ * Sorting is now performed automatically on [`join`](docs/language/operators/join.md) inputs when needed (explicit [`sort`](docs/language/operators/sort.md) no longer required) (#4770) * Various query performance improvements (#4736, #4737, #4739, #4740, #4783, #4785) * [`join`](docs/language/operators/join.md) now works correctly when data inputs are sorted in descending order (#4767) -* Reduce memory consumption during [`delete -where`](docs/commands/zed.md#delete) operations (#4734) +* Reduce memory consumption during [`delete -where`](docs/commands/super-db.md#delete) operations (#4734) * Fix a `null`-handling issue that caused 
incorrect query results after pool compaction (#4735, #4753) * Allow writing of vectors when compacting objects in a pool (#4756, #4757) * Ensure query runtime errors are logged and made available through a new [Query Status](docs/lake/api.md#query-status) lake API endpoint (#4763, #4765, #4766, #4769) * Add an example to the [`where` docs](docs/language/operators/where.md) showing inverse containment logic (#4761) * Add an example to the [`cut` docs](docs/language/operators/cut.md) that includes setting a default value for a field (#4773, #4776) * Boolean `not` and `!` now both work the same in [expressions](docs/language/expressions.md#logic) and [search expressions](docs/language/search-expressions.md#boolean-logic) (#4768) -* The [`zed` command](docs/commands/zed.md) now returns a hint mentioning [`init`](docs/commands/zed.md#init) if no lake exists at the expected path (#4786) +* The [`zed` command](docs/commands/super-db.md) now returns a hint mentioning [`init`](docs/commands/super-db.md#init) if no lake exists at the expected path (#4786) ## v1.9.0 * The [Zed Language Overview docs](docs/language/overview.md) have been split into multiple sections (#4576) * Add support for [user-defined operators](docs/language/statements.md#operator-statements) (#4417, #4635, #4646, #4644, #4663, #4674, #4698, #4702, #4716) @@ -122,7 +122,7 @@ * The [shaping docs](docs/language/shaping.md) have been expanded with a new section on [error handling](docs/language/shaping.md#error-handling) (#4686) * `zq` no longer attaches positional command line file inputs directly to [`join`](docs/language/operators/join.md) inputs (use [`file`](docs/language/operators/file.md) within a Zed program instead) (#4689) * [Zeek](https://zeek.org/)-related docs have been moved to the Integrations area of the [Zed docs site](https://zed.brimdata.io/docs) (#4694, #4696) -* [`zed create`](docs/commands/zed.md#create) now has a `-use` flag to set the newly-created pool as the default pool for future 
operations (#4656) +* [`zed create`](docs/commands/super-db.md#create) now has a `-use` flag to set the newly-created pool as the default pool for future operations (#4656) * Fix an issue where the [Zed Python client](docs/libraries/python.md) was incorrectly returning `False` for all `bool` values (#4706) * Fix an issue where the `!=` operator was not returning correct results when comparing certain types (#4704) @@ -134,7 +134,7 @@ ## v1.8.0 * Improve [`sort`](docs/language/operators/sort.md) performance for `duration` and `time` types (#4469) * Improve performance and reduce memory used by `zed load` and `sort` on multi-GB inputs (#4476, #4484) -* Fix an issue where [meta-queries](docs/commands/zed.md#meta-queries) were incorrectly returning results (#4474) +* Fix an issue where [meta-queries](docs/commands/super-db.md#meta-queries) were incorrectly returning results (#4474) * The [`join` operator](docs/language/operators/join.md) now has an additional syntax that uses subqueries, which is more reminiscent of SQL (#4467, #4473, #4492, #4502) * Improve performance when a Zed lake scan is not order sensitive (#4526) * The [lake API documentation](docs/lake/api.md) now includes both request & response MIME types (#4512) @@ -146,7 +146,7 @@ * Fix an issue where certain ZNG files could not be read and caused a `control` error (#4579) * Fix an issue where `zed serve` would exit if it tried to write to a closed socket (#4587) * Improve JSON output for Zed [maps](docs/formats/zed.md#24-map) (#4589) -* Add the [`zed vacuum`](docs/commands/zed.md#215-vacuum) command (#4577, #4598, #4600) +* Add the [`zed vacuum`](docs/commands/super-db.md#215-vacuum) command (#4577, #4598, #4600) ## v1.7.0 * Add [`regexp_replace()`](docs/language/functions/regexp_replace.md) function for replacing regular expression matches in a string (#4435, #4449) @@ -168,7 +168,7 @@ * Allow loading and responses in [VNG](docs/formats/csup.md) format over the lake API (#4345) * Fix an issue where 
[record spread expressions](docs/language/expressions.md#record-expressions) could cause a crash (#4359) * Fix an issue where the Zed service `/version` endpoint returned "unknown" if it had been built via `go install` (#4371) -* Branch-level [meta-queries](docs/commands/zed.md#meta-queries) on the `main` branch no longer require an explicit `@main` reference (#4377, #4394) +* Branch-level [meta-queries](docs/commands/super-db.md#meta-queries) on the `main` branch no longer require an explicit `@main` reference (#4377, #4394) * Add `-defaultfmt` flag to `zed serve` to specify the lake API's default response format (#4379, #4396) * Zed queries now appear in the lake log when `zed serve` is run at `-log.level debug` (#4385) * Fix an issue where elements of complex [named types](docs/formats/zed.md#3-named-type) could not be accessed (#4391) @@ -203,7 +203,7 @@ * Improve handling of errors during [shaping](docs/language/shaping.md) (#4067, #4069) * Allow use of a pool name regexp/glob pattern with the [`from` operator](docs/language/operators/from.md) (#4072, #4075) * Add [`levenshtein()` function](docs/language/functions/levenshtein.md) for fuzzy string matching (#4104) -* Allow use of any filter with [`zed delete -where`](docs/commands/zed.md#24-delete) (#4100, #4124, #4126, #4125, #4127) +* Allow use of any filter with [`zed delete -where`](docs/commands/super-db.md#24-delete) (#4100, #4124, #4126, #4125, #4127) * Add [`regexp()`](docs/language/functions/regexp.md) function for regular expression searches and capture groups (#4145, #4158) * Add [`coalesce()`](docs/language/functions/coalesce.md) function for locating non-null/non-error values (#4172) * Add `line` format for sourcing newline-delimited input as strings (#4175) @@ -235,7 +235,7 @@ * Allow conversion of time values to other numeric types (#3816) * Remove scaling from duration and time conversions (#3809) * Add [`over` expressions](docs/language/lateral-subqueries.md#lateral-expressions) (#3797) -* Add 
`-where` flag to [`zed delete`](docs/commands/zed.md#24-delete) (#3791) +* Add `-where` flag to [`zed delete`](docs/commands/super-db.md#24-delete) (#3791) * Allow base62 object IDs in lake API request bodies (#3783) * Remove `let` operator and [`over` operator](docs/language/operators/over.md)'s `as` clause (#3785) @@ -243,7 +243,7 @@ * Comprehensive [documentation](docs/README.md) * Substantial improvements to the [Zed language](docs/language/README.md) -* Revamped [`zed` command](docs/commands/zed.md) +* Revamped [`zed` command](docs/commands/super-db.md) * New Zed lake format (see #3634 for a migration script) * New version of the [ZNG format](docs/formats/bsup.md) (with read-only support for the previous version) * New version of the [ZSON format](docs/formats/jsup.md) @@ -324,7 +324,7 @@ As you can see below, there's been many changes since the last Zed GA release! Highlights include: * The introduction of Zed lakes for data storage, which include powerful - Git-like branching. See the [Zed lake README](docs/commands/zed.md) + Git-like branching. See the [Zed lake README](docs/commands/super-db.md) for details. * Enhancements to the Zed language to unify search and expression syntax, introduce new operators and functions for data exploration and shaping, and @@ -414,7 +414,7 @@ questions. 
* `zq` now reads its inputs sequentially rather than the prior merged behavior (#2492) * Extend the `len()` function to return the number of fields in a record (#2494) * Remove the `-E` flag in `zed` commands that displayed `time` values as epoch (#2495) -* Add the [Zed lake design](docs/commands/zed.md) README document (#2500, #2569, #2595, #2781, #2940, #3014, #3034, #3035) +* Add the [Zed lake design](docs/commands/super-db.md) README document (#2500, #2569, #2595, #2781, #2940, #3014, #3034, #3035) * Fix an issue where escaping quotes caused a parse error (#2510) * Fix an issue where multiple ZSON type definitions would be output when only the first was needed (#2511) * Use less buffer when decoding ZSON (#2515) diff --git a/docs/README.md b/docs/README.md index cdcb77dd28..88447c6b0c 100644 --- a/docs/README.md +++ b/docs/README.md @@ -30,7 +30,7 @@ packaged up in the easy-to-understand [SuperPipe language](language/README.md). While `super` and its accompanying data formats are production quality, the project's -[SuperDB data lake](commands/zed.md) is a bit [earlier in development](commands/zed.md#status). +[SuperDB data lake](commands/super-db.md) is a bit [earlier in development](commands/super-db.md#status). ## Terminology @@ -52,8 +52,8 @@ data transformation to _shape_ the input data into the desired set of organizing super-structured data types called "shapes", which are traditionally called _schemas_ in relational systems but are much more flexible in SuperDB. -* A [SuperDB data lake](commands/zed.md) is a collection of super-structured data stored -across one or more [data pools](commands/zed.md#data-pools) with ACID commit semantics and +* A [SuperDB data lake](commands/super-db.md) is a collection of super-structured data stored +across one or more [data pools](commands/super-db.md#data-pools) with ACID commit semantics and accessed via a [Git](https://git-scm.com/)-like API. 
## Digging Deeper @@ -63,7 +63,7 @@ is the best way to learn about `super` in depth. All of its examples use `super` commands run on the command line. Run `super -h` for a list of command options and online help. -The [`super db` documentation](commands/zed.md) +The [`super db` documentation](commands/super-db.md) is the best way to learn about the SuperDB data lake. All of its examples use `super db` commands run on the command line. Run `super db -h` or `-h` with any subcommand for a list of command options @@ -93,7 +93,7 @@ or other third-party services to interpret the lake data. Once copied, a new service can be instantiated by pointing a `super db serve` at the copy of the lake. -Functionality like [data compaction](commands/zed.md#manage) and retention are all API-driven. +Functionality like [data compaction](commands/super-db.md#manage) and retention are all API-driven. Bite-sized components are unified by the super-structured data, usually in the SUPZ format: * All lake meta-data is available via meta-queries. diff --git a/docs/commands/README.md b/docs/commands/README.md index 90a1e5c999..c4eaa41682 100644 --- a/docs/commands/README.md +++ b/docs/commands/README.md @@ -3,7 +3,7 @@ The [`super` command](super.md) is used to execute command-line queries on inputs from files, HTTP URLs, or [S3](../integrations/amazon-s3.md). -The [`super db` sub-commands](zed.md) are for creating, configuring, ingesting +The [`super db` sub-commands](super-db.md) are for creating, configuring, ingesting into, querying, and orchestrating SuperDB data lakes. These sub-commands are organized into further subcommands like the familiar command patterns of `docker` or `kubectl`. 
diff --git a/docs/commands/zed.md b/docs/commands/super-db.md similarity index 69% rename from docs/commands/zed.md rename to docs/commands/super-db.md index 21cd64ebdc..a2237260ab 100644 --- a/docs/commands/zed.md +++ b/docs/commands/super-db.md @@ -1,27 +1,27 @@ --- sidebar_position: 2 -sidebar_label: zed +sidebar_label: super db --- -# zed +# `super db` -> **TL;DR** `zed` is a command-line tool to manage and query Zed data lakes. -> You can import data from a variety of formats and `zed` will automatically -> commit the data in the Zed data model's [super-structured](../formats/README.md) +> **TL;DR** `super db` is a sub-command of `super` to manage and query SuperDB data lakes. +> You can import data from a variety of formats and it will automatically +> be committed in [super-structured](../formats/README.md) > format, providing full fidelity of the original format and the ability > to reconstruct the original data without loss of information. > -> Zed lakes provide an easy-to-use substrate for data discovery, preparation, +> SuperDB data lakes provide an easy-to-use substrate for data discovery, preparation, > and transformation as well as serving as a queryable and searchable store > for super-structured data both for online and archive use cases.

:::tip Status -While [`super`](super.md) and the [Zed formats](../formats/README.md) -are production quality, the Zed lake is still fairly early in development +While [`super`](super.md) and its accompanying [formats](../formats/README.md) +are production quality, the SuperDB data lake is still fairly early in development and alpha quality. -That said, Zed lakes can be utilized quite effectively at small scale, +That said, SuperDB data lakes can be utilized quite effectively at small scale, or at larger scales when scripted automation is deployed to manage the lake's data layout via the [lake API](../lake/api.md). @@ -31,10 +31,10 @@ Enhanced scalability with self-tuning configuration is under development. ## The Lake Model -A Zed lake is a cloud-native arrangement of data, optimized for search, +A SuperDB data lake is a cloud-native arrangement of data, optimized for search, analytics, ETL, data discovery, and data preparation at scale based on data represented in accordance -with the [Zed data model](../formats/zed.md). +with the [super data model](../formats/zed.md). A lake is organized into a collection of data pools forming a single administrative domain. The current implementation supports @@ -42,7 +42,7 @@ ACID append and delete semantics at the commit level while we have plans to support CRUD updates at the primary-key level in the near future. -The semantics of a Zed lake loosely follows the nomenclature and +The semantics of a SuperDB data lake loosely follows the nomenclature and design patterns of [`git`](https://git-scm.com/). In this approach, * a _lake_ is like a GitHub organization, * a _pool_ is like a `git` repository, @@ -50,37 +50,37 @@ design patterns of [`git`](https://git-scm.com/). In this approach, * the _use_ command is like a `git checkout`, and * the _load_ command is like a `git add/commit/push`. -A core theme of the Zed lake design is _ergonomics_. 
Given the Git metaphor, -our goal here is that the Zed lake tooling be as easy and familiar as Git is +A core theme of the SuperDB data lake design is _ergonomics_. Given the Git metaphor, +our goal here is that the lake tooling be as easy and familiar as Git is to a technical user. -Since Zed lakes are built around the Zed data model, +Since SuperDB data lakes are built around the super data model, getting different kinds of data into and out of a lake is easy. There is no need to define schemas or tables and then fit semi-structured data into schemas before loading data into a lake. -And because Zed supports a large family of formats and the load endpoint +And because SuperDB supports a large family of formats and the load endpoint automatically detects most formats, it's easy to just load data into a lake without thinking about how to convert it into the right format. ### CLI-First Approach -The Zed project has taken a _CLI-first approach_ to designing and implementing +The SuperDB project has taken a _CLI-first approach_ to designing and implementing the system. Any time a new piece of functionality is added to the lake, -it is first implemented as a `zed` command. This is particularly convenient +it is first implemented as a `super db` command. This is particularly convenient for testing and continuous integration as well as providing intuitive, bite-sized chunks for learning how the system works and how the different components come together. While the CLI-first approach provides these benefits, all of the functionality is also exposed through [an API](../lake/api.md) to -a Zed service. Many use cases involve an application like -[Zui](https://zui.brimdata.io/) or a +a lake service. Many use cases involve an application like +[SuperDB Desktop](https://zui.brimdata.io/) or a programming environment like Python/Pandas interacting -with the service API in place of direct use with the `zed` command. +with the service API in place of direct use with `super db`. 
### Storage Layer -The Zed lake storage model is designed to leverage modern cloud object stores +The lake storage model is designed to leverage modern cloud object stores and separates compute from storage. A lake is entirely defined by a collection of cloud objects stored @@ -90,7 +90,7 @@ and so forth is stored as cloud objects inside of the lake. There is no need to set up and manage an auxiliary metadata store. Data is arranged in a lake as a set of pools, which are comprised of one -or more branches, which consist of a sequence of data commit objects +or more branches, which consist of a sequence of data [commit objects](#commit-objects) that point to cloud data objects. Cloud objects and commits are immutable and named with globally unique IDs, @@ -109,7 +109,7 @@ at a specific point in time. While this commit model may sound heavyweight, excellent live ingest performance can be achieved by micro-batching commits. -Because the Zed lake represents all state transitions with immutable objects, +Because the lake represents all state transitions with immutable objects, the caching of any cloud object (or byte ranges of cloud objects) is easy and effective since a cached object is never invalid. This design makes backup/restore, data migration, archive, and @@ -117,30 +117,30 @@ replication easy to support and deploy. The cloud objects that comprise a lake, e.g., data objects, commit history, transaction journals, partial aggregations, etc., -are stored as Zed data, i.e., either as [row-based Super Binary](../formats/bsup.md) +are stored as super-structured data, i.e., either as [row-based Super Binary](../formats/bsup.md) or [Super Columnar](../formats/csup.md). This makes introspection of the lake structure straightforward as many key lake data structures can be queried with metadata queries and presented -to a client as Zed data for further processing by downstream tooling. +to a client for further processing by downstream tooling. 
-Zed's implementation also includes a storage abstraction that maps the cloud object -model onto a file system so that Zed lakes can also be deployed on standard file systems. +The implementation also includes a storage abstraction that maps the cloud object +model onto a file system so that lakes can also be deployed on standard file systems. -### Zed Command Personalities +### Command Personalities -The `zed` command provides a single command-line interface to Zed lakes, but -different personalities are taken on by `zed` depending on the particular +The `super db` command provides a single command-line interface to SuperDB data lakes, but +different personalities are taken on by `super db` depending on the particular sub-command executed and the [lake location](#locating-the-lake). -To this end, `zed` can take on one of three personalities: +To this end, `super db` can take on one of three personalities: * _Direct Access_ - When the lake is a storage path (`file` or `s3` URI), -then the `zed` commands (except for `serve`) all operate directly on the +then the `super db` commands (except for `serve`) all operate directly on the lake located at that path. * _Client Personality_ - When the lake is an HTTP or HTTPS URL, then the -lake is presumed to be a Zed lake service endpoint and the client +lake is presumed to be a service endpoint and the client commands are directed to the service managing the lake. -* _Server Personality_ - When the [`zed serve`](#serve) command is executed, then +* _Server Personality_ - When the [`super db serve`](#serve) command is executed, then the personality is always the server personality and the lake must be a storage path. This command initiates a continuous server process that serves client requests for the lake at the configured storage path. 
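The three personalities described above are keyed off the form of the lake reference. As a rough sketch of that dispatch in shell (the `personality` helper name and the exact matching rules are assumptions for illustration, not the actual `super db` implementation; note that `serve` always takes the server personality regardless of path form):

```shell
# Hypothetical sketch (not the real implementation): classify a lake
# reference the way the three personalities above are described.
personality() {
  case "$1" in
    http://*|https://*) echo "client" ;;  # service endpoint: client personality
    *)                  echo "direct" ;;  # file/s3 URI or plain path: direct access
  esac
}
```

For example, `personality https://localhost:9867` classifies the reference as one handled by a lake service, while `personality /lakes/test` indicates direct access to lake storage.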
@@ -149,41 +149,43 @@ Note that a storage path on the file system may be specified either as a fully qualified file URI of the form `file://` or be a standard file system path, relative or absolute, e.g., `/lakes/test`. -Concurrent access to any Zed lake storage, of course, preserves -data consistency. You can run multiple `zed serve` processes while also -running any `zed` lake command all pointing at the same storage endpoint +Concurrent access to any lake storage, of course, preserves +data consistency. You can run multiple `super db serve` processes while also +running any `super db` lake command all pointing at the same storage endpoint and the lake's data footprint will always remain consistent as the endpoints -all adhere to the consistency semantics of the Zed lake. - -> One caveat here: data consistency is not fully implemented yet for -> the S3 endpoint so only single-node access to S3 is available right now, -> though support for multi-node access is forthcoming. -> For a shared file system, the close-to-open cache consistency -> semantics of NFS should provide the necessary consistency guarantees needed by -> a Zed lake though this has not been tested. Multi-process, single-node -> access to a local file system has been thoroughly tested and should be -> deemed reliable, i.e., you can run a direct-access instance of `zed` alongside -> a server instance of `zed` on the same file system and data consistency will -> be maintained. +all adhere to the consistency semantics of the lake. + +:::tip caveat +Data consistency is not fully implemented yet for +the S3 endpoint so only single-node access to S3 is available right now, +though support for multi-node access is forthcoming. +For a shared file system, the close-to-open cache consistency +semantics of [NFS](https://en.wikipedia.org/wiki/Network_File_System) should provide the necessary consistency guarantees needed by +the lake though this has not been tested. 
Multi-process, single-node +access to a local file system has been thoroughly tested and should be +deemed reliable, i.e., you can run a direct-access instance of `super db` alongside +a server instance of `super db` on the same file system and data consistency will +be maintained. +::: ### Locating the Lake -At times you may want the Zed CLI tools to access the same lake storage -used by other tools such as [Zui](https://zui.brimdata.io/). To help +At times you may want `super db` commands to access the same lake storage +used by other tools such as [SuperDB Desktop](https://zui.brimdata.io/). To help enable this by default while allowing for separate lake storage when desired, -`zed` checks each of the following in order to attempt to locate an existing +`super db` checks each of the following in order to attempt to locate an existing lake. 1. The contents of the `-lake` option (if specified) -2. The contents of the `ZED_LAKE` environment variable (if defined) -3. A Zed lake service running locally at `http://localhost:9867` (if a socket +2. The contents of the `SUPER_DB_LAKE` environment variable (if defined) +3. A lake service running locally at `http://localhost:9867` (if a socket is listening at that port) -4. A `zed` subdirectory below a path in the +4. A `super` subdirectory below a path in the [`XDG_DATA_HOME`](https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html) environment variable (if defined) 5. A default file system location based on detected OS platform: - - `%LOCALAPPDATA%\zed` on Windows - - `$HOME/.local/share/zed` on Linux and macOS + - `%LOCALAPPDATA%\super` on Windows + - `$HOME/.local/share/super` on Linux and macOS ### Data Pools @@ -191,8 +193,8 @@ A lake is made up of _data pools_, which are like "collections" in NoSQL document stores. Pools may have one or more branches and every pool always has a branch called `main`. 
-A pool is created with the [create command](#create) -and a branch of a pool is created with the [branch command](#branch). +A pool is created with the [`create` command](#create) +and a branch of a pool is created with the [`branch` command](#branch). A pool name can be any valid UTF-8 string and is allocated a unique ID when created. The pool can be referred to by its name or by its ID. @@ -203,14 +205,16 @@ A pool may be renamed but the unique ID is always fixed. Data is added into a pool in atomic units called _commit objects_. Each commit object is assigned a global ID. -Similar to Git, Zed commit objects are arranged into a tree and +Similar to Git, commit objects are arranged into a tree and represent the entire commit history of the lake. -> Technically speaking, Git can merge from multiple parents and thus +:::tip note +Technically speaking, Git can merge from multiple parents and thus Git commits form a directed acyclic graph instead of a tree; -Zed does not currently support multiple parents in the commit object history. +SuperDB does not currently support multiple parents in the commit object history. +::: -A branch is simply a named pointer to a commit object in the Zed lake +A branch is simply a named pointer to a commit object in the lake and like a pool, a branch name can be any valid UTF-8 string. Consistent updates to a branch are made by writing a new commit object that points to the previous tip of the branch and updating the branch to point at @@ -220,7 +224,7 @@ commit object's parent); if the constraint is violated, then the transaction is aborted. The _working branch_ of a pool may be selected on any command with the `-use` option -or may be persisted across commands with the [use command](#use) so that +or may be persisted across commands with the [`use` command](#use) so that `-use` does not have to be specified on each command-line. 
For interactive workflows, the `use` command is convenient but for automated workflows in scripts, it is good practice to explicitly specify the branch in each @@ -228,7 +232,7 @@ command invocation with the `-use` option. #### Commitish -Many `zed` commands operate with respect to a commit object. +Many `super db` commands operate with respect to a commit object. While commit objects are always referenceable by their commit ID, it is also convenient to refer to the commit object at the tip of a branch. @@ -240,7 +244,7 @@ A commitish is always relative to the pool and has the form: where `` is a pool name or pool ID, `` is a commit object ID, and `` is a branch name. -In particular, the working branch set by the [use command](#use) is a commitish. +In particular, the working branch set by the [`use` command](#use) is a commitish. A commitish may be abbreviated in several ways where the missing detail is obtained from the working-branch commitish, e.g., @@ -259,7 +263,7 @@ which is the sort key for all data stored in the lake. Different data pools can have different pool keys but all of the data in a pool must have the same pool key. -As pool data is often comprised of Zed records (analogous to JSON objects), +As pool data is often comprised of [records](../formats/zed.md#21-record) (analogous to JSON objects), the pool key is typically a field of the stored records. When pool data is not structured as records/objects (e.g., scalar or arrays or other non-record types), then the pool key would typically be configured @@ -270,8 +274,10 @@ key. For example, on a pool with pool key `ts`, the query `ts == 100` will be optimized to scan only the data objects where the value `100` could be present. -> The pool key will also serve as the primary key for the forthcoming -> CRUD semantics. +:::tip note +The pool key will also serve as the primary key for the forthcoming +CRUD semantics. 
+::: A pool also has a configured sort order, either ascending or descending and data is organized in the pool in accordance with this order. @@ -290,21 +296,21 @@ optimize scans over such data is impaired. Because commits are transactional and immutable, a query sees its entire data scan as a fixed "snapshot" with respect to the -commit history. In fact, Zed's [from operator](../language/operators/from.md) +commit history. In fact, the [`from` operator](../language/operators/from.md) allows a commit object to be specified with the `@` suffix to a pool reference, e.g., ``` -zed query 'from logs@1tRxi7zjT7oKxCBwwZ0rbaiLRxb |> ...' +super db query 'from logs@1tRxi7zjT7oKxCBwwZ0rbaiLRxb | ...' ``` In this way, a query can time-travel through the commit history. As long as the -underlying data has not been deleted, arbitrarily old snapshots of the Zed +underlying data has not been deleted, arbitrarily old snapshots of the lake can be easily queried. -If a writer commits data after and while a reader is scanning, then the reader +If a writer commits data after or while a reader is scanning, then the reader does not see the new data since it's scanning the snapshot that existed before these new writes occurred. -Also, arbitrary metadata can be committed to the log as described below, +Also, arbitrary metadata can be committed to the log [as described below](#load), e.g., to associate derived analytics to a specific journal commit point potentially across different data pools in a transactionally consistent fashion. @@ -321,12 +327,14 @@ using that pool's "branches log" in a similar fashion, then its corresponding commit object can be used to construct the data of that branch at that past point in time. - > Note that time travel using timestamps is a forthcoming feature. +:::tip note +Time travel using timestamps is a forthcoming feature. 
+::: -## Zed Commands +## `super db` Commands -The `zed` command is structured as a primary command -consisting of a large number of interrelated sub-commands, similar to the +While `super db` is itself a sub-command of [`super`](super.md), it invokes +a large number of interrelated sub-commands, similar to the [`docker`](https://docs.docker.com/engine/reference/commandline/cli/) or [`kubectl`](https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands) commands. @@ -334,10 +342,10 @@ commands. The following sections describe each of the available commands and highlight some key options. Built-in help shows the commands and their options: -* `zed -h` with no args displays a list of `zed` commands. -* `zed command -h`, where `command` is a sub-command, displays help +* `super db -h` with no args displays a list of `super db` commands. +* `super db command -h`, where `command` is a sub-command, displays help for that sub-command. -* `zed command sub-command -h` displays help for a sub-command of a +* `super db command sub-command -h` displays help for a sub-command of a sub-command and so forth. By default, commands that display lake metadata (e.g., [`log`](#log) or @@ -347,15 +355,16 @@ format. However, the `-f` option can be used to specify any supported ### Auth ``` -zed auth login|logout|method|verify +super db auth login|logout|method|verify ``` -Access to a Zed lake can be secured with [Auth0 authentication](https://auth0.com/). +Access to a lake can be secured with [Auth0 authentication](https://auth0.com/). +A [guide](../integrations/zed-lake-auth.md) is available with example configurations. Please reach out to us on our [community Slack](https://www.brimdata.io/join-slack/) -if you'd like help setting this up and trying it out. +if you have feedback on your experience or need additional help. 
### Branch ``` -zed branch [options] [name] +super db branch [options] [name] ``` The `branch` command creates a branch with the name `name` that points to the tip of the working branch or, if the `name` argument is not provided, @@ -363,7 +372,7 @@ lists the existing branches of the selected pool. For example, this branch command ``` -zed branch -use logs@main staging +super db branch -use logs@main staging ``` creates a new branch called "staging" in pool "logs", which points to the same commit object as the "main" branch. Once created, commits @@ -374,25 +383,25 @@ at any time. Supposing the `main` branch of `logs` was already the working branch, then you could create the new branch called "staging" by simply saying ``` -zed branch staging +super db branch staging ``` Likewise, you can delete a branch with `-d`: ``` -zed branch -d staging +super db branch -d staging ``` and list the branches as follows: ``` -zed branch +super db branch ``` ### Create ``` -zed create [-orderby key[,key...][:asc|:desc]] +super db create [-orderby key[,key...][:asc|:desc]] <name> ``` The `create` command creates a new data pool with the given name, which may be any valid UTF-8 string. -The `-orderby` option indicates the pool key that is used to sort +The `-orderby` option indicates the [pool key](#pool-key) that is used to sort the data in the lake, which may be in ascending or descending order. If a pool key is not specified, then it defaults to @@ -400,14 +409,16 @@ the [special value `this`](../language/pipeline-model.md#the-special-value-this). A newly created pool is initialized with a branch called `main`. -> Zed lakes can be used without thinking about branches. When referencing a pool without -> a branch, the tooling presumes the "main" branch as the default, and everything -> can be done on main without having to think about branching. +:::tip note +Lakes can be used without thinking about branches.
When referencing a pool without +a branch, the tooling presumes the "main" branch as the default, and everything +can be done on main without having to think about branching. +::: ### Delete ``` -zed delete [options] [...] -zed delete [options] -where +super db delete [options] <id> [<id> ...] +super db delete [options] -where <filter> ``` The `delete` command removes one or more data objects indicated by their ID from a pool. This command @@ -421,12 +432,12 @@ provided filter expression is true. The value provided to `-where` must be a single filter expression, e.g.: ``` -zed delete -where 'ts > 2022-10-05T17:20:00Z and ts < 2022-10-05T17:21:00Z' +super db delete -where 'ts > 2022-10-05T17:20:00Z and ts < 2022-10-05T17:21:00Z' ``` ### Drop ``` -zed drop [options] | +super db drop [options] <name>|<id> ``` The `drop` command deletes a pool and all of its constituent data. As this is a DANGER ZONE command, you must confirm that you want to delete @@ -435,7 +446,7 @@ without confirmation. ### Init ``` -zed init [path] +super db init [path] ``` A new lake is initialized with the `init` command. The `path` argument is a [storage path](#storage-layer) and is optional. If not present, the path @@ -448,18 +459,18 @@ storage path to create a new, empty lake at the specified path. ### Load ``` -zed load [options] input [input ...] +super db load [options] input [input ...] ``` The `load` command commits new data to a branch of a pool. -Run `zed load -h` for a list of command-line options. +Run `super db load -h` for a list of command-line options. Note that there is no need to define a schema or insert data into -a "table" as all Zed data is _self describing_ and can be queried in a +a "table" as all super-structured data is _self describing_ and can be queried in a schema-agnostic fashion. Data of any _shape_ can be stored in any pool and arbitrary data _shapes_ can coexist side by side.
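The schema-agnostic behavior described above can be sketched in a few lines of Python. This is purely illustrative — the record shapes and the `where` helper are invented for the sketch and are not part of SuperDB:

```python
# Records of different shapes coexist in one collection, and a query applies
# wherever its fields are present -- no schema declaration required.
records = [
    {"ts": 100, "src": "10.0.0.1", "bytes": 4096},    # network-log shape
    {"ts": 101, "level": "warn", "msg": "disk low"},  # app-log shape
    {"ts": 102, "word": "hello"},                     # yet another shape
]

def where(recs, field, pred):
    """Keep records where `field` exists and satisfies `pred`;
    records lacking the field are simply skipped, not errors."""
    return [r for r in recs if field in r and pred(r[field])]

# Only the shapes that carry a `bytes` field participate in this filter.
print(where(records, "bytes", lambda b: b > 1024))
```

Records without the filtered field simply fall out of the result rather than causing a schema error, which is the behavior the self-describing data model enables.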
-As with `zq`, +As with [`super`](super.md), the [input arguments](super.md#usage) can be in any [supported format](super.md#input-formats) and the input format is auto-detected if `-i` is not provided. Likewise, @@ -467,8 +478,8 @@ the inputs may be URLs, in which case, the `load` command streams the data from a Web server or [S3](../integrations/amazon-s3.md) and into the lake. When data is loaded, it is broken up into objects of a target size determined -by the pool's `threshold` parameter (which defaults 500MiB but can be configured -when the pool is created). Each object is sorted by the pool key but +by the pool's `threshold` parameter (which defaults to 500MiB but can be configured +when the pool is created). Each object is sorted by the [pool key](#pool-key) but a sequence of objects is not guaranteed to be globally sorted. When lots of small or unsorted commits occur, data can be fragmented. The performance impact of fragmentation can be eliminated by regularly [compacting](#manage) @@ -476,7 +487,7 @@ pools. For example, this command ``` -zed load sample1.json sample2.bsup sample3.jsup +super db load sample1.json sample2.bsup sample3.jsup ``` loads files of varying formats in a single commit to the working branch. @@ -484,13 +495,13 @@ An alternative branch may be specified with a branch reference with the `-use` option, i.e., `@`. Supposing a branch called `live` existed, data can be committed into this branch as follows: ``` -zed load -use logs@live sample.bsup +super db load -use logs@live sample.bsup ``` Or, as mentioned above, you can set the default branch for the load command -via `use`: +via [`use`](#use): ``` -zed use logs@live -zed load sample.bsup +super db use logs@live +super db load sample.bsup ``` During a `load` operation, a commit is broken out into units called _data objects_ where a target object size is configured into the pool, @@ -503,11 +514,11 @@ Data added to a pool can arrive in any order with respect to the pool key. 
While each object is sorted before it is written, the collection of objects is generally not sorted. -Each load operation creates a single commit object, which includes: +Each load operation creates a single [commit object](#commit-objects), which includes: * an author and message string, * a timestamp computed by the server, and -* an optional metadata field of any Zed type expressed as a ZSON value. -This data has the Zed type signature: +* an optional metadata field of any type expressed as a Super JSON value. +This data has the type signature: ``` { author: string, @@ -519,40 +530,40 @@ where `<meta>` is the type of any optionally attached metadata. For example, this command sets the `author` and `message` fields: ``` -zed load -user user@example.com -message "new version of prod dataset" ... +super db load -user user@example.com -message "new version of prod dataset" ... ``` -If these fields are not specified, then the Zed system will fill them in +If these fields are not specified, then the system will fill them in with the user obtained from the session and a message that is descriptive of the action. -The `date` field here is used by the Zed lake system to do time travel +The `date` field here is used by the lake system to do [time travel](#time-travel) through the branch and pool history, allowing you to see the state of branches at any time in their commit history. -Arbitrary metadata expressed as any [ZSON value](../formats/jsup.md) +Arbitrary metadata expressed as any [Super JSON value](../formats/jsup.md) may be attached to a commit via the `-meta` flag. This allows an application or user to transactionally commit metadata alongside committed data for any purpose. This approach allows external applications to implement arbitrary data provenance and audit capabilities by embedding custom metadata in the commit history.
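To make the defaulting behavior concrete, here is a hedged Python sketch of how a commit object's fields might be assembled: author and message fall back to session-derived defaults, the timestamp is always set by the server, and metadata is optional. The function name, default message, and session user are hypothetical, not the actual server logic:

```python
import datetime

def make_commit(author=None, message=None, meta=None,
                session_user="user@example.com"):
    """Toy model of a commit object with the shape
    {author: string, date: time, message: string, meta: <meta>}."""
    commit = {
        "author": author or session_user,          # session default
        "date": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "message": message or "loaded data",       # descriptive default
    }
    if meta is not None:
        commit["meta"] = meta  # optional; its type plays the role of <meta>
    return commit

c = make_commit(message="new version of prod dataset", meta={"version": 1})
print(c["author"], "meta" in c)
```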
-Since commit objects are stored as Zed, the metadata can easily be -queried by running the `log -f bsup` to retrieve the log in ZNG format, +Since commit objects are stored as super-structured data, the metadata can easily be +queried by running the `log -f bsup` to retrieve the log in Super Binary format, for example, and using [`super`](super.md) to pull the metadata out as in: ``` -zed log -f bsup | zq 'has(meta) | yield {id,meta}' - +super db log -f bsup | super -c 'has(meta) | yield {id,meta}' - ``` ### Log ``` -zed log [options] [commitish] +super db log [options] [commitish] ``` -The `log` command, like `git log`, displays a history of the commit objects +The `log` command, like `git log`, displays a history of the [commit objects](#commit-objects) starting from any commit, expressed as a [commitish](#commitish). If no argument is given, the tip of the working branch is used. -Run `zed log -h` for a list of command-line options. +Run `super db log -h` for a list of command-line options. To understand the log contents, the `load` operation is actually decomposed into two steps under the covers: @@ -570,14 +581,16 @@ from the current pointer back through history to the first commit object. A commit object includes an optional author and message, along with a required timestamp, that is stored in the commit journal for reference. These values may -be specified as options to the `load` command, and are also available in the -API for automation. +be specified as options to the [`load`](#load) command, and are also available in the +[lake API](../lake/api.md) for automation. -> Note that the branchlog meta-query source is not yet implemented. +:::tip note +The branchlog meta-query source is not yet implemented. +::: ### Ls ``` -zed ls [options] [pool] +super db ls [options] [pool] ``` The `ls` command lists pools in a lake or branches in a pool. @@ -589,7 +602,7 @@ with the ID of their commit object, which points at the tip of each branch. 
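The tip-to-root traversal that `log` performs can be modeled as a toy Python sketch: a branch is just a named pointer to a commit, and the log is the chain of commit objects reachable from that tip back through parent links to the first commit. The commit IDs and dictionary layout here are invented for illustration and are not SuperDB's storage format:

```python
# Toy commit graph: each commit points at its single parent (no merges).
commits = {
    "c1": {"parent": None, "message": "initial load"},
    "c2": {"parent": "c1", "message": "more data"},
    "c3": {"parent": "c2", "message": "compaction"},
}
branches = {"main": "c3", "live": "c2"}  # branch name -> tip commit ID

def log(branch):
    """Walk parent links from the branch tip back to the first commit."""
    out, cid = [], branches[branch]
    while cid is not None:
        out.append(cid)
        cid = commits[cid]["parent"]
    return out

print(log("main"))  # ['c3', 'c2', 'c1']
```

Creating a branch is then just adding another name that points at an existing commit, which is why branch creation is cheap.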
### Manage ``` -zed manage [options] +super db manage [options] ``` The `manage` command performs maintenance tasks on a lake. @@ -598,7 +611,7 @@ by reading data objects in a pool and writing their contents back to large, non-overlapping objects. If the `-monitor` option is specified and the lake is [located](#locating-the-lake) via network connection, `super db manage` will run continuously and perform updates as needed. By default, a check is performed once per minute to determine if updates are necessary. The `-interval` option may be used to specify an alternate check frequency in [duration format](../formats/jsup.md#23-primitive-values). @@ -621,7 +634,7 @@ tasks run at the specified interval by the service process. Data is merged from one branch into another with the `merge` command, e.g., ``` -zed merge -use logs@updates main +super db merge -use logs@updates main ``` where the `updates` branch is being merged into the `main` branch within the `logs` pool. @@ -637,25 +650,23 @@ parent. This Git-like behavior for a data lake provides a clean solution to the live ingest problem. -For example, data can be continuously ingested into a branch of main called `live` +For example, data can be continuously ingested into a branch of `main` called `live` and orchestration logic can periodically merge updates from branch `live` to branch `main`, possibly [compacting](#manage) data after the merge according to configured policies and logic. ### Query ``` -zed query [options] +super db query [options] <query> ``` -The `query` command runs a Zed program with data from a lake as input. -A query typically begins with a [from operator](../language/operators/from.md) -indicating the pool and branch to use as input. If `from` is not present, then the -query reads from the working branch. +The `query` command runs a [SuperSQL](../language/README.md) query with data from a lake as input.
+A query typically begins with a [`from` operator](../language/operators/from.md) +indicating the pool and branch to use as input. -The pool/branch names are specified with `from` at the beginning of the Zed -query. +The pool/branch names are specified with `from` in the query. -As with `zq`, the default output format is ZSON for -terminals and ZNG otherwise, though this can be overridden with +As with [`super`](super.md), the default output format is Super JSON for +terminals and Super Binary otherwise, though this can be overridden with `-f` to specify one of the various supported output formats. If a pool name is provided to `from` without a branch name, then branch @@ -665,54 +676,44 @@ This example reads every record from the full key range of the `logs` pool and sends the results to stdout. ``` -zed query 'from logs' +super db query 'from logs' ``` -We can narrow the span of the query by specifying a filter on the pool key: +We can narrow the span of the query by specifying a filter on the [pool key](#pool-key): ``` -zed query 'from logs |> ts >= 2018-03-24T17:36:30.090766Z and ts <= 2018-03-24T17:36:30.090758Z' +super db query 'from logs | ts >= 2018-03-24T17:36:30.090758Z and ts <= 2018-03-24T17:36:30.090766Z' ``` Filters on pool keys are efficiently implemented as the data is laid out according to the pool key and seek indexes keyed by the pool key are computed for each data object.
-Lake queries also can refer to HEAD (i.e., the branch context set in the most -recent `use` command) either implicitly by omitting the `from` operator: -``` -zed query '*' -``` -or by referencing `HEAD`: -``` -zed query 'from HEAD' -``` - -When querying data to the ZNG output format, -output from a pool can be easily piped to other commands like `zq`, e.g., +When querying data to the [Super Binary](../formats/bsup.md) output format, +output from a pool can be easily piped to other commands like `super`, e.g., ``` -zed query -f bsup 'from logs' | zq -f table 'count() by field' - +super db query -f bsup 'from logs' | super -f table -c 'count() by field' - ``` Of course, it's even more efficient to run the query inside of the pool traversal like this: ``` -zed query -f table 'from logs |> count() by field' +super db query -f table 'from logs | count() by field' ``` By default, the `query` command scans pool data in pool-key order though -the Zed optimizer may, in general, reorder the scan to optimize searches, +the query optimizer may, in general, reorder the scan to optimize searches, aggregations, and joins. An order hint can be supplied to the `query` command to indicate to -the optimizer the desired processing order, but in general, `sort` operators +the optimizer the desired processing order, but in general, [`sort` operators](../language/operators/sort.md) should be used to guarantee any particular sort order. -Arbitrarily complex Zed queries can be executed over the lake in this fashion +Arbitrarily complex queries can be executed over the lake in this fashion and the planner can utilize cloud resources to parallelize and scale the -query over many parallel workers that simultaneously access the Zed lake data in +query over many parallel workers that simultaneously access the lake data in shared cloud storage (while also accessing locally- or cluster-cached copies of data). 
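The pool-key pruning described above — skipping data objects whose key range cannot match a filter — can be sketched in a few lines of Python. The object records and min/max field names are invented for illustration, not the actual seek-index layout:

```python
# Each data object carries the min/max of its pool key (here, `ts`), so a
# range predicate can prune objects whose span cannot overlap the filter.
objects = [
    {"id": "obj1", "min_ts": 0,   "max_ts": 99},
    {"id": "obj2", "min_ts": 100, "max_ts": 199},
    {"id": "obj3", "min_ts": 200, "max_ts": 299},
]

def objects_to_scan(objs, lo, hi):
    """Return IDs of objects whose [min, max] span overlaps [lo, hi]."""
    return [o["id"] for o in objs if o["max_ts"] >= lo and o["min_ts"] <= hi]

print(objects_to_scan(objects, 150, 250))  # ['obj2', 'obj3']
```

A filter such as `ts == 100` reduces to an overlap test against each object's key span, so only a fraction of the pool is actually read.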
#### Meta-queries Commit history, metadata about data objects, lake and pool configuration, etc. can all be queried and -returned as Zed data, which in turn, can be fed into Zed analytics. +returned as super-structured data, which in turn, can be fed into analytics. This allows a very powerful approach to introspecting the structure of a lake, making it easy to measure, tune, and adjust lake parameters to optimize layout for performance. @@ -728,53 +729,53 @@ There are three types of meta-queries: sources vary based on level. For example, a list of pools with configuration data can be obtained -in the ZSON format as follows: +in the Super JSON format as follows: ``` -zed query -Z "from :pools" +super db query -Z "from :pools" ``` This meta-query produces a list of branches in a pool called `logs`: ``` -zed query -Z "from logs:branches" +super db query -Z "from logs:branches" ``` -Since this is all just Zed, you can filter the results just like any query, +You can filter the results just like any query, e.g., to look for a particular branch: ``` -zed query -Z "from logs:branches |> branch.name=='main'" +super db query -Z "from logs:branches | branch.name=='main'" ``` This meta-query produces a list of the data objects in the `live` branch of pool `logs`: ``` -zed query -Z "from logs@live:objects" +super db query -Z "from logs@live:objects" ``` -You can also pretty-print in human-readable form most of the metadata Zed records +You can also pretty-print in human-readable form most of the metadata records using the "lake" format, e.g., ``` -zed query -f lake "from logs@live:objects" +super db query -f lake "from logs@live:objects" ``` The `main` branch is queried by default if an explicit branch is not specified, e.g., ``` -zed query -f lake "from logs:objects" +super db query -f lake "from logs:objects" ``` ### Rename ``` -zed rename +super db rename <existing> <new-name> ``` The `rename` command assigns a new name `<new-name>` to an existing pool `<existing>`, which may be referenced by its ID or its previous
name. ### Serve ``` -zed serve [options] +super db serve [options] ``` -The `serve` command implements Zed's server personality to service requests -from instances of Zed's client [personality](#zed-command-personalities). -It listens for Zed lake API requests on the interface and port +The `serve` command implements the [server personality](#command-personalities) to service requests +from instances of the client personality. +It listens for [lake API](../lake/api.md) requests on the interface and port specified by the `-l` option, executes the requests, and returns results. The `-log.level` option controls log verbosity. Available levels, ordered @@ -788,14 +789,14 @@ normally performed via the separate [`manage`](#manage) command. ### Use ``` -zed use [] +super db use [<commitish>] ``` The `use` command sets the working branch to the indicated commitish. When run with no argument, it displays the working branch and [lake](#locating-the-lake). For example, ``` -zed use logs +super db use logs ``` provides a "pool-only" commitish that sets the working branch to `logs@main`. @@ -803,20 +804,20 @@ If a `@branch` or commit ID is given without a pool prefix, then the pool of the commitish previously in use is presumed. For example, if you are on `logs@main` and run this command: ``` -zed use @test +super db use @test ``` then the working branch is set to `logs@test`. To specify a branch in another pool, simply prepend the pool name to the desired branch: ``` -zed use otherpool@otherbranch +super db use otherpool@otherbranch ``` -This command stores the working branch in `$HOME/.zed_head`. +This command stores the working branch in `$HOME/.super_head`.
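The commitish defaulting rules described above — a bare pool name implies branch `main`, and a bare `@branch` inherits the pool from the current working commitish — can be sketched as a small Python function. This is an illustrative toy, not the actual parser:

```python
def resolve(commitish, working="logs@main"):
    """Expand an abbreviated commitish against the working commitish."""
    pool, _, _branch = working.partition("@")
    if commitish.startswith("@"):   # "@test" -> keep the working pool
        return f"{pool}{commitish}"
    if "@" not in commitish:        # "logs" -> default to branch main
        return f"{commitish}@main"
    return commitish                # already fully qualified

print(resolve("logs"))                  # logs@main
print(resolve("@test"))                 # logs@test
print(resolve("otherpool@otherbranch")) # otherpool@otherbranch
```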
### Vacuum ``` -zed vacuum [options] +super db vacuum [options] ``` The `vacuum` command permanently removes underlying data objects that have diff --git a/docs/commands/super.md b/docs/commands/super.md index c0efbd3845..1faec39c6a 100644 --- a/docs/commands/super.md +++ b/docs/commands/super.md @@ -427,12 +427,12 @@ hello - greeting ### SuperDB Data Lake Metadata Output The `lake` format is used to pretty-print lake metadata, such as in -[`super db` sub-command](zed.md) outputs. Because it's `super db`'s default output format, +[`super db` sub-command](super-db.md) outputs. Because it's `super db`'s default output format, it's rare to request it explicitly via `-f`. However, since it's possible for -`super db` to [generate output in any supported format](zed.md#zed-commands), +`super db` to [generate output in any supported format](super-db.md#super-db-commands), the `lake` format is useful to reverse this. -For example, imagine you'd executed a [meta-query](zed.md#meta-queries) via +For example, imagine you'd executed a [meta-query](super-db.md#meta-queries) via `super db query -Z "from :pools"` and saved the output in this file `pools.jsup`. ```mdtest-input pools.jsup diff --git a/docs/integrations/fluentd.md b/docs/integrations/fluentd.md index 086ec0abca..7bea84d9f8 100644 --- a/docs/integrations/fluentd.md +++ b/docs/integrations/fluentd.md @@ -6,7 +6,7 @@ sidebar_label: Fluentd # Fluentd The [Fluentd](https://www.fluentd.org/) open source data collector can be used -to push log data to a [Zed lake](../commands/zed.md) in a continuous manner. +to push log data to a [SuperDB data lake](../commands/super-db.md) in a continuous manner. This allows for querying near-"live" event data to enable use cases such as dashboarding and alerting in addition to creating a long-running historical record for archiving and analytics. 
@@ -61,7 +61,7 @@ After making these changes, Zeek was started by running A binary [release package](https://github.com/brimdata/super/releases) of Zed executables compatible with our instance was downloaded and unpacked to a -directory in our `$PATH`, then the [lake service](https://zed.brimdata.io/docs/commands/zed#serve) +directory in our `$PATH`, then the [lake service](../commands/super-db.md#serve) was started with a specified storage path. ``` @@ -79,7 +79,7 @@ zed create zeek ``` The default settings when running `zed create` set the -[pool key](../commands/zed.md#pool-key) to the `ts` +[pool key](../commands/super-db.md#pool-key) to the `ts` field and sort the stored data in descending order by that key. This configuration is ideal for Zeek log data. @@ -88,7 +88,7 @@ The [Zui](https://zui.brimdata.io/) desktop application automatically starts a Zed lake service when it launches. Therefore if you are using Zui you can skip the first set of commands shown above. The pool can be created from Zui by clicking **+**, selecting **New Pool**, then entering `ts` for the -[pool key](../commands/zed.md#pool-key). +[pool key](../commands/super-db.md#pool-key). ::: ### Fluentd @@ -107,7 +107,7 @@ sudo gem install fluentd --no-doc The following simple `fluentd.conf` was used to watch the streamed Zeek logs for newly added lines and load each set of them to the pool in the Zed lake as -a separate [commit](../commands/zed.md#commit-objects). +a separate [commit](../commands/super-db.md#commit-objects). ``` @@ -345,7 +345,7 @@ which in our test environment produced ## Zed Lake Maintenance -The Zed lake stores the data for each [`load`](../commands/zed.md#load) +The lake stores the data for each [`load`](../commands/super-db.md#load) operation in a separate commit. 
If you observe the output of `zed log -use zeek-shaped` after several minutes, you will see many such commits have accumulated, which is a reflection of Fluentd frequently @@ -361,10 +361,10 @@ in storing the pool data across a smaller number of larger as data volumes increase. By default, even after compaction is performed, the granular commit history is -still maintained to allow for [time travel](../commands/zed.md#time-travel) +still maintained to allow for [time travel](../commands/super-db.md#time-travel) use cases. However, if time travel is not functionality you're likely to leverage, you can reduce the lake's storage footprint by periodically running -[`zed vacuum`](../commands/zed.md#vacuum). This will delete files from lake +[`zed vacuum`](../commands/super-db.md#vacuum). This will delete files from lake storage that contain the granular commits that have already been rolled into larger objects by compaction. diff --git a/docs/integrations/grafana.md b/docs/integrations/grafana.md index 351b541de1..230bebdae4 100644 --- a/docs/integrations/grafana.md +++ b/docs/integrations/grafana.md @@ -7,6 +7,6 @@ sidebar_label: Grafana A [data source plugin](https://grafana.com/grafana/plugins/?type=datasource) for [Grafana](https://grafana.com/) is available that enables plotting of -time-series data that's stored in [Zed lakes](../commands/zed.md). See the +time-series data that's stored in [SuperDB data lakes](../commands/super-db.md). See the README in the [grafana-zed-datasource repository](https://github.com/brimdata/grafana-zed-datasource) for details. 
diff --git a/docs/integrations/zed-lake-auth.md b/docs/integrations/zed-lake-auth.md index 3dca92f902..e1c432e499 100644 --- a/docs/integrations/zed-lake-auth.md +++ b/docs/integrations/zed-lake-auth.md @@ -5,12 +5,12 @@ sidebar_label: Authentication Configuration # Configuring Authentication for a Zed Lake Service -A [Zed lake service](../commands/zed.md#serve) may be configured to require +A [SuperDB data lake service](../commands/super-db.md#serve) may be configured to require user authentication to be accessed from clients such as the [Zui](https://zui.brimdata.io/) application, the -[`zed`](../commands/zed.md) CLI tools, or the -[Zed Python client](../libraries/python.md). This document describes a simple -[Auth0](https://auth0.com) configuration with accompanying `zed serve` flags +[`super db`](../commands/super.md) CLI commands, or the +[SuperDB Python client](../libraries/python.md). This document describes a simple +[Auth0](https://auth0.com) configuration with accompanying `super db serve` flags that can be used as a starting point for creating similar configurations in your own environment. diff --git a/docs/lake/api.md b/docs/lake/api.md index c60dd0b5af..cfefb8164b 100644 --- a/docs/lake/api.md +++ b/docs/lake/api.md @@ -263,10 +263,10 @@ On success, HTTP 204 is returned with no response payload. Create a commit that reflects the deletion of some data in the branch. The data to delete can be specified via a list of object IDs or -as a filter expression (see [limitations](../commands/zed.md#delete)). +as a filter expression (see [limitations](../commands/super-db.md#delete)). This simply removes the data from the branch without actually removing the -underlying data objects thereby allowing [time travel](../commands/zed.md#time-travel) to work in the face +underlying data objects thereby allowing [time travel](../commands/super-db.md#time-travel) to work in the face of deletes. 
Permanent removal of underlying data objects is handled by a separate [vacuum](#vacuum-pool) operation. @@ -281,7 +281,7 @@ POST /pool/{pool}/branch/{branch}/delete | pool | string | path | **Required.** ID of the pool. | | branch | string | path | **Required.** Name of branch. | | object_ids | [string] | body | Object IDs to be deleted. | -| where | string | body | Filter expression (see [limitations](../commands/zed.md#delete)). | +| where | string | body | Filter expression (see [limitations](../commands/super-db.md#delete)). | | Content-Type | string | header | [MIME type](#mime-types) of the request payload. | | Accept | string | header | Preferred [MIME type](#mime-types) of the response. | @@ -535,7 +535,7 @@ To receive successful (2xx) responses in a preferred format, include the MIME type of the format in the request's Accept HTTP header. If the Accept header is not specified, the service will return ZSON as the default response format. A different default response format can be specified by invoking the -`-defaultfmt` option when running [`zed serve`](../commands/zed.md#serve). +`-defaultfmt` option when running [`super db serve`](../commands/super-db.md#serve). For non-2xx responses, the content type of the response will be `application/json` or `text/plain`. diff --git a/docs/lake/format.md b/docs/lake/format.md index faee43f4b6..dcc7a93921 100644 --- a/docs/lake/format.md +++ b/docs/lake/format.md @@ -13,8 +13,8 @@ as we add new capabilities to the system. ## Introduction -To support the client-facing [Zed lake semantics](../commands/zed.md#the-lake-model) -implemented by the [`zed` command](../commands/zed.md), we are developing +To support the client-facing [SuperDB data lake semantics](../commands/super-db.md#the-lake-model) +implemented by the [`super db` command](../commands/super-db.md), we are developing an open specification for the Zed lake storage format described in this document. 
As we make progress on the Zed lake model, we will update this document as we go. diff --git a/docs/language/operators/file.md b/docs/language/operators/file.md index dc524093c8..b3fff30404 100644 --- a/docs/language/operators/file.md +++ b/docs/language/operators/file.md @@ -8,5 +8,5 @@ :::tip Note The `file` shorthand is exclusively for working with inputs to -[`super`](../../commands/super.md) and is not available for use with [Zed lakes](../../commands/zed.md). +[`super`](../../commands/super.md) and is not available for use with [SuperDB data lakes](../../commands/super-db.md). ::: diff --git a/docs/language/operators/from.md b/docs/language/operators/from.md index c139dd6f83..2b4a808771 100644 --- a/docs/language/operators/from.md +++ b/docs/language/operators/from.md @@ -22,7 +22,7 @@ from ( The `from` operator identifies one or more data sources and transmits their data to its output. A data source can be -* the name of a data pool in a SuperDB lake, with optional [commitish](../../commands/zed.md#commitish); +* the name of a data pool in a SuperDB lake, with optional [commitish](../../commands/super-db.md#commitish); * the names of multiple data pools, expressed as a [regular expression](../search-expressions.md#regular-expressions) or [glob](../search-expressions.md#globs) pattern; * a path to a file; * an HTTP, HTTPS, or S3 URI; or @@ -33,7 +33,7 @@ File paths and URIs may be followed by an optional [format](../../commands/super ::: Sourcing data from pools is only possible when querying a lake, such as -via the [`super db` command](../../commands/zed.md) or +via the [`super db` command](../../commands/super-db.md) or [SuperDB lake API](../../lake/api.md). Sourcing data from files is only possible with the [`super` command](../../commands/super.md). 
diff --git a/docs/language/operators/load.md b/docs/language/operators/load.md index bfd7b3a576..1f223edd53 100644 --- a/docs/language/operators/load.md +++ b/docs/language/operators/load.md @@ -10,18 +10,18 @@ load [@] [author ] [message ] [meta ] :::tip Note The `load` operator is exclusively for working with pools in a -[SuperDB data lake](../../commands/zed.md) and is not available for use in +[SuperDB data lake](../../commands/super-db.md) and is not available for use in [`super`](../../commands/super.md). ::: ### Description The `load` operator populates the specified `` with the values it -receives as input. Much like how [`super db load`](../../commands/zed.md#load) +receives as input. Much like how [`super db load`](../../commands/super-db.md#load) is used at the command line to populate a pool with data from files, streams, and URIs, the `load` operator is used to save query results from your SuperPipe query to a pool in the same SuperDB data lake. `` is a string indicating the -[name or ID](../../commands/zed.md#data-pools) of the destination pool. +[name or ID](../../commands/super-db.md#data-pools) of the destination pool. If the optional `@` string is included then the data will be committed to an existing branch of that name, otherwise the `main` branch is assumed. The `author`, `message`, and `meta` strings may also be provided to further diff --git a/docs/language/pipeline-model.md b/docs/language/pipeline-model.md index 533b11d17f..4108f2a063 100644 --- a/docs/language/pipeline-model.md +++ b/docs/language/pipeline-model.md @@ -16,7 +16,7 @@ In addition to the data sources specified as files on the `zq` command line, a source may also be specified with the [`from` operator](operators/from.md). When running on the command-line, `from` may refer to a file, an HTTP -endpoint, or an [S3](../integrations/amazon-s3.md) URI. 
When running in a [SuperDB data lake](../commands/zed.md), `from` typically +endpoint, or an [S3](../integrations/amazon-s3.md) URI. When running in a [SuperDB data lake](../commands/super-db.md), `from` typically refers to a collection of data called a "data pool" and is referenced using the pool's name much as SQL references database tables by their name. diff --git a/docs/libraries/python.md b/docs/libraries/python.md index 386fd8c9b1..9ffa763d8d 100644 --- a/docs/libraries/python.md +++ b/docs/libraries/python.md @@ -10,7 +10,7 @@ with a Zed lake. The Zed Python package supports loading data into a Zed lake as well as querying and retrieving results in the [ZJSON format](../formats/zjson.md). The Python client interacts with the Zed lake via the REST API served by -[`zed serve`](../commands/zed.md#serve). +[`super db serve`](../commands/super-db.md#serve). This approach works adequately when high data throughput is not required. We plan to introduce native [Super Binary](../formats/bsup.md) support for diff --git a/docs/tutorials/zed.md b/docs/tutorials/zed.md index 5c1fc3857c..3c21db432a 100644 --- a/docs/tutorials/zed.md +++ b/docs/tutorials/zed.md @@ -10,7 +10,7 @@ analytics? This is where the `zed` command comes in. `zed` builds on the type system and language found in `zq` and adds a high performance data lake on top. > Note: `zed` is currently in alpha form. Check out its current status in the -> [`zed` README](../commands/zed.md#status). +> [`super db` command](../commands/super-db.md#status) documentation. ## Creating a Lake @@ -317,7 +317,7 @@ $ zed query -Z 'min(created_at), max(created_at)' Obviously this is only the tip of the iceberg in terms of things that can be done with the `zed` command. Some suggested next steps: -1. Dig deeper into SuperDB data lakes by having a look at the [`super db` command](../commands/super-db.md) documentation. 2.
Get a better idea of ways you can query your data by looking at the [Zed language documentation](../language/README.md). diff --git a/docs/tutorials/zq.md b/docs/tutorials/zq.md index e8218d2971..cf9d208073 100644 --- a/docs/tutorials/zq.md +++ b/docs/tutorials/zq.md @@ -1255,5 +1255,5 @@ clean data for analysis by `zq` or even export into other systems or for testing If you'd like to learn more, feel free to read through the [language docs](../language/README.md) in depth -or see how you can organize [Zed data into a lake](../commands/zed.md) +or see how you can organize [data into a lake](../commands/super-db.md) using a git-like commit model.