From affe7823bb58fbc062fd650949dcda361399e060 Mon Sep 17 00:00:00 2001
From: Phil Rzewski
Date: Fri, 13 Dec 2024 12:58:40 -0800
Subject: [PATCH] Make links work again inside tips

---
 docs/commands/super-db.md                    | 42 ++++++++++-----
 docs/commands/super.md                       |  6 ++-
 docs/formats/bsup.md                         | 12 +++--
 docs/formats/csup.md                         | 54 ++++++++++++-------
 docs/install.md                              |  6 ++-
 docs/integrations/amazon-s3.md               |  6 ++-
 docs/integrations/fluentd.md                 | 12 +++--
 docs/integrations/zed-lake-auth/index.md     | 12 +++--
 .../zeek/data-type-compatibility.md          |  6 ++-
 docs/integrations/zeek/shaping-zeek-json.md  |  6 ++-
 docs/language/functions/cast.md              |  6 ++-
 docs/language/functions/fill.md              |  6 ++-
 docs/language/functions/grok.md              | 12 +++--
 docs/language/functions/order.md             | 12 +++--
 docs/language/lateral-subqueries.md          | 12 +++--
 docs/language/operators/cut.md               |  6 ++-
 docs/language/operators/file.md              |  6 ++-
 docs/language/operators/from.md              |  6 ++-
 docs/language/operators/join.md              |  6 ++-
 docs/language/operators/load.md              |  6 ++-
 docs/language/pipe-ambiguity.md              |  6 ++-
 docs/language/search-expressions.md          | 18 ++++---
 docs/tutorials/join.md                       | 18 ++++---
 docs/tutorials/zq.md                         |  6 ++-
 24 files changed, 192 insertions(+), 96 deletions(-)

diff --git a/docs/commands/super-db.md b/docs/commands/super-db.md
index 1204522aa3..a338a50389 100644
--- a/docs/commands/super-db.md
+++ b/docs/commands/super-db.md
@@ -15,7 +15,8 @@ title: super db

-{{< tip "Status" >}} +{{% tip "Status" %}} + While [`super`](super.md) and its accompanying [formats](../formats/_index.md) are production quality, the SuperDB data lake is still fairly early in development and alpha quality. @@ -25,7 +26,8 @@ is deployed to manage the lake's data layout via the [lake API](../lake/api.md). Enhanced scalability with self-tuning configuration is under development. -{{< /tip >}} + +{{% /tip %}} ## The Lake Model @@ -153,7 +155,8 @@ running any `super db` lake command all pointing at the same storage endpoint and the lake's data footprint will always remain consistent as the endpoints all adhere to the consistency semantics of the lake. -{{< tip "Caveat" >}} +{{% tip "Caveat" %}} + Data consistency is not fully implemented yet for the S3 endpoint so only single-node access to S3 is available right now, though support for multi-node access is forthcoming. @@ -164,7 +167,8 @@ access to a local file system has been thoroughly tested and should be deemed reliable, i.e., you can run a direct-access instance of `super db` alongside a server instance of `super db` on the same file system and data consistency will be maintained. -{{< /tip >}} + +{{% /tip %}} ### Locating the Lake @@ -206,11 +210,13 @@ Each commit object is assigned a global ID. Similar to Git, commit objects are arranged into a tree and represent the entire commit history of the lake. -{{< tip "Note" >}} +{{% tip "Note" %}} + Technically speaking, Git can merge from multiple parents and thus Git commits form a directed acyclic graph instead of a tree; SuperDB does not currently support multiple parents in the commit object history. -{{< /tip >}} + +{{% /tip %}} A branch is simply a named pointer to a commit object in the lake and like a pool, a branch name can be any valid UTF-8 string. @@ -272,10 +278,12 @@ key. For example, on a pool with pool key `ts`, the query `ts == 100` will be optimized to scan only the data objects where the value `100` could be present. -{{< tip "Note" >}} +{{% tip "Note" %}} + The pool key will also serve as the primary key for the forthcoming CRUD semantics. -{{< /tip >}} + +{{% /tip %}} A pool also has a configured sort order, either ascending or descending and data is organized in the pool in accordance with this order. @@ -325,9 +333,11 @@ using that pool's "branches log" in a similar fashion, then its corresponding commit object can be used to construct the data of that branch at that past point in time. -{{< tip "Note" >}} +{{% tip "Note" %}} + Time travel using timestamps is a forthcoming feature. -{{< /tip >}} + +{{% /tip %}} ## `super db` Commands @@ -407,11 +417,13 @@ the [special value `this`](../language/pipeline-model.md#the-special-value-this) A newly created pool is initialized with a branch called `main`. -{{< tip "Note" >}} +{{% tip "Note" %}} + Lakes can be used without thinking about branches. When referencing a pool without a branch, the tooling presumes the "main" branch as the default, and everything can be done on main without having to think about branching. -{{< /tip >}} + +{{% /tip %}} ### Delete ``` @@ -582,9 +594,11 @@ that is stored in the commit journal for reference. These values may be specified as options to the [`load`](#load) command, and are also available in the [lake API](../lake/api.md) for automation. -{{< tip "Note" >}} +{{% tip "Note" %}} + The branchlog meta-query source is not yet implemented. 
-{{< /tip >}}
+
+{{% /tip %}}

 ### Ls
 ```
diff --git a/docs/commands/super.md b/docs/commands/super.md
index fadcd25e95..0c828bf15d 100644
--- a/docs/commands/super.md
+++ b/docs/commands/super.md
@@ -187,13 +187,15 @@ not desirable because (1) the Super JSON parser is not particularly performant a
 (2) all JSON numbers are floating point but the Super JSON parser will parse
 as JSON any number that appears without a decimal point as an integer type.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 The reason `super` is not particularly performant for Super JSON is that the
 [Super Binary](../formats/bsup.md) or [Super Columnar](../formats/csup.md)
 formats are semantically equivalent to Super JSON but much more efficient and
 the design intent is that these efficient binary formats should be used in
 use cases where performance matters.  Super JSON is typically used only when
 data needs to be human-readable in interactive settings or in automated tests.
-{{< /tip >}}
+
+{{% /tip %}}

 To this end, `super` uses a heuristic to select between Super JSON and plain JSON
 when the `-i` option is not specified.  Specifically, plain JSON is selected
 when the first values
diff --git a/docs/formats/bsup.md b/docs/formats/bsup.md
index 1bfacd5914..66dec48288 100644
--- a/docs/formats/bsup.md
+++ b/docs/formats/bsup.md
@@ -130,7 +130,8 @@ size decompression buffers in advance of decoding.
 Values for the `format` byte are defined in the
 [Super Binary compression format specification](./compression.md).

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 This arrangement of frames separating types and values allows
 for efficient scanning and parallelization.  In general, values depend
 on type definitions but as long as all of the types are known when
@@ -143,7 +144,8 @@ heuristics, e.g., knowing a filtering predicate can't be true based on a quick
 scan of the data perhaps using the Boyer-Moore algorithm to determine
 that a comparison with a string constant would not work for any
 value in the buffer.
-{{< /tip >}}
+
+{{% /tip %}}

 Whether the payload was originally uncompressed or was decompressed, it is
 then interpreted according to the `T` bits of the frame code as a
@@ -211,12 +213,14 @@ is further encoded as a "counted string", which is the `uvarint` encoding
 of the length of the string followed by that many bytes of UTF-8 encoded
 string data.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 As defined by [Super JSON](jsup.md), a field name can be any valid UTF-8 string
 much like JSON objects can be indexed with arbitrary string keys (via index operator)
 even if the field names available to the dot operator are restricted by
 language syntax for identifiers.
-{{< /tip >}}
+
+{{% /tip %}}

 The type ID follows the field name and is encoded as a `uvarint`.
diff --git a/docs/formats/csup.md b/docs/formats/csup.md
index de60c76920..a64b394511 100644
--- a/docs/formats/csup.md
+++ b/docs/formats/csup.md
@@ -64,12 +64,14 @@ then write the metadata into the reassembly section along with the trailer
 at the end.  This allows a stream to be converted to a Super Columnar file
 in a single pass.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 That said, the layout is flexible enough that an implementation may optimize
 the data layout with additional passes or by writing the output to multiple
 files then merging them together (or even leaving the Super Columnar entity
 as separate files).
-{{< /tip >}}
+
+{{% /tip %}}

 ### The Data Section

@@ -85,7 +87,8 @@ There is no information in the data section for how segments relate
 to one another or how they are reconstructed into columns.  They are just
 blobs of Super Binary data.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Unlike Parquet, there is no explicit arrangement of the column chunks into
 row groups but rather they are allowed to grow at different rates so a
 high-volume column might be comprised of many segments while a low-volume
@@ -93,9 +96,11 @@ column must just be one or several.  This allows scans of low-volume record type
 (the "mice") to perform well amongst high-volume record types (the "elephants"),
 i.e., there are not a bunch of seeks with tiny reads of mice data interspersed
 throughout the elephants.
-{{< /tip >}}
-{{< tip "TBD" >}}
+
+{{% /tip %}}
+
+{{% tip "TBD" %}}
+
 The mice/elephants model creates an interesting and challenging layout problem.
 If you let the row indexes get too far apart (call this "skew"), then you have
 to buffer very large amounts of data to keep the column data aligned.
@@ -109,7 +114,8 @@ if you use lots of buffering on ingest, you can write the mice in front of
 the elephants so the read path requires less buffering to align columns.
 Or you can do two passes where you store segments in separate files then
 merge them at close according to an optimization plan.
-{{< /tip >}}
+
+{{% /tip %}}

 ### The Reassembly Section

 The reassembly section provides the information needed to reconstruct
@@ -117,7 +123,8 @@ column streams from segments, and in turn, to reconstruct the original
 values from column streams, i.e., to map columns back to composite values.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Of course, the reassembly section also provides the ability to extract just
 subsets of columns to be read and searched efficiently without ever needing
 to reconstruct the original rows.  How well this performs is up to any particular
@@ -127,7 +134,8 @@ Also, the reassembly section is in general vastly smaller than the data section
 so the goal here isn't to express information in cute and obscure compact forms
 but rather to represent data in an easy-to-digest, programmer-friendly form that
 leverages Super Binary.
-{{< /tip >}}
+
+{{% /tip %}}

 The reassembly section is a Super Binary stream.  Unlike Parquet,
 which uses an externally described schema
@@ -147,9 +155,11 @@ A super type's integer position in this sequence defines its identifier
 encoded in the [super column](#the-super-column).  This identifier is called
 the super ID.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Change the first N values to type values instead of nulls?
-{{< /tip >}}
+
+{{% /tip %}}

 The next N+1 records contain reassembly information for each of the N super types
 where each record defines the column streams needed to reconstruct the original
@@ -171,11 +181,13 @@ type signature:
 In the rest of this document, we will refer to this type as `<segmap>` for
 shorthand and refer to the concept as a "segmap".

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 We use the type name "segmap" to emphasize that this information represents
 a set of byte ranges where data is stored and must be read from *rather than*
 the data itself.
-{{< /tip >}}
+
+{{% /tip %}}

 #### The Super Column

@@ -216,11 +228,13 @@ This simple top-down arrangement, along with the definition of the other
 column structures below, is all that is needed to reconstruct all of
 the original data.
-{{< tip "Note" >}} +{{% tip "Note" %}} + Each row reassembly record has its own layout of columnar values and there is no attempt made to store like-typed columns from different schemas in the same physical column. -{{< /tip >}} + +{{% /tip %}} The notation `` refers to any instance of the five column types: * [``](#record-column), @@ -296,9 +310,11 @@ in the same column order implied by the union type, and * `tags` is a column of `int32` values where each subsequent value encodes the tag of the union type indicating which column the value falls within. -{{< tip "TBD" >}} +{{% tip "TBD" %}} + Change code to conform to columns array instead of record{c0,c1,...} -{{< /tip >}} + +{{% /tip %}} The number of times each value of `tags` appears must equal the number of values in each respective column. @@ -350,14 +366,16 @@ data in the file, it will typically fit comfortably in memory and it can be very fast to scan the entire reassembly structure for any purpose. -{{< tip "Example" >}} +{{% tip "Example" %}} + For a given query, a "scan planner" could traverse all the reassembly records to figure out which segments will be needed, then construct an intelligent plan for reading the needed segments and attempt to read them in mostly sequential order, which could serve as an optimizing intermediary between any underlying storage API and the Super Columnar decoding logic. -{{< /tip >}} + +{{% /tip %}} To decode the "next" row, its schema index is read from the root reassembly column stream. diff --git a/docs/install.md b/docs/install.md index 075feadf89..0130874a6d 100644 --- a/docs/install.md +++ b/docs/install.md @@ -40,11 +40,13 @@ This installs the `super` binary in your `$GOPATH/bin`. Once installed, run a [quick test](#quick-tests). -{{< tip "Note" >}} +{{% tip "Note" %}} + If you don't have Go installed, download and install it from the [Go install page](https://golang.org/doc/install). Go 1.23 or later is required. -{{< /tip >}} + +{{% /tip %}} ## Quick Tests diff --git a/docs/integrations/amazon-s3.md b/docs/integrations/amazon-s3.md index a13caa192c..671825a042 100644 --- a/docs/integrations/amazon-s3.md +++ b/docs/integrations/amazon-s3.md @@ -16,11 +16,13 @@ You must specify an AWS region via one of the following: You can create `~/.aws/config` by installing the [AWS CLI](https://aws.amazon.com/cli/) and running `aws configure`. -{{< tip "Note" >}} +{{% tip "Note" %}} + If using S3-compatible storage that does not recognize the concept of regions, a region must still be specified, e.g., by providing a dummy value for `AWS_REGION`. -{{< /tip >}} + +{{% /tip %}} ## Credentials diff --git a/docs/integrations/fluentd.md b/docs/integrations/fluentd.md index f8700871d2..cfe29c4e76 100644 --- a/docs/integrations/fluentd.md +++ b/docs/integrations/fluentd.md @@ -81,13 +81,15 @@ The default settings when running `zed create` set the field and sort the stored data in descending order by that key. This configuration is ideal for Zeek log data. -{{< tip "Note" >}} +{{% tip "Note" %}} + The [Zui](https://zui.brimdata.io/) desktop application automatically starts a Zed lake service when it launches. Therefore if you are using Zui you can skip the first set of commands shown above. The pool can be created from Zui by clicking **+**, selecting **New Pool**, then entering `ts` for the [pool key](../commands/super-db.md#pool-key). 
-{{< /tip >}}
+
+{{% /tip %}}

 ### Fluentd

@@ -366,7 +368,8 @@ leverage, you can reduce the lake's storage footprint by periodically running
 storage that contain the granular commits that have already been rolled into
 larger objects by compaction.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 As described in issue [super/4934](https://github.com/brimdata/super/issues/4934),
 even after running `zed vacuum`, some files related to commit history are
 currently still left behind below the lake storage path.  The issue describes
 manual steps that can be taken to remove these files safely, if desired.
 However, if you find yourself needing to take these steps in your environment,
 please [contact us](#contact-us) as it will allow us to boost the priority of
 addressing the issue.
-{{< /tip >}}
+
+{{% /tip %}}

 ## Ideas For Enhancement
diff --git a/docs/integrations/zed-lake-auth/index.md b/docs/integrations/zed-lake-auth/index.md
index 0a6a4c2259..6ddd770d71 100644
--- a/docs/integrations/zed-lake-auth/index.md
+++ b/docs/integrations/zed-lake-auth/index.md
@@ -30,10 +30,12 @@ and then clicking the **Create API** button.

 2. Enter any **Name** and URL **Identifier** for the API, then click the
 **Create** button.

-{{< tip "Tip" >}}
+{{% tip "Tip" %}}
+
 Note the value you enter for the **Identifier** as you'll
 need it later for the Zed lake service configuration.
-{{< /tip >}}
+
+{{% /tip %}}

 ![api-name-identifier](api-name-identifier.png)

@@ -50,11 +52,13 @@ need it later for the Zed lake service configuration.
 1. Begin creating a new application by clicking **Applications** in the left
 navigation menu and then clicking the **Create Application** button.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Neither the "Zed lake (Test Application)" that was created for us automatically
 when we created our API nor the Default App that came with the trial are used
 in this configuration.
-{{< /tip >}}
+
+{{% /tip %}}

 ![create-application](create-application.png)
diff --git a/docs/integrations/zeek/data-type-compatibility.md b/docs/integrations/zeek/data-type-compatibility.md
index b1fbb3379b..3f8f9a9619 100644
--- a/docs/integrations/zeek/data-type-compatibility.md
+++ b/docs/integrations/zeek/data-type-compatibility.md
@@ -49,7 +49,8 @@ applicable to handling certain types.
 | [`vector`](https://docs.zeek.org/en/current/script-reference/types.html#type-vector) | [`array`](../../formats/zed.md#22-array) | |
 | [`record`](https://docs.zeek.org/en/current/script-reference/types.html#type-record) | [`record`](../../formats/zed.md#21-record) | See [`record` details](#record) |

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 The [Zeek data types](https://docs.zeek.org/en/current/script-reference/types.html)
 page describes the types in the context of the
 [Zeek scripting language](https://docs.zeek.org/en/master/scripting/index.html).
 The Zeek types available in scripting are a superset of the data types that
 may appear in Zeek log files.  The encodings of the types also differ in
 some ways between the two contexts.  However, we link to this reference
 because there is no authoritative specification of the Zeek TSV log format.
-{{< /tip >}}
+
+{{% /tip %}}

 ## Example
diff --git a/docs/integrations/zeek/shaping-zeek-json.md b/docs/integrations/zeek/shaping-zeek-json.md
index 73377ab5b5..6ae365d0ee 100644
--- a/docs/integrations/zeek/shaping-zeek-json.md
+++ b/docs/integrations/zeek/shaping-zeek-json.md
@@ -193,11 +193,13 @@ specification.
 ...
 ```

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 See [the role of `_path`](reading-zeek-log-formats.md#the-role-of-_path)
 for important details if you're using Zeek's built-in
 [ASCII logger](https://docs.zeek.org/en/current/scripts/base/frameworks/logging/writers/ascii.zeek.html)
 rather than the [JSON Streaming Logs](https://github.com/corelight/json-streaming-logs) package.
-{{< /tip >}}
+
+{{% /tip %}}

 ### Zed Pipeline
diff --git a/docs/language/functions/cast.md b/docs/language/functions/cast.md
index 3f6a4342b5..632a978b7b 100644
--- a/docs/language/functions/cast.md
+++ b/docs/language/functions/cast.md
@@ -41,11 +41,13 @@ to match the output type's order but rather just modifies the leaf values.

 If a cast fails, an error is returned when casting to primitive types
 and the input value is returned when casting to complex types.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 Many users seeking to `cast` record values prefer to use the
 [`shape` function](./shape.md) which applies the `cast`, [`fill`](./fill.md),
 and [`order`](./order.md) functions simultaneously.
-{{< /tip >}}
+
+{{% /tip %}}

 ### Examples
diff --git a/docs/language/functions/fill.md b/docs/language/functions/fill.md
index 898c9b5d42..3d125931bc 100644
--- a/docs/language/functions/fill.md
+++ b/docs/language/functions/fill.md
@@ -20,11 +20,13 @@ you want to be sure that all fields in a schema are present in a record.

 If `val` is not a record, it is returned unmodified.

-{{< tip "Tip" >}}
+{{% tip "Tip" %}}
+
 Many users seeking the functionality of `fill` prefer to use the
 [`shape` function](./shape.md) which applies the `fill`, [`cast`](./cast.md),
 and [`order`](./order.md) functions simultaneously on a record.
-{{< /tip >}}
+
+{{% /tip %}}

 ### Examples
diff --git a/docs/language/functions/grok.md b/docs/language/functions/grok.md
index c757014204..bd395832c0 100644
--- a/docs/language/functions/grok.md
+++ b/docs/language/functions/grok.md
@@ -42,12 +42,14 @@ been published by Elastic and others that provide helpful guidance on becoming
 proficient in Grok.  To help you adapt what you learn from these resources to
 the use of the `grok` function, review the tips below.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 As these represent areas of possible future SuperPipe enhancement, links to
 open issues are provided.  If you find a functional gap significantly impacts
 your ability to use the `grok` function, please add a comment to the relevant
 issue describing your use case.
-{{< /tip >}}
+
+{{% /tip %}}

 1. Logstash's Grok offers an optional data type conversion syntax, e.g.,
@@ -111,7 +113,8 @@ issue describing your use case.
    avoid compatibility issues, we recommend building configurations starting
    from the RE2-based [included patterns](#included-patterns).

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 If you absolutely require features of Logstash's Grok that are not currently
 present in SuperPipe, you can create a Logstash-based preprocessing pipeline
 that uses its
 [Grok filter plugin](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html)
 and send its output as JSON to SuperPipe.  Issue
 getting started.  If you pursue this approach, please add a comment to the
 issue describing your use case or come talk to us on
 [community Slack](https://www.brimdata.io/join-slack/).
-{{< /tip >}}
+
+{{% /tip %}}

 ### Debugging
diff --git a/docs/language/functions/order.md b/docs/language/functions/order.md
index 94f0d7bac9..0dbf1de6dd 100644
--- a/docs/language/functions/order.md
+++ b/docs/language/functions/order.md
@@ -26,16 +26,20 @@ the empty record type, i.e.,
 ```
 order(val, <{}>)
 ```
-{{< tip "Tip" >}}
+{{% tip "Tip" %}}
+
 Many users seeking the functionality of `order` prefer to use the
 [`shape` function](./shape.md) which applies the `order`, [`cast`](./cast.md),
 and [`fill`](./fill.md) functions simultaneously on a record.
-{{< /tip >}}
-{{< tip "Note" >}}
+
+{{% /tip %}}
+
+{{% tip "Note" %}}
+
 [Record expressions](../expressions.md#record-expressions) can also be used to
 reorder fields without specifying types ([example](../shaping.md#order)).
-{{< /tip >}}
+
+{{% /tip %}}

 ### Examples
diff --git a/docs/language/lateral-subqueries.md b/docs/language/lateral-subqueries.md
index 2510abd84a..1b51b50dd9 100644
--- a/docs/language/lateral-subqueries.md
+++ b/docs/language/lateral-subqueries.md
@@ -9,10 +9,12 @@ The inner query may be _any_ pipeline operator sequence (excluding
 [`from` operators](operators/from.md)) and may refer to values from
 the outer sequence.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 This pattern rhymes with the SQL pattern of a "lateral join",
 which runs a subquery for each row of the outer query's results.
-{{< /tip >}}
+
+{{% /tip %}}

 Lateral subqueries are created using the scoped form of the
 [`over` operator](operators/over.md).  They may be nested to arbitrary depth
@@ -125,9 +127,11 @@ parenthesized form:
 ```
 ( over <expr> [, <expr>...] [with <var>=<expr> [, ... <var>[=<expr>]] |> <lateral> )
 ```
-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 The parentheses disambiguate a lateral expression from a
 [lateral pipeline operator](operators/over.md).
-{{< /tip >}}
+
+{{% /tip %}}

 This form must always include a [lateral scope](#lateral-scope) as indicated by `<lateral>`.
diff --git a/docs/language/operators/cut.md b/docs/language/operators/cut.md
index df28728cd7..1fd2c5a8c9 100644
--- a/docs/language/operators/cut.md
+++ b/docs/language/operators/cut.md
@@ -76,11 +76,13 @@ echo '1 {a:1,b:2,c:3}' | super -z -c 'cut a,b' -
 ```

 _Invoke a function while cutting to set a default value for a field_

-{{< tip "Tip" >}}
+{{% tip "Tip" %}}
+
 This can be helpful to transform data into a uniform record type, such as if
 the output will be exported in formats such as `csv` or `parquet` (see also:
 [`fuse`](fuse.md)).
-{{< /tip >}}
+
+{{% /tip %}}

 ```mdtest-command
 echo '{a:1,b:null}{a:1,b:2}' | super -z -c 'cut a,b:=coalesce(b, 0)' -
diff --git a/docs/language/operators/file.md b/docs/language/operators/file.md
index 985c5721f5..3e86ae1b0f 100644
--- a/docs/language/operators/file.md
+++ b/docs/language/operators/file.md
@@ -6,7 +6,9 @@
 `file` is a shorthand notation for `from`.  See the
 [from operator](from.md) documentation for details.

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 The `file` shorthand is exclusively for working with inputs to
 [`super`](../../commands/super.md) and is not available for use with
 [SuperDB data lakes](../../commands/super-db.md).
-{{< /tip >}}
+
+{{% /tip %}}
diff --git a/docs/language/operators/from.md b/docs/language/operators/from.md
index ae779c48c7..f6b64719d1 100644
--- a/docs/language/operators/from.md
+++ b/docs/language/operators/from.md
@@ -28,9 +28,11 @@ their data to its output.  A data source can be
 * an HTTP, HTTPS, or S3 URI; or
 * the [`pass` operator](pass.md), to treat the upstream pipeline branch as a source.

-{{< tip "Note" >}} +{{% tip "Note" %}} + File paths and URIs may be followed by an optional [format](../../commands/super.md#input-formats) specifier. -{{< /tip >}} + +{{% /tip %}} Sourcing data from pools is only possible when querying a lake, such as via the [`super db` command](../../commands/super-db.md) or diff --git a/docs/language/operators/join.md b/docs/language/operators/join.md index 6b893919e1..e92e7ccc00 100644 --- a/docs/language/operators/join.md +++ b/docs/language/operators/join.md @@ -14,13 +14,15 @@ | [anti|inner|left|right] join on = [[:=], ...] ``` -{{< tip "Note" >}} +{{% tip "Note" %}} + The first `join` syntax shown above was more recently introduced and is in some ways similar to other languages such as SQL. The second was the original `join` syntax in SuperPipe. Most joins can be expressed using either syntax. See the [join tutorial](../../tutorials/join.md) for details. -{{< /tip >}} + +{{% /tip %}} ### Description diff --git a/docs/language/operators/load.md b/docs/language/operators/load.md index a78a84093a..e726962c39 100644 --- a/docs/language/operators/load.md +++ b/docs/language/operators/load.md @@ -8,11 +8,13 @@ load [@] [author ] [message ] [meta ] ``` -{{< tip "Note" >}} +{{% tip "Note" %}} + The `load` operator is exclusively for working with pools in a [SuperDB data lake](../../commands/super-db.md) and is not available for use in [`super`](../../commands/super.md). -{{< /tip >}} + +{{% /tip %}} ### Description diff --git a/docs/language/pipe-ambiguity.md b/docs/language/pipe-ambiguity.md index eeeb0da444..218ec117fa 100644 --- a/docs/language/pipe-ambiguity.md +++ b/docs/language/pipe-ambiguity.md @@ -85,7 +85,9 @@ So when you want to run SuperSQL on old SQL queries that use top-level bitwise-OR expressions --- arguably a pretty obscure corner case --- just disable SuperSQL shortcuts and everything will work. -{{< tip "Note" >}} +{{% tip "Note" %}} + Note that a config option to disable shortcuts is not yet implemented, but will be available in the future. -{{< /tip >}} + +{{% /tip %}} diff --git a/docs/language/search-expressions.md b/docs/language/search-expressions.md index 6e964e2371..399ff878d1 100644 --- a/docs/language/search-expressions.md +++ b/docs/language/search-expressions.md @@ -65,9 +65,11 @@ _ . : / % # @ ~ A glob must begin with one of these characters or `*` then may be followed by any of these characters, `*`, or digits `0` through `9`. -{{< tip "Note" >}} +{{% tip "Note" %}} + These rules do not allow for a leading digit. -{{< /tip >}} + +{{% /tip %}} For example, a prefix match is easily accomplished via `prefix*`, e.g., ```mdtest-command @@ -125,13 +127,15 @@ is a Boolean comparison between the product `a*b` and `c`. The search patterns described above can be combined with other "search terms" using Boolean logic to form search expressions. -{{< tip "Note" >}} +{{% tip "Note" %}} + When processing [Super Binary](../formats/bsup.md) data, the SuperDB runtime performs a multi-threaded Boyer-Moore scan over decompressed data buffers before parsing any data. This allows large buffers of data to be efficiently discarded and skipped when searching for rarely occurring values. For a [SuperDB data lake](../lake/format.md), a planned feature will use [Super Columnar](../formats/csup.md) files to further accelerate searches. 
-{{< /tip >}}
+
+{{% /tip %}}

 ### Search Terms

@@ -228,12 +232,14 @@ is equivalent to
 ```
 where grep("foo", this)
 ```

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 This equivalency between keyword search terms and grep semantics will change
 in the near future when we add support for full-text search.  In this case,
 grep will still support substring match but keyword search will match
 segmented words from string fields.
-{{< /tip >}}
+
+{{% /tip %}}

 #### Non-String Literal Search Term
diff --git a/docs/tutorials/join.md b/docs/tutorials/join.md
index 081e645402..7c186cd822 100644
--- a/docs/tutorials/join.md
+++ b/docs/tutorials/join.md
@@ -65,9 +65,11 @@ produces

 ## Left Join

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 In some databases a left join is called a _left outer join_.
-{{< /tip >}}
+
+{{% /tip %}}

 By performing a left join that targets the same key fields, now all of our
 fruits will be shown in the results even if no one likes them (e.g., `avocado`).
@@ -102,9 +104,11 @@ produces

 ## Right join

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 In SQL, a right join is called a _right outer join_.
-{{< /tip >}}
+
+{{% /tip %}}

 Next we'll change the join type from `left` to `right`.  Notice that this
 causes the `note` field from the right-hand input to appear in the joined
 results.
@@ -132,9 +136,11 @@ produces

 ## Anti join

-{{< tip "Note" >}}
+{{% tip "Note" %}}
+
 In some databases an anti join is called a _left anti join_.
-{{< /tip >}}
+
+{{% /tip %}}

 The join type `anti` allows us to see which fruits are not liked by anyone.
 Note that with anti join only values from the left-hand input appear in the
diff --git a/docs/tutorials/zq.md b/docs/tutorials/zq.md
index 2a24100ca8..9237f4999e 100644
--- a/docs/tutorials/zq.md
+++ b/docs/tutorials/zq.md
@@ -43,12 +43,14 @@ To this end, if you want full JSON compatibility without having to delve into th
 details of Zed, just use the `-j` option with `zq` and this will tell `zq` to
 expect JSON values as input and produce JSON values as output, much like `jq`.

-{{< tip "Tip" >}}
+{{% tip "Tip" %}}
+
 If your downstream JSON tooling expects only a single JSON value, we can
 use `-j` along with [`collect()`](../language/aggregates/collect.md) to
 aggregate multiple input values into an array.  A `collect()` example is
 shown [later in this tutorial](#running-analytics).
-{{< /tip >}}
+
+{{% /tip %}}

 ## `this` vs `.`
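
Why switching every tip from `{{< >}}` to `{{% %}}` makes links work: Hugo passes the inner content of `{{< >}}` shortcodes through without Markdown processing, while `{{% %}}` shortcodes hand their inner content to the Markdown renderer first, so `[text](url)` links inside a tip only become hyperlinks with the `%` form. A minimal sketch of the difference, assuming a `tip` shortcode that renders its inner content as these docs do (the link target is borrowed from the patch above):

```markdown
{{< tip "Note" >}}
See the [lake API](../lake/api.md).   <!-- emitted as literal bracket text, not a hyperlink -->
{{< /tip >}}

{{% tip "Note" %}}

See the [lake API](../lake/api.md).   <!-- processed as Markdown: a real hyperlink -->

{{% /tip %}}
```

The blank lines added inside every `{{% tip %}}` throughout this patch follow the same idea: with the `%` form the body is Markdown, and separating it from the shortcode delimiters with blank lines helps the renderer treat it as block-level content.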