feat: add nightly version (#918)
Signed-off-by: tison <[email protected]>
Co-authored-by: Yiran <[email protected]>
tisonkun and nicecui authored Apr 26, 2024
1 parent 9eeb49d commit 137a6f6
Showing 444 changed files with 27,674 additions and 4 deletions.
13 changes: 9 additions & 4 deletions docs/.vitepress/theme/serverUtils.ts
@@ -3,6 +3,8 @@ import YAML from 'js-yaml'
 import { CURRENT_VERSION, versionMap, websiteMap, LATEST_VERSION, CURRENT_LANG } from '../config/common'
 import semver from 'semver'
 
+const NIGHTLY_VERSION = "nightly"
+
 export async function makeSidebar(lang, version) {
   const langPath = `/${lang}`
   const versionPath = `/${version}`
@@ -67,10 +69,13 @@ export const getSrcExclude = (versionMap: Array<string>, lang: string, langMap:
     } else {
       srcExclude.push(`**/${version}/**`)
     }
-    const ver = semver.parse(`${version}.0`);
-    if (ver > curVer) {
-      const fixedVer = `${ver.major}-${ver.minor}`
-      srcExclude.push(`**/release-notes/release-${fixedVer}-*`)
+
+    if (version !== NIGHTLY_VERSION) {
+      const ver = semver.parse(`${version}.0`);
+      if (ver > curVer) {
+        const fixedVer = `${ver.major}-${ver.minor}`
+        srcExclude.push(`**/release-notes/release-${fixedVer}-*`)
+      }
     }
   })

34 changes: 34 additions & 0 deletions docs/nightly/en/contributor-guide/benchmarks/nyc-taxi.md
@@ -0,0 +1,34 @@
# NYC taxi benchmark

This benchmark is based on the data from [New York City Taxi & Limousine Commission](https://www.nyc.gov/site/tlc/index.page). From the official site, the data includes:

> taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.

The commands in the documentation assume that your current working directory is the root directory of the [GreptimeDB](https://github.com/GreptimeTeam/greptimedb) source code.

## Get the data

First, create a directory to store test data.
```shell
mkdir -p ./benchmarks/data
```

You can find the data on [this page](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page). We support all the **Yellow Taxi Trip Records** since 2022-01. For example, to get the data for January 2022, you can run:

```shell
curl "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-01.parquet" -o ./benchmarks/data/yellow_tripdata_2022-01.parquet
```

## Run benchmark

Before running the benchmark, please make sure you have started the GreptimeDB server. You can start GreptimeDB by using the following command:

```shell
cargo run --release standalone start
```

The benchmark tool is included in the source code. You can run it with:

```shell
cargo run --release --bin nyc-taxi -- --path "./benchmarks/data/"
```
@@ -0,0 +1,79 @@
# Data Persistence and Indexing

Similar to all LSMT-like storage engines, data in MemTables is persisted to durable storage, for example, the local disk file system or object storage service. GreptimeDB adopts [Apache Parquet][1] as its persistent file format.

## SST File Format

Parquet is an open source columnar format that provides fast data querying and has already been adopted by many projects, such as Delta Lake.

Parquet has a hierarchical structure of row groups, columns, and data pages. Data in a Parquet file is horizontally partitioned into row groups, in which all values of the same column are stored together to form data pages. The data page is the minimal storage unit. This structure greatly improves performance.

First, clustering data by column makes file scanning more efficient, especially when only a few columns are queried, which is very common in analytical systems.

Second, data of the same column tends to be homogeneous, which helps with compression when applying techniques such as dictionary encoding and Run-Length Encoding (RLE).
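
To see why this matters, here is a tiny, self-contained sketch of Run-Length Encoding (not Parquet's actual implementation): consecutive repeated values in a column collapse into `(value, count)` pairs.

```rust
/// Run-length encode a column: consecutive repeats collapse into a single
/// (value, count) pair, so homogeneous columns shrink dramatically.
fn run_length_encode<T: PartialEq + Clone>(column: &[T]) -> Vec<(T, usize)> {
    let mut runs: Vec<(T, usize)> = Vec::new();
    for value in column {
        match runs.last_mut() {
            Some((last, count)) if *last == *value => *count += 1,
            _ => runs.push((value.clone(), 1)),
        }
    }
    runs
}

fn main() {
    // A homogeneous "status" column: six values become three runs.
    let status = ["ok", "ok", "ok", "error", "ok", "ok"];
    assert_eq!(
        run_length_encode(&status),
        vec![("ok", 3), ("error", 1), ("ok", 2)]
    );
}
```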

![Parquet file format](/parquet-file-format.png)

## Data Persistence

GreptimeDB provides a configuration item, `storage.flush.global_write_buffer_size`, which is the flush threshold for the total memory usage of all MemTables.

When the size of data buffered in MemTables reaches that threshold, GreptimeDB will pick MemTables and flush them to SST files.
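
The mechanism can be pictured roughly as follows. This is only an illustrative sketch, not the engine's actual code; the real selection policy, units, and default threshold differ.

```rust
/// Illustrative flush trigger: when the total size of all MemTables exceeds
/// the configured global write buffer size, pick the largest MemTables to
/// flush until usage drops back under the threshold.
struct MemTable {
    region: &'static str,
    bytes: usize,
}

fn pick_memtables_to_flush(memtables: &mut Vec<MemTable>, threshold: usize) -> Vec<MemTable> {
    let mut total: usize = memtables.iter().map(|m| m.bytes).sum();
    let mut to_flush = Vec::new();
    // Largest MemTables first, so fewer flushes free the most memory.
    memtables.sort_by_key(|m| std::cmp::Reverse(m.bytes));
    while total > threshold && !memtables.is_empty() {
        let picked = memtables.remove(0);
        total -= picked.bytes;
        to_flush.push(picked);
    }
    to_flush
}

fn main() {
    let mut memtables = vec![
        MemTable { region: "region-0", bytes: 600 << 20 }, // 600 MiB
        MemTable { region: "region-1", bytes: 300 << 20 },
        MemTable { region: "region-2", bytes: 200 << 20 },
    ];
    // Pretend the configured threshold is 1 GiB: flushing region-0 alone
    // brings the total back under it.
    let flushed = pick_memtables_to_flush(&mut memtables, 1 << 30);
    assert_eq!(flushed.len(), 1);
    assert_eq!(flushed[0].region, "region-0");
}
```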

## Indexing Data in SST Files

The Apache Parquet file format stores inherent statistics in the headers of column chunks and data pages, which are used for pruning and skipping.

![Column chunk header](/column-chunk-header.png)

For example, in the above Parquet file, if you want to filter rows where `name` = `Emily`, you can easily skip row group 0 because the max value of the `name` field in that row group is `Charlie`. This statistical information reduces IO operations.
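
A simplified sketch of this kind of min/max pruning is shown below. The structures are illustrative and reuse the `Emily`/`Charlie` example; they are not Parquet reader APIs.

```rust
/// Per-row-group statistics for one column, as found in Parquet
/// column-chunk headers (simplified for illustration).
struct RowGroupStats<'a> {
    min: &'a str,
    max: &'a str,
}

/// Keep only the row groups whose [min, max] range could contain `target`.
fn prune_row_groups(stats: &[RowGroupStats<'_>], target: &str) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.min <= target && target <= s.max)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Row group 0 covers "Alice".."Charlie", row group 1 covers "David".."Frank".
    let name_stats = [
        RowGroupStats { min: "Alice", max: "Charlie" },
        RowGroupStats { min: "David", max: "Frank" },
    ];
    // `name = 'Emily'` can only be in row group 1; row group 0 is skipped.
    assert_eq!(prune_row_groups(&name_stats, "Emily"), vec![1]);
}
```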

## Index Files

For each SST file, GreptimeDB not only maintains an internal index but also generates a separate file to store the index structures specific to that SST file.

The index files utilize the [Puffin][3] format, which offers significant flexibility, allowing for the storage of additional metadata and supporting a broader range of index structures.

![Puffin](/puffin.png)

Currently, the inverted index is the first supported index structure, and it is stored within the index file as a Blob.

## Inverted Index

In version 0.7, GreptimeDB introduced the inverted index to accelerate queries.

The inverted index is a common index structure used for full-text searches, mapping each word in a document to the list of documents containing that word. Greptime has adopted this technology, which originates from search engines, for use in time series databases.

Search engines and time series databases operate in separate domains, yet the principle behind the inverted index they apply is similar. Carrying the technique over only requires some conceptual adjustments:
1. Term: In GreptimeDB, it refers to the column value of the time series.
2. Document: In GreptimeDB, it refers to the data segment containing multiple time series.

The inverted index enables GreptimeDB to skip data segments that do not meet query conditions, thus improving scanning efficiency.

![Inverted index searching](/inverted-index-searching.png)

For instance, the query above uses the inverted index to identify data segments where `job` equals `apiserver`, `handler` matches the regex `.*users`, and `status` matches the regex `4...`. It then scans these data segments to produce the final results that meet all conditions, significantly reducing the number of IO operations.
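
The segment skipping can be sketched as follows, with each tag column's inverted index modeled as a map from value to a set of segment IDs and the per-condition sets intersected. For brevity the regex conditions are reduced to hypothetical exact values, and real compressed Bitmaps are replaced with plain sets.

```rust
use std::collections::{BTreeSet, HashMap};

type SegmentSet = BTreeSet<u32>;

/// Simplified inverted index for one tag column: value -> segments containing it.
type InvertedIndex = HashMap<&'static str, SegmentSet>;

/// Intersect the segment sets of every condition; only the surviving
/// segments need to be scanned.
fn segments_to_scan(conditions: &[(&InvertedIndex, &'static str)]) -> SegmentSet {
    let mut iter = conditions.iter();
    let Some((index, value)) = iter.next() else {
        return SegmentSet::new();
    };
    let mut result = index.get(value).cloned().unwrap_or_default();
    for (index, value) in iter {
        let other = index.get(value).cloned().unwrap_or_default();
        result.retain(|segment| other.contains(segment));
    }
    result
}

fn main() {
    let job: InvertedIndex = HashMap::from([("apiserver", SegmentSet::from([0, 2, 5]))]);
    let handler: InvertedIndex = HashMap::from([("/api/v1/users", SegmentSet::from([0, 2, 3]))]);
    let status: InvertedIndex = HashMap::from([("404", SegmentSet::from([2, 3, 5]))]);

    // job = 'apiserver' AND handler = '/api/v1/users' AND status = '404'
    let scan = segments_to_scan(&[(&job, "apiserver"), (&handler, "/api/v1/users"), (&status, "404")]);
    assert_eq!(scan, SegmentSet::from([2])); // only segment 2 is read
}
```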

### Inverted Index Format

![Inverted index format](/inverted-index-format.png)

GreptimeDB builds inverted indexes by column, with each inverted index consisting of an FST and multiple Bitmaps.

The FST (Finite State Transducer) enables GreptimeDB to store mappings from column values to Bitmap positions in a compact format, while providing excellent search performance and supporting complex search capabilities (such as regular expression matching). Each Bitmap maintains a list of data segment IDs, with each bit representing one data segment.
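
The sketch below imitates this layout with a sorted map standing in for the FST and a `u64` standing in for each Bitmap; it only demonstrates the value → Bitmap position → segment IDs chain, not the real on-disk encoding.

```rust
use std::collections::BTreeMap;

/// Stand-in for one column's inverted index blob: a sorted map (playing the
/// role of the FST) from column value to an offset into `bitmaps`, where each
/// u64 (playing the role of a Bitmap) has bit i set when segment i contains
/// the value.
struct ColumnIndex {
    fst: BTreeMap<String, usize>,
    bitmaps: Vec<u64>,
}

impl ColumnIndex {
    /// Follow value -> Bitmap position -> segment IDs.
    fn segments_for(&self, value: &str) -> Vec<u32> {
        let Some(&pos) = self.fst.get(value) else {
            return Vec::new();
        };
        let bitmap = self.bitmaps[pos];
        (0..64u32).filter(|i| bitmap & (1u64 << *i) != 0).collect()
    }
}

fn main() {
    let index = ColumnIndex {
        fst: BTreeMap::from([("apiserver".to_string(), 0), ("scheduler".to_string(), 1)]),
        bitmaps: vec![0b0010_0101, 0b0000_1010], // "apiserver" appears in segments 0, 2, 5
    };
    assert_eq!(index.segments_for("apiserver"), vec![0, 2, 5]);
    assert_eq!(index.segments_for("etcd"), Vec::<u32>::new());
}
```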

### Index Data Segments

GreptimeDB divides an SST file into multiple indexed data segments, with each segment housing an equal number of rows. This segmentation is designed to optimize query performance by scanning only the data segments that match the query conditions.

For example, if a data segment contains 1024 rows and the list of data segments identified through the inverted index for the query conditions is `[0, 2]`, then only the 0th and 2nd data segments in the SST file—from rows 0 to 1023 and 2048 to 3071, respectively—need to be scanned.

The number of rows in a data segment is controlled by the engine option `index.inverted_index.segment_row_count`, which defaults to `1024`. A smaller value means more precise indexing and often results in better query performance but increases the cost of index storage. By adjusting this option, a balance can be struck between storage costs and query performance.
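
Continuing the example, the row ranges to scan can be derived from the matched segment IDs with a small helper, where `1024` stands in for `index.inverted_index.segment_row_count`:

```rust
use std::ops::Range;

/// Map matched data-segment IDs to the row ranges that must be scanned,
/// given the configured number of rows per segment.
fn segment_row_ranges(segment_ids: &[u64], segment_row_count: u64) -> Vec<Range<u64>> {
    segment_ids
        .iter()
        .map(|&id| id * segment_row_count..(id + 1) * segment_row_count)
        .collect()
}

fn main() {
    // Segments [0, 2] with 1024 rows per segment -> rows 0..=1023 and 2048..=3071.
    assert_eq!(
        segment_row_ranges(&[0, 2], 1024),
        vec![0..1024, 2048..3072]
    );
}
```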

## Unified Data Access Layer: OpenDAL

GreptimeDB uses [OpenDAL][2] to provide a unified data access layer. As a result, the storage engine does not need to interact with different storage APIs, and data can be migrated seamlessly to cloud-based storage such as AWS S3.

[1]: https://parquet.apache.org
[2]: https://github.com/datafuselabs/opendal
[3]: https://iceberg.apache.org/puffin-spec
39 changes: 39 additions & 0 deletions docs/nightly/en/contributor-guide/datanode/metric-engine.md
@@ -0,0 +1,39 @@
# Metric Engine

## Overview

The `Metric` engine is a component of GreptimeDB, and it's an implementation of the storage engine. It mainly targets scenarios with a large number of small tables for observability metrics.

Its main feature is to use synthetic wide physical tables to store the data of a large number of small tables, so that columns and metadata can be reused across them. This reduces the storage overhead of small tables and improves the efficiency of columnar compression. Under the `Metric` engine, the concept of a table becomes even more lightweight.

## Concepts

The `Metric` engine introduces two new concepts: the "logical table" and the "physical table". From the user's perspective, logical tables are exactly like ordinary ones. From the storage point of view, physical Regions are just regular Regions.

### Logical Table

A logical table is a user-defined table. Just like any ordinary table, its definition includes the table name, column definitions, index definitions, etc. All user operations, such as queries and writes, are performed on these logical tables. Users don't need to be concerned with the differences between logical and ordinary tables during usage.

From an implementation standpoint, a logical table is virtual; it doesn't directly read or write physical data but maps read/write requests into corresponding requests for physical tables in order to implement data storage and querying.

### Physical Table

A physical table is a table that actually stores data, possessing several physical Regions defined by partition rules.

## Architecture and Design

The main design architecture of the `Metric` engine is as follows:

![Arch](/metric-engine-arch.png)

In the current implementation, the `Metric` engine reuses the `Mito` engine for the storage and querying of physical data, and it provides access to both physical tables and logical tables.

Regarding partitioning, logical tables have the same partition rules and Region distribution as physical tables. This makes sense because the data of logical tables is stored directly in physical tables, so their partition rules are consistent.

Concerning routing metadata, the routing address of a logical table is a logical address, namely the physical table it corresponds to; a secondary routing through that physical table then yields the real physical address. This indirect routing method significantly reduces the number of metadata modifications required when Region migration scheduling occurs in the `Metric` engine.
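
A hedged sketch of this two-level routing is given below; the table names and structures are made up for illustration and are not GreptimeDB's metadata schema. Because every logical table resolves through its physical table's route, migrating a physical Region only touches one route entry.

```rust
use std::collections::HashMap;

/// Logical routing metadata: logical table -> the physical table backing it.
/// Physical routing metadata: physical table -> the datanodes hosting its Regions.
struct RouteTable {
    logical_to_physical: HashMap<String, String>,
    physical_routes: HashMap<String, Vec<String>>,
}

impl RouteTable {
    /// Secondary routing: logical table -> physical table -> datanode addresses.
    fn resolve(&self, logical_table: &str) -> Option<&[String]> {
        let physical = self.logical_to_physical.get(logical_table)?;
        self.physical_routes.get(physical).map(|nodes| nodes.as_slice())
    }
}

fn main() {
    let mut routes = RouteTable {
        logical_to_physical: HashMap::from([
            ("http_requests_total".to_string(), "greptime_physical_table".to_string()),
            ("cpu_usage".to_string(), "greptime_physical_table".to_string()),
        ]),
        physical_routes: HashMap::from([(
            "greptime_physical_table".to_string(),
            vec!["datanode-1:4001".to_string()],
        )]),
    };
    // Migrating the physical Region updates one entry; every logical table
    // that maps onto it picks up the new address automatically.
    routes
        .physical_routes
        .insert("greptime_physical_table".to_string(), vec!["datanode-2:4001".to_string()]);
    assert_eq!(routes.resolve("cpu_usage").unwrap()[0], "datanode-2:4001");
}
```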

Operationally, the `Metric` engine only supports limited operations on physical tables to prevent misoperations: writing into or deleting from a physical table directly is prohibited, for example, because it could affect the data of users' logical tables. Generally speaking, users can consider their access to these physical tables as read-only.

To improve performance when DDL (Data Definition Language) operations are performed on many tables at once, the `Metric` engine introduces batch DDL operations. These merge a large number of DDL actions into a single request, reducing the number of metadata queries and modifications and thus enhancing performance. This is particularly beneficial for the automatic table-creation requests brought about by a large number of metrics during a Prometheus Remote Write cold start, as well as for the route-table modification requests mentioned earlier when many physical Regions are migrated.
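
Conceptually, a batch DDL request looks like the sketch below; the type and field names are assumptions for illustration, not the actual gRPC or metadata API.

```rust
/// One logical-table creation (illustrative fields only).
struct CreateLogicalTable {
    table_name: String,
    tags: Vec<String>,
}

/// A batch DDL request: many creations against one physical table, sent as a
/// single request so metadata is queried and modified once instead of N times.
struct BatchCreateLogicalTables {
    physical_table: String,
    creations: Vec<CreateLogicalTable>,
}

fn main() {
    // Many metrics arriving at once (e.g. a Prometheus Remote Write cold start)
    // become one DDL request instead of one request per table.
    let creations: Vec<CreateLogicalTable> = (0..3)
        .map(|i| CreateLogicalTable {
            table_name: format!("metric_{i}"),
            tags: vec!["host".to_string(), "job".to_string()],
        })
        .collect();
    let batch = BatchCreateLogicalTables {
        physical_table: "greptime_physical_table".to_string(),
        creations,
    };
    assert_eq!(batch.creations.len(), 3);
}
```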

Apart from the physical data regions belonging to physical tables, the `Metric` engine creates an additional physical metadata region for each physical data region to store the metadata it needs for maintaining mappings and other states. This metadata includes the mapping between logical tables and physical tables, the mapping between logical columns and physical columns, and so on.
29 changes: 29 additions & 0 deletions docs/nightly/en/contributor-guide/datanode/overview.md
@@ -0,0 +1,29 @@
# Overview

## Introduction

`Datanode` is mainly responsible for storing the actual data of GreptimeDB. As we know, in GreptimeDB,
a `table` can have one or more `Region`s, and `Datanode` is responsible for managing the reading and writing
of these `Region`s. `Datanode` is not aware of `table`s and can be considered a `region server`. Therefore,
`Frontend` and `Metasrv` operate on `Datanode` at the granularity of `Region`.

![Datanode](/datanode.png)

## Components

A `Datanode` contains all the components needed for a `region server`. Here we list some of the vital parts:

- A gRPC service is provided for reading and writing region data, and `Frontend` uses this service
to read and write data from `Datanode`s.
- An HTTP service, through which you can obtain metrics, configuration information, etc., of the current node.
- `Heartbeat Task` is used to send heartbeats to the `Metasrv`. Heartbeats play a crucial role in the
distributed architecture of GreptimeDB and serve as a basic communication channel for distributed coordination.
The upstream heartbeat messages contain important information such as the workload of a `Region`. If the
`Metasrv` has made scheduling decisions (such as a `Region` migration), it sends instructions to the
`Datanode` via downstream heartbeat messages (a rough sketch of these messages follows this list).
- The `Datanode` does not include components like the `Physical Planner`, `Optimizer`, etc. (these are placed in
the `Frontend`). The user's query requests for one or more `Table`s will be transformed into `Region` query
requests in the `Frontend`. The `Datanode` is responsible for handling these `Region` query requests.
- A `Region Manager` is used to manage all `Region`s on a `Datanode`.
- GreptimeDB supports a pluggable multi-engine architecture, with existing engines including `File Engine` and
`Mito Engine`.
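
As referenced in the `Heartbeat Task` item above, the exchange can be pictured with the rough sketch below. The message shapes are illustrative assumptions; the real messages are defined in GreptimeDB's protobuf definitions.

```rust
/// Workload of one Region, as reported upstream (illustrative fields only).
struct RegionStat {
    region_id: u64,
    write_throughput: u64,
}

/// Upstream: Datanode -> Metasrv, reporting the workload of its Regions.
struct HeartbeatRequest {
    region_stats: Vec<RegionStat>,
}

/// Downstream: Metasrv -> Datanode, optionally carrying a scheduling decision.
enum Instruction {
    None,
    MigrateRegion { region_id: u64, to_datanode: u64 },
}

/// Metasrv side (sketch): inspect the reported workload and decide whether
/// a Region should be migrated.
fn handle_heartbeat(req: &HeartbeatRequest) -> Instruction {
    match req.region_stats.iter().find(|stat| stat.write_throughput > 10_000) {
        Some(stat) => Instruction::MigrateRegion { region_id: stat.region_id, to_datanode: 2 },
        None => Instruction::None,
    }
}

fn main() {
    let req = HeartbeatRequest {
        region_stats: vec![RegionStat { region_id: 42, write_throughput: 25_000 }],
    };
    let instruction = handle_heartbeat(&req);
    assert!(matches!(instruction, Instruction::MigrateRegion { region_id: 42, .. }));
}
```
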
30 changes: 30 additions & 0 deletions docs/nightly/en/contributor-guide/datanode/python-scripts.md
@@ -0,0 +1,30 @@
# Python Scripts

## Introduction

Python scripts provide a way to analyze data in GreptimeDB
by running the analysis directly inside the database instead of fetching all the data from the database and running it locally.
This approach saves a lot of data transfer cost.
The image below depicts how the script works.
The `RecordBatch` (which is basically a column in a table with type and nullability metadata)
can come from anywhere in the database,
and the returned `RecordBatch` can be annotated in Python grammar to indicate its metadata,
such as type or nullability.
The script will do its best to convert the returned object to a `RecordBatch`,
whether it is a Python list, a `RecordBatch` computed from parameters,
or a constant (which is extended to the same length as the input arguments).

![Python Coprocessor](/python-coprocessor.png)

## Two optional backends

### CPython Backend powered by PyO3

This backend is powered by [PyO3](https://pyo3.rs/v0.18.1/), enabling the use of your favourite Python libraries (such as NumPy, Pandas, etc.) and allowing Conda to manage your Python environment.

But using it also involves some complications. You must set up the correct Python shared library, which can be a bit challenging. In general, you just need to install the `python-dev` package. However, if you are using Homebrew to install Python on macOS, you must create a proper soft link to `Library/Frameworks/Python.framework`. Detailed instructions on using the PyO3 crate with different Python versions can be found [here](https://pyo3.rs/v0.18.1/building_and_distribution#configuring-the-python-version).

### Embedded RustPython Interpreter

An experimental [Python interpreter](https://github.com/RustPython/RustPython) that runs the coprocessor scripts. It supports Python 3.10 grammar, and you can use all the usual Python syntax; see [User Guide/Python Coprocessor](/user-guide/python-scripts/overview.md) for more!
59 changes: 59 additions & 0 deletions docs/nightly/en/contributor-guide/datanode/query-engine.md
@@ -0,0 +1,59 @@
# Query Engine

## Introduction

GreptimeDB's query engine is built on [Apache DataFusion][1] (a subproject under [Apache Arrow][2]), a brilliant query engine written in Rust. It provides a full set of well-functioning components, from the logical plan and the physical plan to the execution runtime. Below we explain how each component is orchestrated and where it sits during execution.

![Execution Procedure](/execution-procedure.png)

The entry point is the logical plan, which is used as the general intermediate representation of a query or execution logic. Two notable sources of logical plans are: 1. user queries, such as SQL, which go through the SQL parser and planner; 2. the Frontend's distributed queries, which are explained in detail in the following section.

Next is the physical plan, or the execution plan. Unlike the logical plan, which is a big enumeration containing all the logical plan variants (except the special extension plan node), the physical plan is in fact a trait that defines a group of methods invoked during execution. All data processing logic is packed into corresponding structures that implement the trait. They are the actual operations performed on the data, like the aggregators `MIN` or `AVG` and the table scan for `SELECT ... FROM`.

The optimization phase, which improves execution performance by transforming both logical and physical plans, is currently entirely rule based, which is also called "Rule-Based Optimization". Some of the rules are native to DataFusion, while others are customized in GreptimeDB. In the future, we plan to add more rules and leverage data statistics for Cost-Based Optimization (CBO).

The last phase "execute" is a verb, stands for the procedure that reads data from storage, performs
calculations and generates the expected results. Although it's more abstract than previously mentioned concepts, you can just
simply imagine it as executing a Rust async function. And it's indeed a future (stream).

`EXPLAIN [VERBOSE] <SQL>` is very useful if you want to see how your SQL is represented in the logical or physical plan.
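
Because the engine builds on DataFusion, the SQL → logical plan → physical plan flow can be sketched with the `datafusion` crate directly, as below. This is an approximation rather than GreptimeDB's internal code: the exact DataFrame API varies between DataFusion versions, and the `metrics.csv` file, the query, and the `tokio` runtime are assumptions for the example.

```rust
// Sketch against the `datafusion` crate; API details vary by version.
use datafusion::error::Result;
use datafusion::prelude::{CsvReadOptions, SessionContext};

#[tokio::main]
async fn main() -> Result<()> {
    let ctx = SessionContext::new();
    // Assumes a local metrics.csv file exists so the schema can be inferred.
    ctx.register_csv("metrics", "metrics.csv", CsvReadOptions::new()).await?;

    // The SQL parser + planner produce the logical plan ...
    let df = ctx.sql("SELECT host, AVG(cpu) FROM metrics GROUP BY host").await?;
    println!("{}", df.logical_plan().display_indent());

    // ... which the optimizer rewrites and turns into an executable physical plan.
    let physical_plan = df.create_physical_plan().await?;
    println!("{:?}", physical_plan);
    Ok(())
}
```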

## Data Representation

GreptimeDB uses [Apache Arrow][2] as its in-memory data representation. It's a column-oriented, cross-platform format that also contains many high-performance data operators. These features make it easy to share data across many different environments and to implement calculation logic.
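
For a feel of the format, the example below builds a small columnar `RecordBatch` with the `arrow` crate; GreptimeDB wraps Arrow arrays in its own vector types, so this shows only the underlying representation.

```rust
use std::sync::Arc;

use arrow::array::{Float64Array, StringArray, TimestampMillisecondArray};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), arrow::error::ArrowError> {
    // Column-oriented: each column is one contiguous, typed array.
    let schema = Arc::new(Schema::new(vec![
        Field::new("ts", DataType::Timestamp(TimeUnit::Millisecond, None), false),
        Field::new("host", DataType::Utf8, false),
        Field::new("cpu", DataType::Float64, true),
    ]));
    let batch = RecordBatch::try_new(
        schema,
        vec![
            Arc::new(TimestampMillisecondArray::from(vec![1000, 2000, 3000])),
            Arc::new(StringArray::from(vec!["host-1", "host-1", "host-2"])),
            Arc::new(Float64Array::from(vec![Some(0.5), None, Some(0.9)])),
        ],
    )?;
    assert_eq!(batch.num_rows(), 3);
    assert_eq!(batch.num_columns(), 3);
    Ok(())
}
```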

## Indexing

In time series data, there are two important dimensions: the timestamp and the tag columns (similar to the primary key in a general relational database). GreptimeDB groups data into time buckets, so it's efficient to locate and extract data within the expected time range at a very low cost. [Apache Parquet][3], the main persistent file format in GreptimeDB, helps a lot here -- it provides multi-level indices and filters that make it easy to prune data during querying. In the future, we will make more use of this feature and develop our own separate index to handle more complex use cases.

## Extensibility

Extending operations in GreptimeDB is extremely simple. There are two ways to do it: 1. via the [Python Coprocessor][4] interface; 2. by implementing your own operator as shown [here][5].

## Distributed Execution

Covered in [Distributed Querying][6].

[1]: https://github.com/apache/arrow-datafusion
[2]: https://arrow.apache.org/
[3]: https://parquet.apache.org
[4]: python-scripts.md
[5]: https://github.com/GreptimeTeam/greptimedb/blob/main/docs/how-to/how-to-write-aggregate-function.md
[6]: ../frontend/distributed-querying.md