docs: update the partition grammar (#888)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yiran <cuiyiran3@gmail.com>
waynexia and nicecui authored Apr 15, 2024
1 parent 8a986bd commit 530c943
Showing 4 changed files with 62 additions and 44 deletions.
31 changes: 21 additions & 10 deletions docs/v0.7/en/contributor-guide/frontend/table-sharding.md
@@ -11,31 +11,42 @@ in OLTP databases.
In GreptimeDB, a table can be horizontally partitioned in multiple ways and it uses the same
partitioning types (and corresponding syntax) as in MySQL. Currently, GreptimeDB supports "RANGE COLUMNS partitioning".

In "RANGE COLUMNS partitioning", each partition includes only a portion of the data from the table, and is
Each partition includes only a portion of the data from the table, and is
grouped by some column(s) value range. For example, we can partition a table in GreptimeDB like
this:

```sql
CREATE TABLE (...)
PARTITION ON COLUMNS (<COLUMN LIST>) (
<RULE LIST>
);
```

The syntax mainly consists of two parts:
- `PARTITION ON COLUMNS` is followed by a comma-separated list of column names that may be used for partitioning. The column list acts only as an "allow list": in practice, the partition rules may use just a subset of the columns listed here.
- `RULE LIST` is a list of partition rules, each of which is a combination of a partition name and a partition condition. The expressions can use `=`, `!=`, `>`, `>=`, `<`, `<=`, `AND`, `OR`, column names, and literals.

Here is a concrete example:

```sql
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION BY RANGE COLUMNS (a) (
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p2 VALUES LESS THAN (MAXVALUE),
PARTITION ON COLUMNS (a) (
a < 10,
a >= 10 AND a < 20,
a >= 20,
);
```

`my_table` that we created above has 3 partitions. Partition "p0" contains a portion of data that
only has rows of column "a < 10"; partition "p1" contains rows of "10 <= a < 20"; partition "p2"
includes the remaining rows of "a >= 20".
The above `my_table` has 3 partitions. The first partition contains rows where "a < 10", the second partition contains rows where "10 <= a < 20", and the third partition contains all rows where "a >= 20".
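
To make the routing concrete, here is a minimal Python sketch of how the three rules above split values of column `a`. It only models the rule semantics described in this section; the names `RULES` and `partition_index` are hypothetical and are not part of GreptimeDB.

```python
# Hypothetical model of the three partition rules from the example above.
# Each rule is a predicate over column `a`; this is not GreptimeDB code.
RULES = [
    ("a < 10", lambda a: a < 10),
    ("a >= 10 AND a < 20", lambda a: 10 <= a < 20),
    ("a >= 20", lambda a: a >= 20),
]


def partition_index(a: int) -> int:
    """Return the index of the single rule that matches value `a`."""
    matches = [i for i, (_, pred) in enumerate(RULES) if pred(a)]
    if len(matches) != 1:
        # Zero matches means a gap in the rules; more than one means an overlap.
        raise ValueError(f"a={a} matched {len(matches)} rules, expected exactly 1")
    return matches[0]


if __name__ == "__main__":
    for value in (-5, 9, 10, 19, 20, 1000):
        idx = partition_index(value)
        print(f"a={value} -> partition {idx} ({RULES[idx][0]})")
```

The boundary values 10 and 20 are the interesting cases: each falls into exactly one partition because the rules use `<` on one side and `>=` on the other.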

::: warning Important

1. Value ranges must be strictly increasing and must end with "`MAXVALUE`".
2. The partition column must be a primary key.
1. The ranges of all partitions must not overlap.
2. The columns used for partitioning must be specified in `ON COLUMNS`.

:::
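
The non-overlap requirement can be sanity-checked before writing the `CREATE TABLE` statement. The following Python sketch is an illustration only: it tests candidate rules against a set of sample values, so it can find overlaps but cannot prove their absence. The function name `find_overlaps` and the sample range are assumptions made for this example, not a GreptimeDB facility.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Tuple

Rule = Callable[[int], bool]


def find_overlaps(rules: List[Rule], samples: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (i, j, value) for each pair of rules that both match some sample.

    Only the first overlapping sample value is reported for each pair.
    """
    sample_list = list(samples)
    overlaps = []
    for (i, rule_i), (j, rule_j) in combinations(enumerate(rules), 2):
        for value in sample_list:
            if rule_i(value) and rule_j(value):
                overlaps.append((i, j, value))
                break
    return overlaps


# Rules from the example: a < 10, a >= 10 AND a < 20, a >= 20.
good_rules = [lambda a: a < 10, lambda a: 10 <= a < 20, lambda a: a >= 20]

# A deliberately broken variant: the second rule starts at 5 instead of 10.
bad_rules = [lambda a: a < 10, lambda a: 5 <= a < 20, lambda a: a >= 20]

if __name__ == "__main__":
    print(find_overlaps(good_rules, range(0, 30)))
    print(find_overlaps(bad_rules, range(0, 30)))
```

Choosing samples that include every boundary value (here 10 and 20) and their neighbours makes this kind of check more useful, since off-by-one mistakes in `<` versus `<=` show up at the boundaries.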

19 changes: 9 additions & 10 deletions docs/v0.7/en/reference/sql/create.md
@@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
[
PARTITION BY RANGE COLUMNS(column1, column2, ...) (
PARTITION r0 VALUES LESS THAN (expr1),
PARTITION r1 VALUES LESS THAN (expr2),
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
...
)
]
@@ -82,12 +81,12 @@ The statement won't do anything if the table already exists and `IF NOT EXISTS`

Users can add table options by using `WITH`. The valid options are:

| Option | Description | Value |
| ------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc. |
| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. |
| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |
| Option | Description | Value |
| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc. |
| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. |
| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |

For example, to create a table whose data TTL (Time-To-Live) is seven days and whose region number is 10:

@@ -156,7 +155,7 @@ Query OK, 0 rows affected (0.01 sec)

### Region partition rules

TODO by MichaelScofield
Please refer to [Partition](/contributor-guide/frontend/table-sharding#partition) for more details.

## CREATE EXTERNAL TABLE

33 changes: 21 additions & 12 deletions docs/v0.7/zh/contributor-guide/frontend/table-sharding.md
@@ -8,32 +8,41 @@

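In GreptimeDB, a table can be horizontally partitioned in multiple ways, and it uses the same partitioning types (and corresponding syntax) as MySQL. Currently, GreptimeDB supports "RANGE COLUMNS partitioning".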

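In "RANGE COLUMNS partitioning", each partition contains only a portion of the table's data, grouped by the value ranges of some column(s). For example, we can partition a table in GreptimeDB like this:
Each partition contains only a portion of the table's data, grouped by the value ranges of some column(s). For example, we can use the following syntax to partition a table in GreptimeDB: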

```sql
CREATE TABLE (...)
PARTITION ON COLUMNS (<COLUMN LIST>) (
<RULE LIST>
);
```

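The syntax mainly consists of two parts:
- `PARTITION ON COLUMNS` is followed by a comma-separated list of column names that may be used for partitioning. The column list acts only as an "allow list": in practice, possibly only a subset of the columns listed here is used for partitioning.
- `RULE LIST` is a list of partition rules, each of which is a combination of a partition name and a partition condition. The expressions can use `=`, `!=`, `>`, `>=`, `<`, `<=`, `AND`, `OR`, column names, and literals.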

Here is a concrete example:

```sql
CREATE TABLE my_table (
a INT PRIMARY KEY,
b STRING,
ts TIMESTAMP TIME INDEX,
)
PARTITION BY RANGE COLUMNS (a) (
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (20),
PARTITION p2 VALUES LESS THAN (MAXVALUE),
PARTITION ON COLUMNS (a) (
a < 10,
a >= 10 AND a < 20,
a >= 20,
);
```

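The `my_table` created above has 3 partitions. Partition "p0" contains rows where "a < 10"; partition "p1" contains rows where "10 <= a < 20"; partition "p2" contains all remaining rows where "a >= 20".
The `my_table` created above has 3 partitions: the first contains rows where "a < 10", the second contains rows where "10 <= a < 20", and the third contains all rows where "a >= 20".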

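::: warning Important

1. The ranges of all partitions must be strictly increasing and must end with "`MAXVALUE`".
2. The column used for partitioning must be a primary key.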

:::
1. The ranges of all partitions must not overlap.
2. The columns used for partitioning must be specified in `ON COLUMNS`.

::: tip Note
Currently, the "PARTITION BY RANGE" syntax does not support expressions; only column names can be used.
:::

## Region
23 changes: 11 additions & 12 deletions docs/v0.7/zh/reference/sql/create.md
@@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
[PRIMARY KEY(column1, column2, ...)]
) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
[
PARTITION BY RANGE COLUMNS(column1, column2, ...) (
PARTITION r0 VALUES LESS THAN (expr1),
PARTITION r1 VALUES LESS THAN (expr2),
PARTITION ON COLUMNS(column1, column2, ...) (
<PARTITION EXPR>,
...
)
]
@@ -83,12 +82,12 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name

Users can add table options by using `WITH`. The valid options are:

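| Option | Description | Value |
| ------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days, etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region value of the table | Integer value, such as 1, 5, 10, etc. |
| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value is `32MB`. Supported units are: `MB` / `GB`. |
| `storage` | Custom storage engine for the table; the name of the storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in the `[[storage.providers]]` list; see [configuration](/user-guide/operations/configuration#存储引擎提供商) |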
| Option | Description | Value |
| ------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days, etc. Supported time units are: `s` / `m` / `h` / `d` |
| `regions` | The region value of the table | Integer value, such as 1, 5, 10, etc. |
| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value is `32MB`. Supported units are: `MB` / `GB`. |
| `storage` | Custom storage engine for the table; the name of the storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in the `[[storage.providers]]` list; see [configuration](/user-guide/operations/configuration#存储引擎提供商) |

For example, to create a table whose data TTL (Time-To-Live) is seven days and whose region number is 10:

@@ -157,9 +156,9 @@ CREATE TABLE system_metrics (
Query OK, 0 rows affected (0.01 sec)
```

### Region partition rules
### Region partition rules

TODO by MichaelScofield
Please refer to the [Partition](/contributor-guide/frontend/table-sharding#partition) section.

## CREATE EXTERNAL TABLE

@@ -196,7 +195,7 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name
| Option | Description | Required |
| ---------- | ------------------------------------------------------------------ | -------- |
| `LOCATION` | The location of the external table, such as `s3://<bucket>[<path>]`, `/<path>/[<filename>]` | **Required** |
| `FORMAT` | The format of the target file, such as JSON, CSV, Parquet, ORC | **Required** |
| `FORMAT` | The format of the target file, such as JSON, CSV, Parquet, ORC | **Required** |
| `PATTERN` | Use a regex to match files, such as `*_today.parquet` | Optional |

#### S3