diff --git a/docs/v0.7/en/contributor-guide/frontend/table-sharding.md b/docs/v0.7/en/contributor-guide/frontend/table-sharding.md index 65c2f97e3..3d8a5afe7 100644 --- a/docs/v0.7/en/contributor-guide/frontend/table-sharding.md +++ b/docs/v0.7/en/contributor-guide/frontend/table-sharding.md @@ -11,31 +11,42 @@ in OLTP databases. In GreptimeDB, a table can be horizontally partitioned in multiple ways and it uses the same partitioning types (and corresponding syntax) as in MySQL. Currently, GreptimeDB supports "RANGE COLUMNS partitioning". -In "RANGE COLUMNS partitioning", each partition includes only a portion of the data from the table, and is +Each partition includes only a portion of the data from the table, and is grouped by some column(s) value range. For example, we can partition a table in GreptimeDB like this: +```sql +CREATE TABLE (...) +PARTITION ON COLUMNS () ( + +); +``` + +The syntax mainly consists of two parts: +- `PARTITION ON COLUMNS` is followed by a comma-separated list of column names that specifies which columns may be used for partitioning. This column list only acts as an "allow list": the partition rules may use only some of the columns listed here. +- `RULE LIST` is a comma-separated list of partition rules. Each rule is a condition expression that defines which rows belong to one partition. The expressions can use `=`, `!=`, `>`, `>=`, `<`, `<=`, `AND`, `OR`, column names, and literals. + +Here is a concrete example: + ```sql CREATE TABLE my_table ( a INT PRIMARY KEY, b STRING, ts TIMESTAMP TIME INDEX, ) -PARTITION BY RANGE COLUMNS (a) ( - PARTITION p0 VALUES LESS THAN (10), - PARTITION p1 VALUES LESS THAN (20), - PARTITION p2 VALUES LESS THAN (MAXVALUE), +PARTITION ON COLUMNS (a) ( + a < 10, + a >= 10 AND a < 20, + a >= 20, ); ``` -`my_table` that we created above has 3 partitions.
Partition "p0" contains a portion of data that -only has rows of column "a < 10"; partition "p1" contains rows of "10 <= a < 20"; partition "p2" -includes the remaining rows of "a >= 20". +The `my_table` table above has 3 partitions: the first contains rows where "a < 10", the second contains rows where "10 <= a < 20", and the third contains all rows where "a >= 20". ::: warning Important -1. Value ranges must be strictly increased, and finally ends with "`MAXVALUE`". -2. The partition column must be a primary key. +1. The ranges of all partitions must not overlap. +2. Every column referenced in a partition rule must be listed in `ON COLUMNS`. ::: diff --git a/docs/v0.7/en/reference/sql/create.md b/docs/v0.7/en/reference/sql/create.md index d22b703a6..fee4576c5 100644 --- a/docs/v0.7/en/reference/sql/create.md +++ b/docs/v0.7/en/reference/sql/create.md @@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [PRIMARY KEY(column1, column2, ...)] ) ENGINE = engine WITH([TTL | REGIONS] = expr, ...) [ - PARTITION BY RANGE COLUMNS(column1, column2, ...) ( - PARTITION r0 VALUES LESS THAN (expr1), - PARTITION r1 VALUES LESS THAN (expr2), + PARTITION ON COLUMNS(column1, column2, ...) ( + , ... ) ] @@ -82,12 +81,12 @@ The statement won't do anything if the table already exists and `IF NOT EXISTS` Users can add table options by using `WITH`. The valid options contain the following: -| Option | Description | Value | -| ------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- | -| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` | -| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc.
| -| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. | -| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). | +| Option | Description | Value | +| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` | +| `regions` | The region number of the table | Integer value, such as 1, 5, 10 etc. | +| `write_buffer_size` | Memtable size of the table | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. | +| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). | For example, to create a table with the storage data TTL(Time-To-Live) is seven days and region number is 10: @@ -156,7 +155,7 @@ Query OK, 0 rows affected (0.01 sec) ### Region partition rules -TODO by MichaelScofield +Please refer to [Partition](/contributor-guide/frontend/table-sharding#partition) for more details. 
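+
+For example, the following sketch partitions a hypothetical `host_metrics` table on two columns. The table and column names are illustrative only. The example also assumes that string columns and string literals can be used in partition rules, which this page does not state explicitly:
+
+```sql
+-- Hypothetical table; assumes string literals are accepted in partition rules.
+CREATE TABLE host_metrics (
+  host STRING,
+  dc STRING,
+  cpu DOUBLE,
+  ts TIMESTAMP TIME INDEX,
+  PRIMARY KEY (host, dc)
+)
+PARTITION ON COLUMNS (host, dc) (
+  -- Each rule is one partition; the rules must not overlap.
+  host < 'm' AND dc = 'east',
+  host < 'm' AND dc != 'east',
+  -- This rule uses only `host`, relying on ON COLUMNS being an allow list.
+  host >= 'm'
+);
+```
+
+The three rules do not overlap, and together they cover every non-NULL combination of `host` and `dc`. The last rule references only `host`. This should be valid because `ON COLUMNS` acts as an allow list and does not require every listed column to appear in each rule.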
## CREATE EXTERNAL TABLE diff --git a/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md b/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md index 718cd6adb..5e853bcbf 100644 --- a/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md +++ b/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md @@ -8,7 +8,20 @@ 在 GreptimeDB 中,一张表可以通过多种方式横向分区,并且它使用与 MySQL 相同的分区类型(以及相应的语法)。目前,GreptimeDB 支持 “RANGE COLUMNS 分区”。 -在 “RANGE COLUMNS 分区”中,每个分区仅包含表中的一部分数据,并按某些列值范围进行分组。例如,我们可以像这样在 GreptimeDB 中对表进行分区: +每个分区仅包含表中的一部分数据,并按某些列值范围进行分组。例如,我们可以使用这样的语法在 GreptimeDB 中对表进行分区: + +```sql +CREATE TABLE (...) +PARTITION ON COLUMNS () ( + +); +``` + +该语法主要包含两部分: +- `PARTITION ON COLUMNS` 后跟随一个使用逗号分隔的列名列表,用于指定哪些列可能会被用于分区。这里指定的列名列表仅作为“白名单”使用,实际上可能只有其中的一部分列会被用于分区。 +- `RULE LIST` 是一个由逗号分隔的分区规则列表,每条规则都是一个条件表达式,用于定义属于某个分区的行。此处的表达式可使用 `=`,`!=`,`>`,`>=`,`<`,`<=`,`AND`,`OR`,列名和字面量。 + +下面是一个具体的例子: ```sql CREATE TABLE my_table ( @@ -16,24 +29,20 @@ CREATE TABLE my_table ( b STRING, ts TIMESTAMP TIME INDEX, ) -PARTITION BY RANGE COLUMNS (a) ( - PARTITION p0 VALUES LESS THAN (10), - PARTITION p1 VALUES LESS THAN (20), - PARTITION p2 VALUES LESS THAN (MAXVALUE), +PARTITION ON COLUMNS (a) ( + a < 10, + a >= 10 AND a < 20, + a >= 20, ); ``` -我们在上面创建的 `my_table` 有 3 个分区。分区 "p0" 包含了 "a < 10" 的行;分区 "p1" 包含了 "10 <= a < 20" 的行;分区 "p2" 包含了剩下的 "a >= 20" 的所有行。 +我们在上面创建的 `my_table` 有 3 个分区,分别包含 "a < 10" 的行、"10 <= a < 20" 的行,以及 "a >= 20" 的所有行。 ::: warning 重要 -1. 所有分区的范围必须严格递增,并最终以 "`MAXVALUE`" 结尾。 -2. 用于分区的列必须是主键。 - -::: +1. 所有分区的范围不能重叠。 +2. 分区规则中引用的每一列都必须在 `ON COLUMNS` 中列出。 -::: tip 注意 -目前 "PARTITION BY RANGE" 语法中不支持表达式,只能使用列名。 ::: ## Region diff --git a/docs/v0.7/zh/reference/sql/create.md b/docs/v0.7/zh/reference/sql/create.md index 1988b3cbc..f18e3efda 100644 --- a/docs/v0.7/zh/reference/sql/create.md +++ b/docs/v0.7/zh/reference/sql/create.md @@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name [PRIMARY KEY(column1, column2, ...)] ) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
[ - PARTITION BY RANGE COLUMNS(column1, column2, ...) ( - PARTITION r0 VALUES LESS THAN (expr1), - PARTITION r1 VALUES LESS THAN (expr2), + PARTITION ON COLUMNS(column1, column2, ...) ( + , ... ) ] @@ -83,12 +82,12 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name 用户可以使用 `WITH` 添加表选项。有效的选项包括以下内容: -| 选项 | 描述 | 值 | -| ------------------- | ------------------ | ---------------------------------------------------------------------------------------------------------- | -| `ttl` | 表数据的存储时间 | 字符串值,例如 `'60m'`, `'1h'` 代表 1 小时, `'14d'` 代表 14 天等。支持的时间单位有:`s` / `m` / `h` / `d` | -| `regions` | 表的 region 值 | 整数值,例如 1, 5, 10 etc. | -| `write_buffer_size` | 表的 memtable 大小 | 表示有效大小的字符串值,例如 `32MB`, `128MB` 等。默认值为 `32MB`。支持的单位有:`MB` / `GB`. | -| `storage` | 自定义表的存储引擎,存储引擎提供商的名字 | 字符串,类似 `S3`、`Gcs` 等。 必须在 `[[storage.providers]]` 列表里配置, 参考 [configuration](/user-guide/operations/configuration#存储引擎提供商)。| +| 选项 | 描述 | 值 | +| ------------------- | ---------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- | +| `ttl` | 表数据的存储时间 | 字符串值,例如 `'60m'`, `'1h'` 代表 1 小时, `'14d'` 代表 14 天等。支持的时间单位有:`s` / `m` / `h` / `d` | +| `regions` | 表的 region 值 | 整数值,例如 1, 5, 10 etc. | +| `write_buffer_size` | 表的 memtable 大小 | 表示有效大小的字符串值,例如 `32MB`, `128MB` 等。默认值为 `32MB`。支持的单位有:`MB` / `GB`. | +| `storage` | 自定义表的存储引擎,存储引擎提供商的名字 | 字符串,类似 `S3`、`Gcs` 等。 必须在 `[[storage.providers]]` 列表里配置, 参考 [configuration](/user-guide/operations/configuration#存储引擎提供商)。 | 例如,创建一个存储数据 TTL(Time-To-Live) 为七天,region 数为 10 的表: @@ -157,9 +156,9 @@ CREATE TABLE system_metrics ( Query OK, 0 rows affected (0.01 sec) ``` -### Region partition rules +### Region 分区规则 -TODO by MichaelScofield +请参考 [分区](/contributor-guide/frontend/table-sharding#partition) 章节. 
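+
+例如,下面的示例按两列对一张假设的 `host_metrics` 表进行分区。其中的表名和列名仅用于说明。该示例还假设分区规则中可以使用字符串列和字符串字面量,本页并未明确说明这一点:
+
+```sql
+-- 假设的表;假设分区规则中可以使用字符串字面量。
+CREATE TABLE host_metrics (
+  host STRING,
+  dc STRING,
+  cpu DOUBLE,
+  ts TIMESTAMP TIME INDEX,
+  PRIMARY KEY (host, dc)
+)
+PARTITION ON COLUMNS (host, dc) (
+  -- 每条规则对应一个分区,规则之间不能重叠。
+  host < 'm' AND dc = 'east',
+  host < 'm' AND dc != 'east',
+  -- 这条规则只使用了 `host`,依赖于 ON COLUMNS 仅作为白名单的语义。
+  host >= 'm'
+);
+```
+
+这三条规则互不重叠,并且覆盖了 `host` 和 `dc` 所有非 NULL 的取值组合。最后一条规则只引用了 `host`。由于 `ON COLUMNS` 仅作为“白名单”,并不要求每条规则都使用其中列出的所有列,因此这种写法应当是合法的。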
## CREATE EXTERNAL TABLE @@ -196,7 +195,7 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name | 选项 | 描述 | 是否必需 | | ---------- | ------------------------------------------------------------------ | -------- | | `LOCATION` | 外部表的位置,例如 `s3://[]`, `//[]` | **是** | -| `FORMAT` | 目标文件的格式,例如 JSON,CSV,Parquet, ORC | **是** | +| `FORMAT` | 目标文件的格式,例如 JSON,CSV,Parquet, ORC | **是** | | `PATTERN` | 使用正则来匹配文件,例如 `*_today.parquet` | 可选 | #### S3