docs: update the partition grammar (#888)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yiran <cuiyiran3@gmail.com>
GreptimeTeam · Apr 15, 2024 · 530c943
1 parent 8a986bd
commit 530c943
Showing 4 changed files with 62 additions and 44 deletions.
diff --git a/docs/v0.7/en/contributor-guide/frontend/table-sharding.md b/docs/v0.7/en/contributor-guide/frontend/table-sharding.md
@@ -11,31 +11,42 @@ in OLTP databases.
 In GreptimeDB, a table can be horizontally partitioned in multiple ways and it uses the same
 partitioning types (and corresponding syntax) as in MySQL. Currently, GreptimeDB supports "RANGE COLUMNS partitioning".
 
-In "RANGE COLUMNS partitioning", each partition includes only a portion of the data from the table, and is
+Each partition includes only a portion of the data from the table, and is
 grouped by some column(s) value range. For example, we can partition a table in GreptimeDB like
 this:
 
+```sql
+CREATE TABLE (...)
+PARTITION ON COLUMNS (<COLUMN LIST>) (
+    <RULE LIST>
+);
+```
+
+The syntax mainly consists of two parts:
+- `PARTITION ON COLUMNS` followed by a comma-separated list of column names, which specifies the columns that may be used for partitioning. The column list specified here only serves as an "allow list"; in practice, only some of these columns may actually be used for partitioning.
+- `RULE LIST` is a list of partition rules, each of which is a combination of a partition name and a partition condition. The expressions here can use `=`, `!=`, `>`, `>=`, `<`, `<=`, `AND`, `OR`, column names, and literals.
+
+Here is a concrete example:
+
 ```sql
 CREATE TABLE my_table (
   a INT PRIMARY KEY,
   b STRING,
   ts TIMESTAMP TIME INDEX,
 )
-PARTITION BY RANGE COLUMNS (a) (
-  PARTITION p0 VALUES LESS THAN (10),
-  PARTITION p1 VALUES LESS THAN (20),
-  PARTITION p2 VALUES LESS THAN (MAXVALUE),
+PARTITION ON COLUMNS (a) (
+  a < 10,
+  a >= 10 AND a < 20,
+  a >= 20,
 );
 ```
 
-`my_table` that we created above has 3 partitions. Partition "p0" contains a portion of data that
-only has rows of column "a < 10"; partition "p1" contains rows of "10 <= a < 20"; partition "p2"
-includes the remaining rows of "a >= 20".
+The above `my_table` has 3 partitions. The first partition contains rows where "a < 10", the second partition contains rows where "10 <= a < 20", and the third partition contains all rows where "a >= 20".
 
 ::: warning Important
 
-1. Value ranges must be strictly increased, and finally ends with "`MAXVALUE`".
-2. The partition column must be a primary key.
+1. The ranges of all partitions must not overlap.
+2. The columns used for partitioning must be specified in `ON COLUMNS`.
 
 :::
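
Note on the hunk above: the new grammar allows several columns in `ON COLUMNS` and rules that combine them with `AND`/`OR`, but the example only uses one column. The following is a minimal sketch of a two-column case, assuming the syntax behaves exactly as described in this diff. The table name `sensor_readings` and its columns are hypothetical and do not appear in the original docs, and the statement has not been checked against a running GreptimeDB instance.

```sql
-- Hypothetical table: names and types are illustrative only.
CREATE TABLE sensor_readings (
  device_id INT,
  channel INT,
  reading DOUBLE,
  ts TIMESTAMP TIME INDEX,
  PRIMARY KEY (device_id, channel)
)
-- Both columns used in the rules are listed in ON COLUMNS.
PARTITION ON COLUMNS (device_id, channel) (
  -- The first two rules split device_id < 100 by channel.
  device_id < 100 AND channel < 8,
  device_id < 100 AND channel >= 8,
  -- The last rule takes every remaining device_id, whatever the channel.
  device_id >= 100
);
```

The three conditions are pairwise disjoint, so no row can match more than one rule, which is what the first point of the warning requires.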
 

diff --git a/docs/v0.7/en/reference/sql/create.md b/docs/v0.7/en/reference/sql/create.md
@@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
     [PRIMARY KEY(column1, column2, ...)]
 ) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
 [
-  PARTITION BY RANGE COLUMNS(column1, column2, ...) (
-    PARTITION r0 VALUES LESS THAN (expr1),
-    PARTITION r1 VALUES LESS THAN (expr2),
+  PARTITION ON COLUMNS(column1, column2, ...) (
+    <PARTITION EXPR>,
     ...
   )
 ]
@@ -82,12 +81,12 @@ The statement won't do anything if the table already exists and `IF NOT EXISTS`
 
 Users can add table options by using `WITH`. The valid options include the following:
 
-| Option              | Description                        | Value                                                                                                                                               |
-| ------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `ttl`               | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d`                        |
-| `regions`           | The region number of the table     | Integer value, such as 1, 5, 10 etc.                                                                                                                |
-| `write_buffer_size` | Memtable size of the table         | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. |
-| `storage` | The name of the table storage engine provider   | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |
+| Option              | Description                                   | Value                                                                                                                                                                        |
+| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ttl`               | The storage time of the table data            | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d`                                                 |
+| `regions`           | The region number of the table                | Integer value, such as 1, 5, 10 etc.                                                                                                                                         |
+| `write_buffer_size` | Memtable size of the table                    | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`.                          |
+| `storage`           | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#storage-engine-provider). |
 
 For example, to create a table whose data TTL (Time-To-Live) is seven days and whose region number is 10:
 
@@ -156,7 +155,7 @@ Query OK, 0 rows affected (0.01 sec)
 
 ### Region partition rules
 
-TODO by MichaelScofield
+Please refer to [Partition](/contributor-guide/frontend/table-sharding#partition) for more details.
 
 ## CREATE EXTERNAL TABLE
 

diff --git a/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md b/docs/v0.7/zh/contributor-guide/frontend/table-sharding.md
@@ -8,32 +8,41 @@
 
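 In GreptimeDB, a table can be horizontally partitioned in multiple ways, and it uses the same partitioning types (and corresponding syntax) as MySQL. Currently, GreptimeDB supports "RANGE COLUMNS partitioning".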
 
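-In "RANGE COLUMNS partitioning", each partition contains only a portion of the table's data, grouped by value ranges of some column(s). For example, we can partition a table in GreptimeDB like this:
+Each partition contains only a portion of the table's data, grouped by value ranges of some column(s). For example, we can partition a table in GreptimeDB with the following syntax: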
+
+```sql
+CREATE TABLE (...)
+PARTITION ON COLUMNS (<COLUMN LIST>) (
+    <RULE LIST>
+);
+```
+
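+The syntax mainly consists of two parts:
+- `PARTITION ON COLUMNS` is followed by a comma-separated list of column names, which specifies the columns that may be used for partitioning. The list specified here only serves as an "allow list"; in practice, only some of these columns may actually be used for partitioning.
+- `RULE LIST` is a list of partition rules, each of which is a combination of a partition name and a partition condition. The expressions here can use `=`, `!=`, `>`, `>=`, `<`, `<=`, `AND`, `OR`, column names, and literals.
+
+Here is a concrete example: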
 
 ```sql
 CREATE TABLE my_table (
   a INT PRIMARY KEY,
   b STRING,
   ts TIMESTAMP TIME INDEX,
 )
-PARTITION BY RANGE COLUMNS (a) (
-  PARTITION p0 VALUES LESS THAN (10),
-  PARTITION p1 VALUES LESS THAN (20),
-  PARTITION p2 VALUES LESS THAN (MAXVALUE),
+PARTITION ON COLUMNS (a) (
+  a < 10,
+  a >= 10 AND a < 20,
+  a >= 20,
 );
 ```
 
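-The `my_table` we created above has 3 partitions. Partition "p0" contains the rows where "a < 10"; partition "p1" contains the rows where "10 <= a < 20"; partition "p2" contains all the remaining rows where "a >= 20".
+The `my_table` we created above has 3 partitions: one containing the rows where "a < 10", one containing the rows where "10 <= a < 20", and one containing all rows where "a >= 20".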
 
 ::: warning Important
 
-1. The ranges of all partitions must be strictly increasing and must end with "`MAXVALUE`".
-2. The columns used for partitioning must be primary keys.
-
-:::
+1. The ranges of all partitions must not overlap.
+2. The columns used for partitioning must be specified in `ON COLUMNS`.
 
-::: tip Note
-Currently, expressions are not supported in the "PARTITION BY RANGE" syntax; only column names can be used.
 :::
 
 ## Region

diff --git a/docs/v0.7/zh/reference/sql/create.md b/docs/v0.7/zh/reference/sql/create.md
@@ -52,9 +52,8 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
     [PRIMARY KEY(column1, column2, ...)]
 ) ENGINE = engine WITH([TTL | REGIONS] = expr, ...)
 [
-  PARTITION BY RANGE COLUMNS(column1, column2, ...) (
-    PARTITION r0 VALUES LESS THAN (expr1),
-    PARTITION r1 VALUES LESS THAN (expr2),
+  PARTITION ON COLUMNS(column1, column2, ...) (
+    <PARTITION EXPR>,
     ...
   )
 ]
@@ -83,12 +82,12 @@ CREATE TABLE [IF NOT EXISTS] [db.]table_name
 
 Users can add table options by using `WITH`. The valid options include the following:
 
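-| Option              | Description                        | Value                                                                                                                                               |
-| ------------------- | ---------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `ttl`               | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d`                        |
-| `regions`           | The region number of the table     | Integer value, such as 1, 5, 10 etc.                                                                                                                |
-| `write_buffer_size` | Memtable size of the table         | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`. |
-| `storage` | The name of the table storage engine provider   | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#存储引擎提供商). |
+| Option              | Description                                   | Value                                                                                                                                                                        |
+| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `ttl`               | The storage time of the table data            | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d`                                                 |
+| `regions`           | The region number of the table                | Integer value, such as 1, 5, 10 etc.                                                                                                                                         |
+| `write_buffer_size` | Memtable size of the table                    | String value representing a valid size, such as `32MB`, `128MB`, etc. The default value of this option is `32MB`. Supported units are: `MB` / `GB`.                          |
+| `storage`           | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration#存储引擎提供商).          |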
 
 For example, to create a table whose data TTL (Time-To-Live) is seven days and whose region number is 10:
 
@@ -157,9 +156,9 @@ CREATE TABLE system_metrics (
 Query OK, 0 rows affected (0.01 sec)
 ```
 
-### Region partition rules
+### Region partition rules
 
-TODO by MichaelScofield
+Please refer to the [Partition](/contributor-guide/frontend/table-sharding#partition) section.
 
 ## CREATE EXTERNAL TABLE
 
@@ -196,7 +195,7 @@ CREATE EXTERNAL TABLE [IF NOT EXISTS] [db.]table_name
 | Option     | Description                                                                           | Required |
 | ---------- | ------------------------------------------------------------------------------------- | -------- |
 | `LOCATION` | Location of the external table, e.g., `s3://<bucket>[<path>]`, `/<path>/[<filename>]` | **Yes**  |
-| `FORMAT`   | Format of the target files, e.g., JSON, CSV, Parquet, ORC                               | **Yes**  |
+| `FORMAT`   | Format of the target files, e.g., JSON, CSV, Parquet, ORC                             | **Yes**  |
 | `PATTERN`  | Regex used to match files, e.g., `*_today.parquet`                                    | Optional |
 
 #### S3