docs: remove scan parallelism and add pk guide (#1375)
Co-authored-by: scl <[email protected]>
evenyag and sunchanglong authored Dec 13, 2024
1 parent 8d5efed commit 20d22d3
Showing 4 changed files with 48 additions and 72 deletions.
42 changes: 17 additions & 25 deletions docs/user-guide/administration/performance-tuning-tips.md
@@ -1,5 +1,5 @@
---
description: Tips for tuning GreptimeDB performance, including query optimization, caching, enlarging cache size, scan parallelism, and using append-only tables. Also covers metrics for diagnosing query and ingestion issues.
description: Tips for tuning GreptimeDB performance, including query optimization, caching, enlarging cache size, primary keys, and using append-only tables. Also covers metrics for diagnosing query and ingestion issues.
---

# Performance Tuning Tips
@@ -20,14 +20,14 @@ The following metrics help diagnose query performance issues:
| greptime_mito_cache_hit | counter | Total count of cache hit |
| greptime_mito_cache_miss | counter | Total count of cache miss |


### Using cache for object stores

It's highly recommended to enable the object store read cache and the write cache in the storage engine. This could reduce query time by more than 10 times.

> Note: Starting from v0.11, when using remote object storage services, local caching (both read and write) is enabled by default. In most cases, you only need to adjust the cache capacity according to your needs.

The read cache stores objects or ranges on the local disk to avoid fetching the same range from the remote again. The following example shows how to enable the read cache for S3.

- The `cache_path` is the directory used to store cached objects. It defaults to `{data_home}/object_cache/read` since `v0.11`.
- The `cache_capacity` is the capacity of the cache. It defaults to `5GiB` since `v0.11`. It's recommended to leave at least 1/10 of the total disk space for it.

@@ -45,12 +45,12 @@ cache_capacity = "10G"
```

The write cache acts as a write-through cache that stores files on the local disk before uploading them to the object store. This reduces the first query latency. The following example shows how to enable the write cache.

- The `enable_experimental_write_cache` flag enables the write cache. It is enabled by default when configuring remote object stores since `v0.11`.
- The `experimental_write_cache_size` sets the capacity of the cache. It defaults to `5GiB` since `v0.11`.
- The `experimental_write_cache_path` sets the path to store cached files. It defaults to `{data_home}/object_cache/write` since `v0.11`.
- The `experimental_write_cache_ttl` sets the TTL of the cached files.


```toml
[[region_engine]]
[region_engine.mito]
@@ -90,62 +90,54 @@ staging_size = "10GB"
```

Some tips:

- Allocate at least 1/10 of the disk space for `experimental_write_cache_size`
- Allocate at least 1/4 of the total memory for `page_cache_size` if memory usage is under 20%
- Double the cache size if the cache hit ratio is less than 50%
- If using a full-text index, allocate at least 1/10 of the disk space for `staging_size`
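As an illustrative sketch (the disk and memory sizes here are assumptions, not recommendations for your hardware), these tips translate into settings like the following for a machine with a 200 GB disk and 32 GB of memory:

```toml
[[region_engine]]
[region_engine.mito]
# At least 1/10 of the 200 GB disk for the write cache.
experimental_write_cache_size = "20GB"
# At least 1/4 of the 32 GB total memory for the page cache.
page_cache_size = "8GB"
# At least 1/10 of the disk for the full-text index staging area.
staging_size = "20GB"
```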

### Avoid adding high cardinality columns to the primary key

### Enlarging scan parallelism

The storage engine limits the number of concurrent scan tasks to 1/4 of CPU cores for each query. Enlarging the parallelism can reduce the query latency if the machine's workload is relatively low.

```toml
[[region_engine]]
[region_engine.mito]
scan_parallelism = 8
```
Putting high cardinality columns, such as `trace_id` or `uuid`, into the primary key can negatively impact both write and query performance. Instead, consider using an [append-only table](/reference/sql/create.md#create-an-append-only-table) and setting these high cardinality columns as fields.
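As a sketch (the table and column names below are hypothetical), such a log table keeps only a low cardinality column in the primary key, leaves the high cardinality `trace_id` as a field, and enables append mode:

```sql
-- Hypothetical schema: `host` (low cardinality) is the primary key,
-- while the high cardinality `trace_id` is a regular field column.
CREATE TABLE IF NOT EXISTS http_logs (
  `ts` TIMESTAMP TIME INDEX,
  `host` STRING,
  `trace_id` STRING,  -- field, not part of the primary key
  `message` STRING,
  PRIMARY KEY (`host`)
) WITH ('append_mode' = 'true');
```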

### Using append-only table if possible

In general, append-only tables have higher scan performance because the storage engine can skip merging and deduplication. Moreover, the query engine can use statistics to speed up some queries if the table is append-only.

We recommend enabling the [append_mode](/reference/sql/create.md##create-an-append-only-table) for the table if it doesn't require deduplication or performance is prioritized over deduplication. For example, a log table should be append-only as log messages may have the same timestamp.
We recommend enabling the [append_mode](/reference/sql/create.md#create-an-append-only-table) for the table if it doesn't require deduplication or performance is prioritized over deduplication. For example, a log table should be append-only as log messages may have the same timestamp.

## Ingestion

### Metrics

The following metrics help diagnose ingestion issues:

| Metric | Type | Description |
|---|---|---|
| greptime_mito_write_stage_elapsed_bucket | histogram | The elapsed time of different phases of processing a write request in the storage engine |
| greptime_mito_write_buffer_bytes | gauge | The current estimated bytes allocated for the write buffer (memtables). |
| greptime_mito_write_rows_total | counter | The number of rows written to the storage engine |
| greptime_mito_write_stall_total | gauge | The number of rows currently stalled due to high memory pressure |
| greptime_mito_write_reject_total | counter | The number of rows rejected due to high memory pressure |
| raft_engine_sync_log_duration_seconds_bucket | histogram | The elapsed time of flushing the WAL to the disk |
| greptime_mito_flush_elapsed | histogram | The elapsed time of flushing the SST files |

| Metric | Type | Description |
| -------------------------------------------- | --------- | ---------------------------------------------------------------------------------------- |
| greptime_mito_write_stage_elapsed_bucket | histogram | The elapsed time of different phases of processing a write request in the storage engine |
| greptime_mito_write_buffer_bytes | gauge | The current estimated bytes allocated for the write buffer (memtables). |
| greptime_mito_write_rows_total | counter | The number of rows written to the storage engine |
| greptime_mito_write_stall_total | gauge | The number of rows currently stalled due to high memory pressure |
| greptime_mito_write_reject_total | counter | The number of rows rejected due to high memory pressure |
| raft_engine_sync_log_duration_seconds_bucket | histogram | The elapsed time of flushing the WAL to the disk |
| greptime_mito_flush_elapsed | histogram | The elapsed time of flushing the SST files |

### Batching rows

Batching means sending multiple rows to the database in a single request. This can significantly improve ingestion throughput. A recommended starting point is 1000 rows per batch. You can enlarge the batch size if latency and resource usage are still acceptable.
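For example, a batched insert carries many rows in one statement (the table and values below are illustrative):

```sql
-- One request carrying multiple rows instead of one INSERT per row.
INSERT INTO logs (`ts`, `host`, `message`) VALUES
  ('2024-12-13 00:00:00', 'host1', 'GET /index HTTP/1.1'),
  ('2024-12-13 00:00:00', 'host2', 'POST /api HTTP/1.1'),
  ('2024-12-13 00:00:01', 'host1', 'GET /assets/app.js HTTP/1.1');
  -- ... up to ~1000 rows per batch
```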

### Writing by time window

Although GreptimeDB can handle out-of-order data, such data still affects performance. GreptimeDB infers a time window size from the ingested data and partitions the data into multiple time windows according to their timestamps. If the written rows are not within the same time window, GreptimeDB needs to split them, which hurts write performance.

Generally, real-time data doesn't have the issues mentioned above, as it always uses the latest timestamp. If you need to import data with a long time range into the database, we recommend creating the table in advance and [specifying the compaction.twcs.time_window option](/reference/sql/create.md#create-a-table-with-custom-compaction-options).
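A sketch of creating such a table in advance (the table name, columns, and window size are illustrative; pick a window that matches the spread of your historical data):

```sql
-- Hypothetical table for backfilling a long time range in daily windows.
CREATE TABLE IF NOT EXISTS sensor_readings (
  `ts` TIMESTAMP TIME INDEX,
  `sensor_id` STRING,
  `value` DOUBLE,
  PRIMARY KEY (`sensor_id`)
) WITH (
  'compaction.type' = 'twcs',
  'compaction.twcs.time_window' = '1d'
);
```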


## Schema

### Using multiple fields

While designing the schema, we recommend putting related metrics that can be collected together in the same table. This can also improve the write throughput and compression ratio.


For example, the following three tables collect the CPU usage metrics.

```sql
-- (example truncated in the diff view)
```
@@ -21,7 +21,7 @@ GreptimeDB provides various metrics to help monitor and troubleshoot performance issues. The official repo
| greptime_mito_cache_miss | counter | Total count of cache miss |


## Using cache for object stores
### Using cache for object stores

We recommend enabling the read cache and the write cache when using object storage. This can reduce query time by more than 10 times.

@@ -60,7 +60,7 @@ experimental_write_cache_ttl = "8h"
# experimental_write_cache_path = "/path/to/write/cache"
```

## Enlarging cache size
### Enlarging cache size

You can monitor the `greptime_mito_cache_bytes` and `greptime_mito_cache_miss` metrics to determine whether the cache size needs to be increased. The `type` label in these metrics indicates the type of the cache.

@@ -95,21 +95,17 @@ staging_size = "10GB"
- Double the cache size if the cache hit ratio is less than 50%
- If using a full-text index, set `staging_size` to at least 1/10 of the disk space

## Enlarging scan parallelism

The storage engine limits the number of concurrent scan tasks for each query to 1/4 of the CPU cores. If the machine's workload is relatively low, enlarging the parallelism can reduce query latency.
### Avoid putting high cardinality columns in the primary key

Setting high cardinality columns, such as `trace_id` or `uuid`, as the primary key degrades both write and query performance. We recommend using an [append-only](/reference/sql/create.md#create-an-append-only-table) table and setting these high cardinality columns as fields instead.

```toml
[[region_engine]]
[region_engine.mito]
scan_parallelism = 8
```

## Using append-only tables if possible
### Using append-only tables if possible

In general, append-only tables have higher scan performance because the storage engine can skip merging and deduplication. Moreover, if a table is append-only, the query engine can use statistics to speed up some queries.

We recommend enabling [append_mode](/reference/sql/create.md##create-an-append-only-table) for a table if it doesn't require deduplication or if performance is prioritized over deduplication. For example, a log table should be append-only, as log messages may have the same timestamp.
We recommend enabling [append_mode](/reference/sql/create.md#create-an-append-only-table) for a table if it doesn't require deduplication or if performance is prioritized over deduplication. For example, a log table should be append-only, as log messages may have the same timestamp.


## Ingestion
@@ -21,7 +21,7 @@ GreptimeDB provides various metrics to help monitor and troubleshoot performance issues. The official repo
| greptime_mito_cache_miss | counter | Total count of cache miss |


## Using cache for object stores
### Using cache for object stores

We recommend enabling the read cache and the write cache when using object storage. This can reduce query time by more than 10 times.

@@ -60,7 +60,7 @@ experimental_write_cache_ttl = "8h"
# experimental_write_cache_path = "/path/to/write/cache"
```

## Enlarging cache size
### Enlarging cache size

You can monitor the `greptime_mito_cache_bytes` and `greptime_mito_cache_miss` metrics to determine whether the cache size needs to be increased. The `type` label in these metrics indicates the type of the cache.

@@ -95,21 +95,17 @@ staging_size = "10GB"
- Double the cache size if the cache hit ratio is less than 50%
- If using a full-text index, set `staging_size` to at least 1/10 of the disk space

## Enlarging scan parallelism

The storage engine limits the number of concurrent scan tasks for each query to 1/4 of the CPU cores. If the machine's workload is relatively low, enlarging the parallelism can reduce query latency.
### Avoid putting high cardinality columns in the primary key

Setting high cardinality columns, such as `trace_id` or `uuid`, as the primary key degrades both write and query performance. We recommend using an [append-only](/reference/sql/create.md#create-an-append-only-table) table and setting these high cardinality columns as fields instead.

```toml
[[region_engine]]
[region_engine.mito]
scan_parallelism = 8
```

## Using append-only tables if possible
### Using append-only tables if possible

In general, append-only tables have higher scan performance because the storage engine can skip merging and deduplication. Moreover, if a table is append-only, the query engine can use statistics to speed up some queries.

We recommend enabling [append_mode](/reference/sql/create.md##create-an-append-only-table) for a table if it doesn't require deduplication or if performance is prioritized over deduplication. For example, a log table should be append-only, as log messages may have the same timestamp.
We recommend enabling [append_mode](/reference/sql/create.md#create-an-append-only-table) for a table if it doesn't require deduplication or if performance is prioritized over deduplication. For example, a log table should be append-only, as log messages may have the same timestamp.


## Ingestion
@@ -1,5 +1,5 @@
---
description: Tips for tuning GreptimeDB performance, including query optimization, caching, enlarging cache size, scan parallelism, and using append-only tables. Also covers metrics for diagnosing query and ingestion issues.
description: Tips for tuning GreptimeDB performance, including query optimization, caching, enlarging cache size, primary keys, and using append-only tables. Also covers metrics for diagnosing query and ingestion issues.
---

# Performance Tuning Tips
@@ -20,14 +20,14 @@ The following metrics help diagnose query performance issues:
| greptime_mito_cache_hit | counter | Total count of cache hit |
| greptime_mito_cache_miss | counter | Total count of cache miss |


### Using cache for object stores

It's highly recommended to enable the object store read cache and the write cache in the storage engine. This could reduce query time by more than 10 times.

> Note: Starting from v0.11, when using remote object storage services, local caching (both read and write) is enabled by default. In most cases, you only need to adjust the cache capacity according to your needs.

The read cache stores objects or ranges on the local disk to avoid fetching the same range from the remote again. The following example shows how to enable the read cache for S3.

- The `cache_path` is the directory used to store cached objects. It defaults to `{data_home}/object_cache/read` since `v0.11`.
- The `cache_capacity` is the capacity of the cache. It defaults to `5GiB` since `v0.11`. It's recommended to leave at least 1/10 of the total disk space for it.

@@ -45,12 +45,12 @@ cache_capacity = "10G"
```

The write cache acts as a write-through cache that stores files on the local disk before uploading them to the object store. This reduces the first query latency. The following example shows how to enable the write cache.

- The `enable_experimental_write_cache` flag enables the write cache. It is enabled by default when configuring remote object stores since `v0.11`.
- The `experimental_write_cache_size` sets the capacity of the cache. It defaults to `5GiB` since `v0.11`.
- The `experimental_write_cache_path` sets the path to store cached files. It defaults to `{data_home}/object_cache/write` since `v0.11`.
- The `experimental_write_cache_ttl` sets the TTL of the cached files.


```toml
[[region_engine]]
[region_engine.mito]
@@ -90,62 +90,54 @@ staging_size = "10GB"
```

Some tips:

- Allocate at least 1/10 of the disk space for `experimental_write_cache_size`
- Allocate at least 1/4 of the total memory for `page_cache_size` if memory usage is under 20%
- Double the cache size if the cache hit ratio is less than 50%
- If using a full-text index, allocate at least 1/10 of the disk space for `staging_size`

### Avoid adding high cardinality columns to the primary key

### Enlarging scan parallelism

The storage engine limits the number of concurrent scan tasks to 1/4 of CPU cores for each query. Enlarging the parallelism can reduce the query latency if the machine's workload is relatively low.

```toml
[[region_engine]]
[region_engine.mito]
scan_parallelism = 8
```
Putting high cardinality columns, such as `trace_id` or `uuid`, into the primary key can negatively impact both write and query performance. Instead, consider using an [append-only table](/reference/sql/create.md#create-an-append-only-table) and setting these high cardinality columns as fields.

### Using append-only table if possible

In general, append-only tables have higher scan performance because the storage engine can skip merging and deduplication. Moreover, the query engine can use statistics to speed up some queries if the table is append-only.

We recommend enabling the [append_mode](/reference/sql/create.md##create-an-append-only-table) for the table if it doesn't require deduplication or performance is prioritized over deduplication. For example, a log table should be append-only as log messages may have the same timestamp.
We recommend enabling the [append_mode](/reference/sql/create.md#create-an-append-only-table) for the table if it doesn't require deduplication or performance is prioritized over deduplication. For example, a log table should be append-only as log messages may have the same timestamp.

## Ingestion

### Metrics

The following metrics help diagnose ingestion issues:

| Metric | Type | Description |
|---|---|---|
| greptime_mito_write_stage_elapsed_bucket | histogram | The elapsed time of different phases of processing a write request in the storage engine |
| greptime_mito_write_buffer_bytes | gauge | The current estimated bytes allocated for the write buffer (memtables). |
| greptime_mito_write_rows_total | counter | The number of rows written to the storage engine |
| greptime_mito_write_stall_total | gauge | The number of rows currently stalled due to high memory pressure |
| greptime_mito_write_reject_total | counter | The number of rows rejected due to high memory pressure |
| raft_engine_sync_log_duration_seconds_bucket | histogram | The elapsed time of flushing the WAL to the disk |
| greptime_mito_flush_elapsed | histogram | The elapsed time of flushing the SST files |

| Metric | Type | Description |
| -------------------------------------------- | --------- | ---------------------------------------------------------------------------------------- |
| greptime_mito_write_stage_elapsed_bucket | histogram | The elapsed time of different phases of processing a write request in the storage engine |
| greptime_mito_write_buffer_bytes | gauge | The current estimated bytes allocated for the write buffer (memtables). |
| greptime_mito_write_rows_total | counter | The number of rows written to the storage engine |
| greptime_mito_write_stall_total | gauge | The number of rows currently stalled due to high memory pressure |
| greptime_mito_write_reject_total | counter | The number of rows rejected due to high memory pressure |
| raft_engine_sync_log_duration_seconds_bucket | histogram | The elapsed time of flushing the WAL to the disk |
| greptime_mito_flush_elapsed | histogram | The elapsed time of flushing the SST files |

### Batching rows

Batching means sending multiple rows to the database in a single request. This can significantly improve ingestion throughput. A recommended starting point is 1000 rows per batch. You can enlarge the batch size if latency and resource usage are still acceptable.

### Writing by time window

Although GreptimeDB can handle out-of-order data, such data still affects performance. GreptimeDB infers a time window size from the ingested data and partitions the data into multiple time windows according to their timestamps. If the written rows are not within the same time window, GreptimeDB needs to split them, which hurts write performance.

Generally, real-time data doesn't have the issues mentioned above, as it always uses the latest timestamp. If you need to import data with a long time range into the database, we recommend creating the table in advance and [specifying the compaction.twcs.time_window option](/reference/sql/create.md#create-a-table-with-custom-compaction-options).


## Schema

### Using multiple fields

While designing the schema, we recommend putting related metrics that can be collected together in the same table. This can also improve the write throughput and compression ratio.


For example, the following three tables collect the CPU usage metrics.

```sql
-- (example truncated in the diff view)
```
