docs: update object storage caching docs (#1366)
Co-authored-by: Yiran <[email protected]>
killme2008 and nicecui authored Dec 10, 2024
1 parent c5c23f7 commit 5ec4382
Showing 4 changed files with 63 additions and 18 deletions.
12 changes: 7 additions & 5 deletions docs/user-guide/administration/performance-tuning-tips.md
@@ -25,9 +25,11 @@ The following metrics help diagnose query performance issues:

It's highly recommended to enable the object store read cache and the write cache in the storage engine. This could reduce query time by more than 10 times.

> Note: Starting from v0.11, when using remote object storage services, local caching (both read and write) is enabled by default. In most cases, you only need to adjust the cache capacity according to your needs.

The read cache stores objects or ranges on the local disk to avoid fetching the same range from the remote again. The following example shows how to enable the read cache for S3.
- The `cache_path` is the directory to store cached objects.
- The `cache_capacity` is the capacity of the cache. It's recommended to leave at least 1/10 of the total disk space for it.
- The `cache_path` is the directory to store cached objects, which defaults to `{data_home}/object_cache/read` since `v0.11`.
- The `cache_capacity` is the capacity of the cache, which defaults to `5GiB` since `v0.11`. It's recommended to leave at least 1/10 of the total disk space for it.

```toml
[storage]
@@ -43,9 +45,9 @@ cache_capacity = "10G"
```

The write cache acts as a write-through cache that stores files on the local disk before uploading them to the object store. This reduces the first query latency. The following example shows how to enable the write cache.
- The `enable_experimental_write_cache` flag enables the write cache
- The `experimental_write_cache_size` sets the capacity of the cache
- The `experimental_write_cache_path` sets the path to store cached files. It is under the data home by default.
- The `enable_experimental_write_cache` flag enables the write cache. Since `v0.11`, it is enabled by default when a remote object store is configured.
- The `experimental_write_cache_size` sets the capacity of the cache, which defaults to `5GiB` since `v0.11`.
- The `experimental_write_cache_path` sets the path to store cached files, which defaults to `{data_home}/object_cache/write` since `v0.11`.
- The `experimental_write_cache_ttl` sets the TTL of the cached files.
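
A minimal sketch of enabling the write cache under the `[region_engine.mito]` section; the path, size, and TTL values here are illustrative rather than defaults:

```toml
[[region_engine]]
[region_engine.mito]
## Enable the write cache (on by default for remote object stores since v0.11)
enable_experimental_write_cache = true
## Capacity of the write cache
experimental_write_cache_size = "10GiB"
## Local directory for the cached files (illustrative path)
experimental_write_cache_path = "/var/data/write_cache"
## TTL of the cached files (illustrative value)
experimental_write_cache_ttl = "8h"
```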


28 changes: 24 additions & 4 deletions docs/user-guide/deployments/configuration.md
@@ -299,7 +299,11 @@ For storage from the same provider, if you want to use different S3 buckets as s

### Object storage cache

When using S3, OSS, or Azure Blob Storage, it's better to enable object storage caching to speed up data querying:
When using remote storage services like AWS S3, Alibaba Cloud OSS, or Azure Blob Storage, fetching data during queries can be time-consuming. To address this, GreptimeDB provides a local cache mechanism to speed up repeated data access.

Since version `v0.11`, GreptimeDB enables local file caching for remote object storage by default. The default cache directory is located at `{data_home}/object_cache`, with both read and write cache capacity set to `5GiB`.
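
For v0.11 and later, a minimal sketch of adjusting only the cache capacities while keeping the default cache paths (the sizes shown are illustrative):

```toml
[storage]
type = "S3"
## other S3 settings (bucket, credentials) omitted
## Read cache capacity; the path defaults to {data_home}/object_cache/read
cache_capacity = "20GiB"

[[region_engine]]
[region_engine.mito]
## Write cache capacity; the path defaults to {data_home}/object_cache/write
experimental_write_cache_size = "20GiB"
```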

For versions before v0.11, you need to manually enable the read cache by configuring `cache_path` in the storage settings:

```toml
[storage]
@@ -309,11 +313,27 @@ root = "/greptimedb"
access_key_id = "<access key id>"
secret_access_key = "<secret access key>"
## Enable object storage caching
cache_path = "/var/data/s3_local_cache"
cache_capacity = "256MiB"
cache_path = "/var/data/s3_read_cache"
cache_capacity = "5Gib"
```

The `cache_path` is the local file directory that keeps cache files, and the `cache_capacity` is the maximum total file size in the cache directory.
The `cache_path` specifies the local directory for storing cache files, while `cache_capacity` sets the maximum total size of the files kept in the cache directory. You can disable the read cache by setting `cache_path` to an empty string.
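
For instance, a minimal sketch of turning the read cache off, keeping the rest of the S3 settings from the example above:

```toml
[storage]
type = "S3"
## other S3 settings (bucket, credentials) as in the example above
## An empty cache_path disables the read cache
cache_path = ""
```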

For the write cache in versions before v0.11, you need to enable it by setting `enable_experimental_write_cache` to `true` in the `[region_engine.mito]` section:

```toml
[[region_engine]]
[region_engine.mito]

enable_experimental_write_cache = true
experimental_write_cache_path = "/var/data/s3_write_cache"
experimental_write_cache_size = "5GiB"
```

The default value of `experimental_write_cache_path` is `{data_home}/object_cache/write`.
To disable the write cache, set `enable_experimental_write_cache` to `false`.

Read [Performance Tuning Tips](/user-guide/administration/performance-tuning-tips) for more detailed info.

### WAL options

@@ -25,9 +25,11 @@ GreptimeDB provides various metrics to help monitor and troubleshoot performance issues. The official repo

We recommend enabling the read cache and the write cache when using object storage. This can reduce query time by more than 10 times.

> Note: Starting from v0.11, when using remote object storage services, local caching (both read and write) is enabled by default. In most cases, you only need to adjust the cache capacity according to your needs.

The read cache stores objects or ranges of data on the local disk to avoid fetching the same data from the remote again. The following example shows how to enable the read cache for S3.
- The `cache_path` is the directory to store cached objects.
- The `cache_capacity` is the capacity of the cache. It's recommended to leave at least 1/10 of the total disk space for it.
- The `cache_path` is the directory to store cached objects, which defaults to `{data_home}/object_cache/read` since `v0.11`.
- The `cache_capacity` is the capacity of the cache, which defaults to `5GiB` since `v0.11`. It's recommended to leave at least 1/10 of the total disk space for it.

```toml
[storage]
@@ -43,8 +45,8 @@ cache_capacity = "10G"
```

The write cache acts as a write-through cache that stores files on the local disk before uploading them to the object store. This reduces the first query latency. The following example shows how to enable the write cache.
- The `enable_experimental_write_cache` flag enables the write cache.
- The `experimental_write_cache_size` sets the capacity of the cache.
- The `enable_experimental_write_cache` flag enables the write cache. Since `v0.11`, it defaults to `true` (enabled) when a remote object store is configured.
- The `experimental_write_cache_size` sets the capacity of the cache, which defaults to `5GiB` since `v0.11`.
- The `experimental_write_cache_path` sets the path to store cached files. By default it is under the data home directory.
- The `experimental_write_cache_ttl` sets the TTL of the cached files.

@@ -287,7 +287,11 @@ credential_path = "<gcs credential path>"

### Object storage cache

When using object storage such as S3 or Alibaba Cloud OSS, it's better to enable caching to speed up queries:
When using remote storage services such as AWS S3, Alibaba Cloud OSS, or Azure Blob Storage, fetching data during queries is often time-consuming, especially in public cloud environments. To address this, GreptimeDB provides a local cache mechanism to speed up repeated data access.

Since `v0.11`, GreptimeDB enables local file caching for remote object storage by default. The default cache directory is located at `{data_home}/object_cache`, with both read and write cache capacity set to `5GiB`.

For versions before v0.11, you need to manually enable the read cache by configuring `cache_path` in the storage settings:

```toml
[storage]
@@ -296,12 +300,29 @@ bucket = "test_greptimedb"
root = "/greptimedb"
access_key_id = "<access key id>"
secret_access_key = "<secret access key>"
## Enable object storage caching
cache_path = "/var/data/s3_local_cache"
cache_capacity = "256MiB"
## Enable object storage caching
cache_path = "/var/data/s3_read_cache"
cache_capacity = "5GiB"
```

The `cache_path` specifies the local directory for storing cache files, while `cache_capacity` determines the maximum total file size allowed in the cache directory (in bytes). You can disable the read cache by setting `cache_path` to an empty string.

For the write cache in versions before v0.11, you need to enable it by setting `enable_experimental_write_cache` to `true` in the `[region_engine.mito]` section:

```toml
[[region_engine]]
[region_engine.mito]

enable_experimental_write_cache = true
experimental_write_cache_path = "/var/data/s3_write_cache"
experimental_write_cache_size = "5GiB"
```

The `cache_path` specifies the local cache directory, and `cache_capacity` specifies the maximum size of the cache (in bytes).
The default value of `experimental_write_cache_path` is `{data_home}/object_cache/write`.
To disable the write cache, set `enable_experimental_write_cache` to `false`.

Read [Performance Tuning Tips](/user-guide/administration/performance-tuning-tips) for more detailed info.


### WAL options
