docs: add region failover section #1056

Merged
merged 24 commits into from
Jul 16, 2024
Commits
6a2b994
docs: add region failover section
WenyXu Jul 12, 2024
42f959a
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 12, 2024
af5d394
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 12, 2024
30bf8bf
fix: fix typo
WenyXu Jul 12, 2024
6c55849
chore: apply suggestions from CR
WenyXu Jul 15, 2024
3b0d9b8
fix: fix typo
WenyXu Jul 15, 2024
adace2a
refactor: refine read amplification definition
WenyXu Jul 15, 2024
edfc913
chore: fix typo
WenyXu Jul 15, 2024
6155665
chore: apply suggestions from CR
WenyXu Jul 15, 2024
78510cb
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
f249bdd
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
76957fd
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
3c3260b
Update docs/nightly/en/user-guide/operations/region-failover.md
WenyXu Jul 15, 2024
9ca03d8
chore: apply suggestions from CR
WenyXu Jul 15, 2024
3661768
chore: apply suggestions from CR
WenyXu Jul 15, 2024
b4c1a61
chore: apply suggestions from CR
WenyXu Jul 15, 2024
d0e37c3
docs: add zh part
WenyXu Jul 16, 2024
2be66b3
chore: apply suggestions from CR
WenyXu Jul 16, 2024
326996f
Apply suggestions from code review
nicecui Jul 16, 2024
3ad907f
typo
nicecui Jul 16, 2024
6a3d663
docs: add zh part
WenyXu Jul 16, 2024
531b7ec
chore: lint markdown
WenyXu Jul 16, 2024
ea6c64d
Merge branch 'main' into feat/failover
nicecui Jul 16, 2024
c65a978
refine the docs
nicecui Jul 16, 2024
1 change: 0 additions & 1 deletion docs/auto-imports.d.ts
@@ -1,7 +1,6 @@
/* eslint-disable */
/* prettier-ignore */
// @ts-nocheck
// noinspection JSUnusedGlobalSymbols
// Generated by unplugin-auto-import
export {}
declare global {
1 change: 1 addition & 0 deletions docs/nightly/en/summary.yml
@@ -89,6 +89,7 @@
- quick-start
- cluster-deployment
- region-migration
- region-failover
- monitoring
- tracing
# TODO
157 changes: 120 additions & 37 deletions docs/nightly/en/user-guide/operations/configuration.md
@@ -153,31 +153,31 @@ with_metric_engine = true

The following table describes the options in detail:

| Option | Key | Type | Description |
| ---------- | ------------------ | ------- | ------------------------------------------------------------------------------- |
| http | | | HTTP server options |
| | addr | String | Server address, "127.0.0.1:4000" by default |
| | timeout | String | HTTP request timeout, "30s" by default |
| | body_limit | String | HTTP max body size, "64MB" by default |
| | is_strict_mode | Boolean | Whether to enable the strict verification mode of the protocol, which will slightly affect performance. False by default. |
| grpc | | | gRPC server options |
| | addr | String | Server address, "127.0.0.1:4001" by default |
| | runtime_size | Integer | The number of server worker threads, 8 by default |
| mysql | | | MySQL server options |
| | enable | Boolean | Whether to enable MySQL protocol, true by default |
| | addr | String | Server address, "127.0.0.1:4002" by default |
| | runtime_size | Integer | The number of server worker threads, 2 by default |
| influxdb | | | InfluxDB Protocol options |
| | enable | Boolean | Whether to enable InfluxDB protocol in HTTP API, true by default |
| opentsdb | | | OpenTSDB Protocol options |
| | enable | Boolean | Whether to enable OpenTSDB protocol in HTTP API, true by default |
| prom_store | | | Prometheus remote storage options |
| | enable | Boolean | Whether to enable Prometheus Remote Write and read in HTTP API, true by default |
| | with_metric_engine | Boolean | Whether to use the metric engine on Prometheus Remote Write, true by default |
| postgres | | | PostgreSQL server options |
| | enable | Boolean | Whether to enable PostgreSQL protocol, true by default |
| | addr | String | Server address, "127.0.0.1:4003" by default |
| | runtime_size | Integer | The number of server worker threads, 2 by default |
| Option | Key | Type | Description |
| ---------- | ------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| http | | | HTTP server options |
| | addr | String | Server address, "127.0.0.1:4000" by default |
| | timeout | String | HTTP request timeout, "30s" by default |
| | body_limit | String | HTTP max body size, "64MB" by default |
| | is_strict_mode | Boolean | Whether to enable the strict verification mode of the protocol, which will slightly affect performance. False by default. |
| grpc | | | gRPC server options |
| | addr | String | Server address, "127.0.0.1:4001" by default |
| | runtime_size | Integer | The number of server worker threads, 8 by default |
| mysql | | | MySQL server options |
| | enable | Boolean | Whether to enable MySQL protocol, true by default |
| | addr | String | Server address, "127.0.0.1:4002" by default |
| | runtime_size | Integer | The number of server worker threads, 2 by default |
| influxdb | | | InfluxDB Protocol options |
| | enable | Boolean | Whether to enable InfluxDB protocol in HTTP API, true by default |
| opentsdb | | | OpenTSDB Protocol options |
| | enable | Boolean | Whether to enable OpenTSDB protocol in HTTP API, true by default |
| prom_store | | | Prometheus remote storage options |
| | enable | Boolean | Whether to enable Prometheus Remote Write and read in HTTP API, true by default |
| | with_metric_engine | Boolean | Whether to use the metric engine on Prometheus Remote Write, true by default |
| postgres | | | PostgreSQL server options |
| | enable | Boolean | Whether to enable PostgreSQL protocol, true by default |
| | addr | String | Server address, "127.0.0.1:4003" by default |
| | runtime_size | Integer | The number of server worker threads, 2 by default |


### Storage options
@@ -240,7 +240,7 @@ secret_access_key = "<secret access key>"

### Storage engine provider

`[[storage.providers]]` sets up the table storage engine providers. Based on these providers, you can create a table with a specified storage; see [create table](/reference/sql/create#create-table):

```toml
# Allows using multiple storages
@@ -282,8 +282,11 @@ The `cache_path` is the local file directory that keeps cache files, and the `ca

The `[wal]` section in datanode or standalone config file configures the options of Write-Ahead-Log:

#### Local WAL

```toml
[wal]
provider = "raft_engine"
file_size = "256MB"
purge_threshold = "4GB"
purge_interval = "10m"
@@ -296,6 +299,28 @@ sync_write = false
- `purge_threshold` and `purge_interval`: control the purging of WAL files; `purge_threshold` defaults to `4GB`.
- `sync_write`: whether to call `fsync` after writing each log entry.

#### Remote WAL

```toml
[wal]
provider = "kafka"
broker_endpoints = ["127.0.0.1:9092"]
max_batch_bytes = "1MB"
consumer_wait_timeout = "100ms"
backoff_init = "500ms"
backoff_max = "10s"
backoff_base = 2
backoff_deadline = "5mins"
```

- `broker_endpoints`: The Kafka broker endpoints.
- `max_batch_bytes`: The max size of a single producer batch.
- `consumer_wait_timeout`: The consumer wait timeout.
- `backoff_init`: The initial backoff delay.
- `backoff_max`: The maximum backoff delay.
- `backoff_base`: The exponential backoff rate.
- `backoff_deadline`: The deadline of retries.
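
The four `backoff_*` options interact, so it can help to see the retry delays they produce. The following is a minimal sketch, not the Kafka client's actual implementation: it assumes each delay is the previous one multiplied by `backoff_base`, capped at `backoff_max`, and that retries stop once the accumulated wait would pass `backoff_deadline`. The function name `backoff_schedule` is hypothetical, and real clients may add jitter or measure the deadline differently.

```python
from datetime import timedelta


def backoff_schedule(init, maximum, base, deadline):
    """Return successive retry delays whose total stays within `deadline`.

    Hypothetical helper for illustration only; it assumes plain
    exponential growth with a cap and no jitter.
    """
    delays = []
    current = init
    waited = timedelta(0)
    while waited + current <= deadline:
        delays.append(current)
        waited += current
        # The next delay grows by `base` but never exceeds `maximum`.
        current = min(current * base, maximum)
    return delays


# Values taken from the Remote WAL example above.
schedule = backoff_schedule(
    init=timedelta(milliseconds=500),  # backoff_init
    maximum=timedelta(seconds=10),     # backoff_max
    base=2,                            # backoff_base
    deadline=timedelta(minutes=5),     # backoff_deadline
)
print([d.total_seconds() for d in schedule[:7]])
print(len(schedule), "retries within the deadline")
```

Under these assumptions the delay reaches the `backoff_max` cap after a handful of retries, so `backoff_deadline` is what ultimately bounds the number of attempts.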

### Logging options

`frontend`, `metasrv`, `datanode` and `standalone` can all configure log and tracing related parameters in the `[logging]` section:
@@ -324,7 +349,6 @@ How to use distributed tracing, please reference [Tracing](./tracing.md#tutorial

The parameters corresponding to different storage engines can be configured for `datanode` and `standalone` in the `[region_engine]` section. Currently, only options for the `mito` region engine are available.


Frequently used options:

```toml
@@ -353,8 +377,8 @@ intermediate_path = ""
type = "time_series"
```


The `mito` engine provides an experimental memtable that optimizes for write performance and memory efficiency when handling a large number of time series. Its read performance might not be as fast as that of the default `time_series` memtable.

```toml
[region_engine.mito.memtable]
type = "partition_tree"
@@ -463,7 +487,6 @@ headers = { Authorization = "Basic Z3JlcHRpbWVfdXNlcjpncmVwdGltZV9wd2Q=" }
- `url`: URL specified by Prometheus Remote-Write protocol.
- `headers`: Some optional HTTP parameters, such as authentication information.


### Mode option

The `mode` option is valid in `datanode`, `frontend` and `standalone`, and specifies the running mode of the component.
@@ -498,16 +521,76 @@ store_addr = "127.0.0.1:2379"
selector = "LeaseBased"
# Store data in memory, false by default.
use_memory_store = false
## Whether to enable region failover.
## This feature is only available on GreptimeDB running in cluster mode and
## - Using Remote WAL
## - Using shared storage (e.g., s3).
enable_region_failover = false

[wal]
# Available wal providers:
# - `raft_engine` (default): there is no raft-engine wal config, since metasrv is currently only involved in remote wal.
# - `kafka`: metasrv **must be** configured with the kafka wal config when the datanodes use the kafka wal provider.
provider = "raft_engine"

# Kafka wal config.

## The broker endpoints of the Kafka cluster.
broker_endpoints = ["127.0.0.1:9092"]

## Number of topics to be created upon start.
num_topics = 64

## Topic selector type.
## Available selector types:
## - `round_robin` (default)
selector_type = "round_robin"

## A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`.
topic_name_prefix = "greptimedb_wal_topic"

## Expected number of replicas of each partition.
replication_factor = 1

## The timeout after which a topic creation operation is cancelled.
create_topic_timeout = "30s"

## The initial backoff for kafka clients.
backoff_init = "500ms"

## The maximum backoff for kafka clients.
backoff_max = "10s"

## Exponential backoff rate, i.e. next backoff = base * current backoff.
backoff_base = 2

## Stop reconnecting if the total wait time reaches the deadline. If this config is missing, the reconnecting won't terminate.
backoff_deadline = "5mins"
```
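
The comments on `enable_region_failover` imply a dependency between options: failover needs Remote WAL, so the metasrv `wal.provider` has to be `kafka` with at least one broker endpoint. Below is a minimal sketch of a pre-flight check over an already-parsed metasrv config. The function `failover_config_problems` and its messages are hypothetical and not part of GreptimeDB. It only inspects metasrv settings, so it cannot confirm the other two prerequisites (cluster mode and shared storage), which are configured elsewhere.

```python
def failover_config_problems(metasrv_config):
    """List reasons the given metasrv settings cannot support region failover.

    Hypothetical helper: `metasrv_config` is a plain dict shaped like the
    TOML above, e.g. the result of parsing the file with a TOML library.
    """
    problems = []
    if not metasrv_config.get("enable_region_failover", False):
        return problems  # Failover is disabled, so there is nothing to check.

    wal = metasrv_config.get("wal", {})
    # `raft_engine` is the documented default provider.
    if wal.get("provider", "raft_engine") != "kafka":
        problems.append('region failover needs Remote WAL: set wal.provider = "kafka"')
    elif not wal.get("broker_endpoints"):
        problems.append("wal.broker_endpoints must list at least one Kafka broker")
    return problems


good = {
    "enable_region_failover": True,
    "wal": {"provider": "kafka", "broker_endpoints": ["127.0.0.1:9092"]},
}
bad = {"enable_region_failover": True, "wal": {"provider": "raft_engine"}}

print(failover_config_problems(good))
print(failover_config_problems(bad))
```

Returning a list of problems rather than raising on the first one lets an operator see every misconfiguration in a single run.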

| Key | Type | Description |
| ---------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| data_home | String | The working home of Metasrv, `"/tmp/metasrv/"` by default |
| bind_addr | String | The bind address of Metasrv, `"127.0.0.1:3002"` by default. |
| server_addr | String | The communication server address for frontend and datanode to connect to Metasrv, `"127.0.0.1:3002"` by default for localhost |
| store_addr | String | etcd server addresses, `"127.0.0.1:2379"` by default, server address separated by commas, in the format of `"ip1:port1,ip2:port2,..."`. |
| selector | String | Load balance strategy to choose datanode when creating new tables, see [Selector](/contributor-guide/metasrv/selector.md) |
| use_memory_store | Boolean | Only used for testing when you don't have an etcd cluster, store data in memory, `false` by default. |
| Key | Type | Default | Description |
| -------------------------- | ------- | ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `data_home` | String | `/tmp/metasrv/` | The working home directory. |
| `bind_addr` | String | `127.0.0.1:3002` | The bind address of metasrv. |
| `server_addr` | String | `127.0.0.1:3002` | The communication server address that frontends and datanodes use to connect to metasrv. |
| `store_addr` | String | `127.0.0.1:2379` | Etcd server address. |
| `selector` | String | `lease_based` | Datanode selector type.<br/>- `lease_based` (default value).<br/>- `load_based`<br/>For details, see [Selector](/contributor-guide/metasrv/selector.md) |
| `use_memory_store` | Bool | `false` | Store data in memory. |
| `enable_telemetry` | Bool | `true` | Whether to enable greptimedb telemetry. |
| `store_key_prefix` | String | `""` | If it's not empty, the metasrv will store all data with this key prefix. |
| `enable_region_failover` | Bool | `false` | Whether to enable region failover.<br/>This feature is only available on GreptimeDB running in cluster mode and<br/>- Using Remote WAL<br/>- Using shared storage (e.g., s3). |
| `wal` | -- | -- | -- |
| `wal.provider` | String | `raft_engine` | The WAL provider: `raft_engine` (default) or `kafka`. |
| `wal.broker_endpoints` | Array | -- | The broker endpoints of the Kafka cluster. |
| `wal.num_topics` | Integer | `64` | Number of topics to be created upon start. |
| `wal.selector_type` | String | `round_robin` | Topic selector type.<br/>Available selector types:<br/>- `round_robin` (default) |
| `wal.topic_name_prefix` | String | `greptimedb_wal_topic` | A Kafka topic is constructed by concatenating `topic_name_prefix` and `topic_id`. |
| `wal.replication_factor` | Integer | `1` | Expected number of replicas of each partition. |
| `wal.create_topic_timeout` | String | `30s` | The timeout after which a topic creation operation is cancelled. |
| `wal.backoff_init` | String | `500ms` | The initial backoff for kafka clients. |
| `wal.backoff_max` | String | `10s` | The maximum backoff for kafka clients. |
| `wal.backoff_base` | Integer | `2` | Exponential backoff rate, i.e. next backoff = base \* current backoff. |
| `wal.backoff_deadline` | String | `5mins` | Stop reconnecting if the total wait time reaches the deadline. If this config is missing, the reconnecting won't terminate. |
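
To illustrate how `wal.num_topics`, `wal.topic_name_prefix` and the `round_robin` selector fit together, here is a small sketch. Several details are assumptions that the table does not state: that topic IDs run from 0 to `num_topics - 1`, that the prefix and ID are joined with an underscore, and that the selector hands topics to regions in order. The functions `topic_names` and `assign_round_robin` are hypothetical, as are the region IDs.

```python
from itertools import cycle


def topic_names(prefix, num_topics, separator="_"):
    """Build topic names from a prefix and numeric topic IDs.

    The separator and the zero-based IDs are assumptions for illustration.
    """
    return [f"{prefix}{separator}{topic_id}" for topic_id in range(num_topics)]


def assign_round_robin(region_ids, topics):
    """Give each region the next topic in order, wrapping around at the end."""
    return dict(zip(region_ids, cycle(topics)))


topics = topic_names("greptimedb_wal_topic", 64)  # defaults from the table above
assignment = assign_round_robin(range(66), topics)  # 66 hypothetical regions

print(topics[0], topics[-1])
print(assignment[64], assignment[65])  # these regions wrap around to the first topics
```

With more regions than topics, several regions share one topic, which is why the number of topics is worth sizing against the expected region count.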

### Datanode-only configuration
