refactor: move remote WAL under the disaster recovery #1186

Closed · wants to merge 8 commits
2 changes: 1 addition & 1 deletion docs/getting-started/installation/greptimedb-cluster.md
@@ -35,7 +35,7 @@ By default, the data will be stored in `/tmp/greptimedb-cluster-docker-compose`.

## Deploy the GreptimeDB cluster in Kubernetes

-Please refer to [Deploy on Kubernetes](/user-guide/operations/deploy-on-kubernetes/overview.md).
+Please refer to [Deploy on Kubernetes](/user-guide/deployments/deploy-on-kubernetes/overview.md).

## Next Steps

4 changes: 2 additions & 2 deletions docs/getting-started/installation/greptimedb-standalone.md
@@ -1,6 +1,6 @@
# GreptimeDB Standalone

-We use the simplest configuration for you to get started. For a comprehensive list of configurations available in GreptimeDB, see the [configuration documentation](/user-guide/operations/configuration.md).
+We use the simplest configuration for you to get started. For a comprehensive list of configurations available in GreptimeDB, see the [configuration documentation](/user-guide/deployments/configuration.md).

## Binary

@@ -120,7 +120,7 @@ docker run -p 0.0.0.0:4000-4003:4000-4003 \
</TabItem>
</Tabs>

-You can also refer to the [Configuration](/user-guide/operations/configuration.md) document to modify the bind address in the configuration file.
+You can also refer to the [Configuration](/user-guide/deployments/configuration.md) document to modify the bind address in the configuration file.

## Next Steps

2 changes: 1 addition & 1 deletion docs/reference/command-lines.md
@@ -149,7 +149,7 @@ Starts GreptimeDB in standalone mode with customized configurations:
greptime --log-dir=/tmp/greptimedb/logs --log-level=info standalone start -c config/standalone.example.toml
```

-The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuration files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/operations/configuration.md).
+The `standalone.example.toml` configuration file comes from the `config` directory of the `[GreptimeDB](https://github.com/GreptimeTeam/greptimedb/)` repository. You can find more example configuration files there. The `-c` option specifies the configuration file, for more information check [Configuration](../user-guide/deployments/configuration.md).

To start GreptimeDB in distributed mode, you need to start each component separately. The following commands show how to start each component with customized configurations or command line arguments.

2 changes: 1 addition & 1 deletion docs/reference/sql/create.md
@@ -92,7 +92,7 @@ Users can add table options by using `WITH`. The valid options contain the follo
| Option | Description | Value |
| ------------------- | --------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ttl` | The storage time of the table data | String value, such as `'60m'`, `'1h'` for one hour, `'14d'` for 14 days etc. Supported time units are: `s` / `m` / `h` / `d` |
-| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/operations/configuration.md#storage-engine-provider). |
+| `storage` | The name of the table storage engine provider | String value, such as `S3`, `Gcs`, etc. It must be configured in `[[storage.providers]]`, see [configuration](/user-guide/deployments/configuration.md#storage-engine-provider). |
| `compaction.type` | Compaction strategy of the table | String value. Only `twcs` is allowed. |
| `compaction.twcs.max_active_window_files` | Max num of files that can be kept in active writing time window | String value, such as '8'. Only available when `compaction.type` is `twcs`. You can refer to this [document](https://cassandra.apache.org/doc/latest/cassandra/managing/operating/compaction/twcs.html) to learn more about the `twcs` compaction strategy. |
| `compaction.twcs.max_inactive_window_files` | Max num of files that can be kept in inactive time window. | String value, such as '1'. Only available when `compaction.type` is `twcs`. |
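
The `storage` table option relies on a matching `[[storage.providers]]` entry in the server configuration. A rough sketch of such an entry follows; field names other than `name` are illustrative assumptions, not confirmed by this diff — check the linked configuration page for the authoritative keys:

```toml
# Hypothetical sketch: declares extra storage providers that tables
# can select via the `storage` table option. Field names below `name`
# are illustrative; verify them against the configuration documentation.
[[storage.providers]]
name = "S3"
type = "S3"               # illustrative
bucket = "my-bucket"      # illustrative
root = "greptimedb"       # illustrative

[[storage.providers]]
name = "Gcs"
type = "Gcs"              # illustrative
bucket = "my-gcs-bucket"  # illustrative
```

A table could then opt into a provider with `WITH (storage = 'S3')`.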
2 changes: 1 addition & 1 deletion docs/user-guide/cluster.md
@@ -2,7 +2,7 @@

## Create a cluster

-Please refer to [Kubernetes](./operations/deploy-on-kubernetes/overview.md) to get the information about creating a Kubernetes cluster.
+Please refer to [Kubernetes](./deployments/deploy-on-kubernetes/overview.md) to get the information about creating a Kubernetes cluster.

## Distributed Read/Write

2 changes: 1 addition & 1 deletion docs/user-guide/concepts/features-that-you-concern.md
@@ -42,7 +42,7 @@ Since 0.8, GreptimeDB added a new function called `Flow`, which is used for cont
## Can I store data in object storage in the cloud?

Yes, GreptimeDB's data access layer is based on [OpenDAL](https://github.com/apache/incubator-opendal), which supports most kinds of object storage services.
-The data can be stored in cost-effective cloud storage services such as AWS S3 or Azure Blob Storage, please refer to storage configuration guide [here](./../operations/configuration.md#storage-options).
+The data can be stored in cost-effective cloud storage services such as AWS S3 or Azure Blob Storage, please refer to storage configuration guide [here](./../deployments/configuration.md#storage-options).

GreptimeDB also offers a fully-managed cloud service [GreptimeCloud](https://greptime.com/product/cloud) to help you manage data in the cloud.

4 changes: 2 additions & 2 deletions docs/user-guide/concepts/storage-location.md
@@ -24,14 +24,14 @@ The storage file structure of GreptimeDB includes the following:
```

- `metadata`: The internal metadata directory that keeps catalog, database and table info, procedure states, etc. In cluster mode, this directory does not exist, because all those states including region route info are saved in `Metasrv`.
-- `data`: The files in data directory store time series data and index files of GreptimeDB. To customize this path, please refer to [Storage option](../operations/configuration.md#storage-options). The directory is organized in a two-level structure of catalog and schema.
+- `data`: The files in data directory store time series data and index files of GreptimeDB. To customize this path, please refer to [Storage option](../deployments/configuration.md#storage-options). The directory is organized in a two-level structure of catalog and schema.
- `logs`: The log files contains all the logs of operations in GreptimeDB.
- `wal`: The wal directory contains the write-ahead log files.
- `index_intermediate`: the temporary intermediate data while indexing.
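
To illustrate the customization mentioned for the `data` directory, the storage section of the configuration file might look roughly like this; the key names are an assumption based on the storage options page, not taken from this diff:

```toml
# Hypothetical sketch of the storage section; verify key names
# against the Storage option documentation referenced above.
[storage]
type = "File"                      # local file system storage
data_home = "/var/lib/greptimedb"  # illustrative custom data path
```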

## Cloud storage

-The `data` directory in the file structure can be stored in cloud storage. Please refer to [Storage option](../operations/configuration.md#storage-options) for more details.
+The `data` directory in the file structure can be stored in cloud storage. Please refer to [Storage option](../deployments/configuration.md#storage-options) for more details.

Please note that only storing the data directory in object storage is not sufficient to ensure data reliability and disaster recovery. The `wal` and `metadata` also need to be considered for disaster recovery. Please refer to the [disaster recovery documentation](/user-guide/operations/disaster-recovery/overview.md).

@@ -386,9 +386,9 @@ default_ratio = 1.0
- `enable_otlp_tracing`: whether to turn on tracing, not turned on by default.
- `otlp_endpoint`: Export the target endpoint of tracing using gRPC-based OTLP protocol, the default value is `localhost:4317`.
- `append_stdout`: Whether to append logs to stdout. Defaults to `true`.
-- `tracing_sample_ratio`: This field can configure the sampling rate of tracing. How to use `tracing_sample_ratio`, please refer to [How to configure tracing sampling rate](./tracing.md#guide-how-to-configure-tracing-sampling-rate).
+- `tracing_sample_ratio`: This field can configure the sampling rate of tracing. How to use `tracing_sample_ratio`, please refer to [How to configure tracing sampling rate](/user-guide/operations/tracing.md#guide-how-to-configure-tracing-sampling-rate).

-How to use distributed tracing, please reference [Tracing](./tracing.md#tutorial-use-jaeger-to-trace-greptimedb)
+How to use distributed tracing, please reference [Tracing](/user-guide/operations/tracing.md#tutorial-use-jaeger-to-trace-greptimedb)
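
Putting the logging options described above together, a section enabling tracing with sampling might look roughly like the following sketch (values are illustrative; `default_ratio` appears in the surrounding context, the rest is assembled from the option list above):

```toml
# Sketch of a [logging] section assembled from the options listed above.
[logging]
enable_otlp_tracing = true        # off by default
otlp_endpoint = "localhost:4317"  # gRPC-based OTLP export target (default)
append_stdout = true              # default

[logging.tracing_sample_ratio]
default_ratio = 1.0               # sample every trace; lower to reduce volume
```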

### Region engine options

@@ -12,18 +12,17 @@ The GreptimeDB Kubernetes Operator simplifies deploying GreptimeDB on both priva
GreptimeDB provides a [Helm-compatible repository](https://github.com/GreptimeTeam/helm-charts) for easy deployment. Follow these steps to install the Operator using Helm:

### Add the GreptimeDB Operator repository
-First, add the GreptimeDB Operator Helm repository:
-```bash
-helm repo add greptime https://greptimeteam.github.io/helm-charts/
-```

-Validate the repository by searching for the Operator chart:
+Ensure you have [added the GreptimeDB Helm repository](/user-guide/deployments/deploy-on-kubernetes/overview.md#add-helm-repository).
+You can then validate the GreptimeDB Operator repository by searching for the Operator chart:

```bash
helm search repo greptimedb-operator
```

You should see output similar to this:
-```
+```shell
NAME CHART VERSION APP VERSION DESCRIPTION
greptime/greptimedb-operator 0.2.3 0.1.0-alpha.29 The greptimedb-operator Helm chart for Kubernetes.
```
28 changes: 28 additions & 0 deletions docs/user-guide/deployments/overview.md
@@ -0,0 +1,28 @@
# Overview

## Configuration

Before deploying GreptimeDB, you need to [configure the server](configuration.md) to meet your requirements. This includes setting up protocol options, storage options, and more.

## Authentication

By default, GreptimeDB does not have authentication enabled. Learn how to [configure authentication](/user-guide/operations/authentication.md) for your deployment manually.

## Deploy on Kubernetes

The step-by-step instructions for [deploying GreptimeDB on a Kubernetes cluster](./deploy-on-kubernetes/overview.md).

## Run on Android

Learn how to [run GreptimeDB on Android devices](run-on-android.md).

## Capacity plan

Understand how to [plan for capacity](/user-guide/operations/capacity-plan.md) to ensure your GreptimeDB deployment can handle your workload.

## GreptimeCloud

Instead of managing your own GreptimeDB cluster,
you can use [GreptimeCloud](https://greptime.cloud) to manage GreptimeDB instances, monitor metrics, and set up alerts.
GreptimeCloud is a cloud service powered by fully-managed serverless GreptimeDB, providing a scalable and efficient solution for time-series data platforms and Prometheus backends.
For more information, see the [GreptimeCloud documentation](/greptimecloud/overview.md).
@@ -24,7 +24,7 @@ remote_read:
# password: greptime_pwd
```

-- The host and port in the URL represent the GreptimeDB server. In this example, the server is running on `localhost:4000`. You can replace it with your own server address. For the HTTP protocol configuration in GreptimeDB, please refer to the [protocol options](/user-guide/operations/configuration.md#protocol-options).
+- The host and port in the URL represent the GreptimeDB server. In this example, the server is running on `localhost:4000`. You can replace it with your own server address. For the HTTP protocol configuration in GreptimeDB, please refer to the [protocol options](/user-guide/deployments/configuration.md#protocol-options).
- The `db` parameter in the URL represents the database to which we want to write data. It is optional. By default, the database is set to `public`.
- `basic_auth` is the authentication configuration. Fill in the username and password if GreptimeDB authentication is enabled. Please refer to the [authentication document](/user-guide/operations/authentication.md).
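
For reference, the HTTP server address used in the URL above is set in the protocol options of the GreptimeDB configuration; a minimal sketch (the key layout is an assumption, not confirmed by this diff):

```toml
# Hypothetical sketch of the HTTP protocol options; confirm against
# the protocol options section of the configuration documentation.
[http]
addr = "127.0.0.1:4000"  # host and port used in the remote_read URL above
```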

2 changes: 1 addition & 1 deletion docs/user-guide/ingest-data/for-observerbility/vector.md
@@ -27,7 +27,7 @@ password = "<password>"
```

GreptimeDB uses gRPC to communicate with Vector, so the default port for the Vector sink is `4001`.
-If you have changed the default gRPC port when starting GreptimeDB with [custom configurations](/user-guide/operations/configuration.md#configuration-file), use your own port instead.
+If you have changed the default gRPC port when starting GreptimeDB with [custom configurations](/user-guide/deployments/configuration.md#configuration-file), use your own port instead.
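
A minimal Vector sink pointing at the gRPC port discussed above might be sketched as follows; the sink name and field names are assumptions, and the truncated snippet above this section shows the authoritative shape:

```toml
# Hypothetical vector.toml fragment; sink id, inputs, and field names
# are illustrative — adapt them to the configuration shown in this guide.
[sinks.greptimedb_metrics]
type = "greptimedb"
inputs = ["my_source"]       # illustrative source id
endpoint = "localhost:4001"  # default gRPC port; change if customized
```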

</div>

8 changes: 4 additions & 4 deletions docs/user-guide/operations/admin.md
@@ -5,9 +5,9 @@ This document addresses strategies and practices used in the operation of Grepti
## Database/Cluster management

* [Installation](/getting-started/installation/overview.md) for GreptimeDB and the [g-t-control](/reference/gtctl.md) command line tool.
-* Database Configuration, please read the [Configuration](./configuration.md) reference.
-* [Monitoring](./monitoring.md) and [Tracing](./tracing.md) for GreptimeDB.
-* GreptimeDB [Disaster Recovery](./disaster-recovery/overview.md).
+* Database Configuration, please read the [Configuration](/user-guide/deployments/configuration.md) reference.
+* [Monitoring](/user-guide/operations/monitoring.md) and [Tracing](/user-guide/operations/tracing.md) for GreptimeDB.
+* GreptimeDB [Disaster Recovery](/user-guide/operations/disaster-recovery/overview.md).

### Runtime information

@@ -36,7 +36,7 @@ The `INFORMATION_SCHEMA` database provides access to system metadata, such as th
## Data management

* [The Storage Location](/user-guide/concepts/storage-location.md).
-* Cluster Failover for GreptimeDB by [Setting Remote WAL](./remote-wal/quick-start.md).
+* Cluster Failover for GreptimeDB by [Setting Remote WAL](/user-guide/operations/disaster-recovery/remote-wal/quick-start.md).
* [Flush and Compaction for Table & Region](/reference/sql/admin.md#admin-functions).
* Partition the table by regions, read the [Table Sharding](/contributor-guide/frontend/table-sharding.md) reference.
* [Migrate the Region](./region-migration.md) for Load Balance.
4 changes: 2 additions & 2 deletions docs/user-guide/operations/capacity-plan.md
@@ -13,7 +13,7 @@ there are several key considerations:
- Data retention policy
- Hardware costs

-To monitor the various metrics of GreptimeDB, please refer to [Monitoring](./monitoring.md).
+To monitor the various metrics of GreptimeDB, please refer to [Monitoring](/user-guide/operations/monitoring.md).

## CPU

@@ -42,7 +42,7 @@ This allows GreptimeDB to store large amounts of data in a significantly smaller

Data can be stored either in a local file system or in cloud storage, such as AWS S3.
For more information on storage options,
-please refer to the [storage configuration](./configuration.md#storage-options) documentation.
+please refer to the [storage configuration](/user-guide/deployments/configuration.md#storage-options) documentation.

Cloud storage is highly recommended for data storage due to its simplicity in managing storage.
With cloud storage, only about 200GB of local storage space is needed for query-related caches and Write-Ahead Log (WAL).
3 changes: 2 additions & 1 deletion docs/user-guide/operations/disaster-recovery/overview.md
@@ -36,7 +36,7 @@ Before digging into the specific DR solution, let's explain the architecture of
GreptimeDB is designed with a cloud-native architecture based on storage-compute separation:
* **Frontend**: the ingestion and query service layer, which forwards requests to Datanode and processes, and merges responses from Datanode.
* **Datanode**: the storage layer of GreptimeDB, and is an LSM storage engine. Region is the basic unit for storing and scheduling data in Datanode. A region is a table partition, a collection of data rows. The data in region is saved into Object Storage (such as AWS S3). Unflushed Memtable data is written into WAL and can be recovered in DR.
-* **WAL**: persists the unflushed Memtable data in memory. It will be truncated when the Memtable is flushed into SSTable files. It can be local disk-based (local WAL) or Kafka cluster-based (remote WAL).
+* **WAL**: persists the unflushed Memtable data in memory. It will be truncated when the Memtable is flushed into SSTable files. It can be local disk-based (local WAL) or [Kafka cluster-based (remote WAL)](./remote-wal/quick-start.md).
* **Object Storage**: persists the SSTable data and index.

The GreptimeDB stores data in object storage such as [AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html) or its compatible services, which is designed to provide 99.999999999% durability and 99.99% availability of objects over a given year. And services such as S3 provide [replications in Single-Region or Cross-Region](https://docs.aws.amazon.com/AmazonS3/latest/userguide/replication.html), which is naturally capable of DR.
@@ -118,6 +118,7 @@ By comparing these DR solutions, you can decide on the final option based on the

## References

+* [Remote WAL](./remote-wal/quick-start.md)
* [Backup & restore data](./back-up-&-restore-data.md)
* [DR solution for GreptimeDB Standalone](./dr-solution-for-standalone.md)
* [DR solution based on Active-Active Failover ](./dr-solution-based-on-active-active-failover.md)
@@ -72,7 +72,7 @@ To avoid accidently exit the Docker container, you may want to run it in the "de
the `docker run` command.
:::

-We use the [environment variables](/user-guide/operations/configuration.md#environment-variable) to specify the provider:
+We use the [environment variables](/user-guide/deployments/configuration.md#environment-variable) to specify the provider:

- `GREPTIMEDB_STANDALONE__WAL__PROVIDER`: Set `kafka` to use Kafka remote WAL;
- `GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS`: Specify the advertised listeners for all brokers in the Kafka cluster. In this example, we will use the Kafka container name, and the bridge network will resolve it into IPv4;
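
The two environment variables above map onto the `[wal]` section of the standalone configuration file; an equivalent file-based sketch (the broker address is illustrative):

```toml
# Config-file equivalent of GREPTIMEDB_STANDALONE__WAL__PROVIDER and
# GREPTIMEDB_STANDALONE__WAL__BROKER_ENDPOINTS, per the mapping above.
[wal]
provider = "kafka"
broker_endpoints = ["kafka:9092"]  # advertised listeners of the Kafka brokers
```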
2 changes: 1 addition & 1 deletion docs/user-guide/operations/monitoring.md
@@ -46,7 +46,7 @@ the `docker run` command.

You can also save metrics to GreptimeDB itself for convenient querying and analysis using SQL statements.
This section provides some configuration examples.
-For more details about configuration, please refer to the [Monitor metrics options](./configuration.md#monitor-metrics-options).
+For more details about configuration, please refer to the [Monitor metrics options](/user-guide/deployments/configuration.md#monitor-metrics-options).

### Standalone

4 changes: 0 additions & 4 deletions docs/user-guide/operations/overview.md
@@ -1,12 +1,8 @@
# Overview

* [Administration](./admin.md)
-* [Configuration](./configuration.md)
* [Capacity Plan](./capacity-plan.md)
-* [Kubernetes](./deploy-on-kubernetes/overview.md)
-* [Running on Android](./run-on-android.md)
* [Disaster Recovery](./disaster-recovery/overview.md)
* [Monitoring](./monitoring.md)
* [Tracing](./tracing.md)
-* [Remote WAL](./remote-wal/quick-start.md)
* [Region Migration](./region-migration.md)
6 changes: 3 additions & 3 deletions docs/user-guide/operations/region-failover.md
@@ -7,10 +7,10 @@ Region Failover provides the ability to recover regions from region failures wit
This feature is only available on GreptimeDB running on distributed mode and

- Using Kafka WAL
-- Using [shared storage](/user-guide/operations/configuration.md#storage-options) (e.g., AWS S3)
+- Using [shared storage](/user-guide/deployments/configuration.md#storage-options) (e.g., AWS S3)

### Via configuration file
-Set the `enable_region_failover=true` in [metasrv](/user-guide/operations/configuration.md#metasrv-only-configuration) configuration file.
+Set the `enable_region_failover=true` in [metasrv](/user-guide/deployments/configuration.md#metasrv-only-configuration) configuration file.
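
In the metasrv configuration file this is a single flag, roughly:

```toml
# Enables region failover in the metasrv configuration file,
# as described in the line above.
enable_region_failover = true
```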

### Via GreptimeDB Operator

@@ -38,7 +38,7 @@ The data belonging to a specific region consists of data files plus data in the

Although multiple regions share the same topic, allowing the Datanode to support more regions, the cost of this approach is read amplification during WAL replay.

-For example, configure 128 topics for [metasrv](/user-guide/operations/configuration.md#metasrv-only-configuration), and if the whole cluster holds 1024 regions (physical regions), every 8 regions will share one topic.
+For example, configure 128 topics for [metasrv](/user-guide/deployments/configuration.md#metasrv-only-configuration), and if the whole cluster holds 1024 regions (physical regions), every 8 regions will share one topic.
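
Assuming the topic count lives in the metasrv WAL section (the key name here is a guess, not confirmed by this diff), the 128-topic example might be sketched as:

```toml
# Hypothetical metasrv fragment for the 128-topic example above;
# verify the exact key name in the metasrv configuration reference.
[wal]
provider = "kafka"
num_topics = 128  # 1024 regions / 128 topics -> 8 regions per topic
```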

![Read Amplification](/remote-wal-read-amplification.png)
