Commit 73a825f (1 parent: 9dea761)
Showing 2 changed files with 230 additions and 2 deletions.
228 changes: 228 additions & 0 deletions
...aurus-plugin-content-docs/version-0.11/user-guide/administration/manage-etcd.md
@@ -0,0 +1,228 @@
---
keywords: [etcd]
description: etcd management guide.
---

# Manage etcd

## Prerequisites

- [Kubernetes](https://kubernetes.io/docs/setup/) >= v1.23
- [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) >= v1.18.0
- [Helm](https://helm.sh/docs/intro/install/) >= v3.0.0

## Installation

A GreptimeDB cluster requires an etcd cluster for metadata storage. Install one with Bitnami's etcd Helm [chart](https://github.com/bitnami/charts/tree/main/bitnami/etcd):

```bash
helm upgrade --install etcd \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  --version 10.2.12 \
  --set replicaCount=3 \
  --set auth.rbac.create=false \
  --set auth.rbac.token.enabled=false \
  --create-namespace \
  -n etcd-cluster
```

Wait for the etcd cluster to be running:

```bash
kubectl get po -n etcd-cluster
```

<details>
<summary>Expected Output</summary>
```bash
NAME     READY   STATUS    RESTARTS   AGE
etcd-0   1/1     Running   0          64s
etcd-1   1/1     Running   0          65s
etcd-2   1/1     Running   0          72s
```
</details>
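
Rather than re-running `kubectl get po` until every pod is ready, you can block until the rollout finishes. The following is a minimal sketch, assuming the chart creates a StatefulSet named `etcd` (inferred from the pod names above, not stated in this guide). The function name `wait_for_etcd` and the 300-second default timeout are illustrative choices.

```bash
# Block until the etcd StatefulSet reports all replicas ready.
# Assumption: the StatefulSet is named after the Helm release ("etcd").
wait_for_etcd() (
  namespace="${1:-etcd-cluster}"
  timeout="${2:-300s}"
  kubectl rollout status statefulset/etcd -n "$namespace" --timeout="$timeout"
)
```

Call `wait_for_etcd` with no arguments to use the namespace from this guide, or pass a namespace and a timeout, for example `wait_for_etcd etcd-cluster 600s`.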

The etcd [initialClusterState](https://etcd.io/docs/v3.5/op-guide/configuration/) parameter specifies the initial state of the etcd cluster when an etcd node starts, which determines how the node joins the cluster. It accepts one of two values:

- **new**: The etcd cluster is new. All nodes start as members of a new cluster and do not use any previous state.
- **existing**: The node joins an etcd cluster that already exists. In this case, the initialCluster parameter must list all nodes of the current cluster.

Once the etcd cluster is running, set the initialClusterState parameter to **existing**:

```bash
helm upgrade --install etcd \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  --version 10.2.12 \
  --set initialClusterState="existing" \
  --set removeMemberOnContainerTermination=false \
  --set replicaCount=3 \
  --set auth.rbac.create=false \
  --set auth.rbac.token.enabled=false \
  --create-namespace \
  -n etcd-cluster
```
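
To confirm that the upgrade applied the intended settings, you can print the user-supplied values of the release with `helm get values`. This is a minimal sketch; the function name `show_etcd_values` is hypothetical, and the defaults are the release name and namespace used in this guide.

```bash
# Print the values that were passed to the Helm release.
show_etcd_values() (
  release="${1:-etcd}"
  namespace="${2:-etcd-cluster}"
  helm get values "$release" -n "$namespace"
)
```

If the upgrade took effect, the output should list `initialClusterState: existing` among the values.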

After the etcd cluster is running again, check its health with the following command:

```bash
kubectl -n etcd-cluster \
  exec etcd-0 -- etcdctl \
  --endpoints etcd-0.etcd-headless.etcd-cluster:2379,etcd-1.etcd-headless.etcd-cluster:2379,etcd-2.etcd-headless.etcd-cluster:2379 \
  endpoint status -w table
```

<details>
<summary>Expected Output</summary>
```bash
+----------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|                ENDPOINT                |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| etcd-0.etcd-headless.etcd-cluster:2379 | 680910587385ae31 |  3.5.15 |   20 kB |     false |      false |         4 |      73991 |              73991 |        |
| etcd-1.etcd-headless.etcd-cluster:2379 | d6980d56f5e3d817 |  3.5.15 |   20 kB |     false |      false |         4 |      73991 |              73991 |        |
| etcd-2.etcd-headless.etcd-cluster:2379 | 12664fc67659db0a |  3.5.15 |   20 kB |      true |      false |         4 |      73991 |              73991 |        |
+----------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
</details>
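
The endpoint list in the command above follows a regular pattern, so it can be generated instead of typed by hand. The sketch below assumes the pod and headless-service naming shown above (`etcd-<i>.etcd-headless.<namespace>:2379`); the helper names `etcd_endpoints` and `check_etcd_health` are hypothetical. It uses `etcdctl endpoint health`, which reports whether each member is healthy.

```bash
# Build the comma-separated client endpoint list for an etcd cluster.
# Assumes the naming pattern etcd-<i>.etcd-headless.<namespace>:2379.
etcd_endpoints() (
  replicas="$1"
  namespace="${2:-etcd-cluster}"
  endpoints=""
  i=0
  while [ "$i" -lt "$replicas" ]; do
    endpoints="${endpoints:+$endpoints,}etcd-$i.etcd-headless.$namespace:2379"
    i=$((i + 1))
  done
  printf '%s\n' "$endpoints"
)

# Ask every member whether it is healthy, via etcdctl inside the etcd-0 pod.
check_etcd_health() (
  replicas="${1:-3}"
  namespace="${2:-etcd-cluster}"
  kubectl -n "$namespace" exec etcd-0 -- etcdctl \
    --endpoints "$(etcd_endpoints "$replicas" "$namespace")" \
    endpoint health
)
```

Running `check_etcd_health` with no arguments checks the three-member cluster in the `etcd-cluster` namespace.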

## Backup

Save the following configuration as `etcd-backup.yaml`. Replace **existingClaim** with the name of your NFS PVC:

```yaml
replicaCount: 3

auth:
  rbac:
    create: false
    token:
      enabled: false

initialClusterState: "existing"
removeMemberOnContainerTermination: false

disasterRecovery:
  enabled: true
  cronjob:
    schedule: "*/30 * * * *"
    historyLimit: 2
    snapshotHistoryLimit: 2
  pvc:
    existingClaim: "${YOUR_NFS_PVC_NAME_HERE}"
```

Redeploy the etcd cluster:

```bash
helm upgrade --install etcd \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  --version 10.2.12 \
  --create-namespace \
  -n etcd-cluster --values etcd-backup.yaml
```

You can now see the scheduled etcd backup job:

```bash
kubectl get cronjob -n etcd-cluster
```

<details>
<summary>Expected Output</summary>
```bash
NAME               SCHEDULE       TIMEZONE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
etcd-snapshotter   */30 * * * *   <none>     False     0        <none>          36s
```
</details>
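
With a schedule of `*/30 * * * *`, the first snapshot can be up to 30 minutes away. To test the backup path sooner, you can create a one-off Job from the CronJob with `kubectl create job --from`. This is a sketch, not part of the chart's documented workflow; the function name `trigger_etcd_snapshot` and the default job name `etcd-snapshot-manual` are hypothetical.

```bash
# Start a single backup run immediately, using the CronJob as the template.
trigger_etcd_snapshot() (
  job_name="${1:-etcd-snapshot-manual}"
  namespace="${2:-etcd-cluster}"
  kubectl create job "$job_name" --from=cronjob/etcd-snapshotter -n "$namespace"
)
```

The resulting pod appears in `kubectl get pod -n etcd-cluster` alongside the scheduled runs.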

```bash
kubectl get pod -n etcd-cluster
```

<details>
<summary>Expected Output</summary>
```bash
NAME                              READY   STATUS      RESTARTS   AGE
etcd-0                            1/1     Running     0          35m
etcd-1                            1/1     Running     0          36m
etcd-2                            0/1     Running     0          6m28s
etcd-snapshotter-28936038-tsck8   0/1     Completed   0          4m49s
```
</details>

```bash
kubectl logs etcd-snapshotter-28936038-tsck8 -n etcd-cluster
```

<details>
<summary>Expected Output</summary>
```log
etcd-0.etcd-headless.etcd-cluster.svc.cluster.local:2379 is healthy: successfully committed proposal: took = 2.698457ms
etcd 11:18:07.47 INFO ==> Snapshotting the keyspace
{"level":"info","ts":"2025-01-06T11:18:07.579095Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/snapshots/db-2025-01-06_11-18.part"}
{"level":"info","ts":"2025-01-06T11:18:07.580335Z","logger":"client","caller":"[email protected]/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2025-01-06T11:18:07.580359Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"etcd-0.etcd-headless.etcd-cluster.svc.cluster.local:2379"}
{"level":"info","ts":"2025-01-06T11:18:07.582124Z","logger":"client","caller":"[email protected]/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2025-01-06T11:18:07.582688Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"etcd-0.etcd-headless.etcd-cluster.svc.cluster.local:2379","size":"20 kB","took":"now"}
{"level":"info","ts":"2025-01-06T11:18:07.583008Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/snapshots/db-2025-01-06_11-18"}
Snapshot saved at /snapshots/db-2025-01-06_11-18
```
</details>

Next, you can see the etcd backup snapshots on the NFS server:

```bash
ls ${NFS_SERVER_DIRECTORY}
```

<details>
<summary>Expected Output</summary>
```bash
db-2025-01-06_11-18 db-2025-01-06_11-20 db-2025-01-06_11-22
```
</details>
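
Because the snapshot names embed a zero-padded timestamp (`db-YYYY-MM-DD_HH-MM`), sorting them by name also sorts them by time. The sketch below picks the newest completed snapshot in a directory and skips `.part` files, which the log above shows are temporary files for snapshots still being written. The function name `latest_snapshot` is hypothetical, and the function must run where the NFS directory is accessible.

```bash
# Print the file name of the newest completed snapshot in a directory.
# Relies on shell globs expanding in sorted order, so the last match is the newest.
latest_snapshot() (
  dir="$1"
  latest=""
  for f in "$dir"/db-*; do
    [ -f "$f" ] || continue                     # no match, or not a regular file
    case "$f" in *.part) continue ;; esac       # snapshot still being written
    latest="$f"
  done
  [ -n "$latest" ] && basename "$latest"
)
```

For example, `latest_snapshot "${NFS_SERVER_DIRECTORY}"` prints the name of the most recent snapshot, which is the kind of value a restore needs.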

## Restore

Save the following configuration as `etcd-restore.yaml`. Note that **existingClaim** is the name of your NFS PVC, and **snapshotFilename** is the name of the etcd snapshot file:

```yaml
replicaCount: 3

auth:
  rbac:
    create: false
    token:
      enabled: false

startFromSnapshot:
  enabled: true
  existingClaim: "${YOUR_NFS_PVC_NAME_HERE}"
  snapshotFilename: "${YOUR_ETCD_SNAPSHOT_FILE_NAME}"
```

Deploy the etcd recovery cluster:

```bash
helm upgrade --install etcd-recover \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  --version 10.2.12 \
  --create-namespace \
  -n etcd-cluster --values etcd-restore.yaml
```
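
Helm does not expand shell-style placeholders inside a values file, so the two `${...}` entries in `etcd-restore.yaml` have to be replaced with real names before deploying. One way to avoid editing the file by hand is to generate it. The sketch below is an illustration only: the function name `write_restore_values` is hypothetical, and it writes exactly the keys shown in the configuration above.

```bash
# Write etcd-restore.yaml with the PVC name and snapshot file name filled in.
# Arguments: <pvc-name> <snapshot-file-name> [output-path]
write_restore_values() (
  pvc_name="$1"
  snapshot_file="$2"
  output="${3:-etcd-restore.yaml}"
  cat > "$output" <<EOF
replicaCount: 3

auth:
  rbac:
    create: false
    token:
      enabled: false

startFromSnapshot:
  enabled: true
  existingClaim: "$pvc_name"
  snapshotFilename: "$snapshot_file"
EOF
)
```

For example, `write_restore_values my-nfs-pvc db-2025-01-06_11-22` writes the file to the current directory, where `my-nfs-pvc` stands in for your own PVC name.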

After the etcd recovery cluster is running, redeploy it:

```bash
helm upgrade --install etcd-recover \
  oci://registry-1.docker.io/bitnamicharts/etcd \
  --version 10.2.12 \
  --set initialClusterState="existing" \
  --set removeMemberOnContainerTermination=false \
  --set replicaCount=3 \
  --set auth.rbac.create=false \
  --set auth.rbac.token.enabled=false \
  --create-namespace \
  -n etcd-cluster
```

The etcd restore is now complete.