This project stems from etcd-ansible-operator. It is an effort to implement a POC operator that uses stateful sets to deploy etcd, with the wider objective of using this approach in the original etcd-operator. Please check out a small demo showing that the issue of losing quorum in an etcd cluster is solved by this implementation.
- A Kubernetes cluster, with `kubectl` correctly configured. Minikube is the easiest way to get started.
- Because the stateful sets use persistent volumes, the cluster needs to be configured with a dynamic persistent volume provisioner. For minikube, the guidelines can be found here.
- To follow this guide, make sure you are in the `default` namespace.
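As a quick sanity check, something like the following should confirm both prerequisites; the `standard` StorageClass is what a stock minikube ships, so adjust the expectation for other clusters:

```sh
# Verify a default StorageClass exists for dynamic provisioning
# (a stock minikube ships one named "standard")
kubectl get storageclass

# Point the current context at the default namespace
kubectl config set-context $(kubectl config current-context) --namespace=default
```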
- Create RBAC:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/rbac.yaml
  ```

- Create the CRD:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/crd.yaml
  ```

- Create the EtcdRestore CRD:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/restore_crd.yaml
  ```

- Create the EtcdBackup CRD:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/backup_crd.yaml
  ```

- Deploy the operator:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/operator.yaml
  ```

- Create an etcd cluster:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/cr.yaml
  ```

- Verify that the cluster is up with `kubectl get pods -l app=etcd`. You should see something like this:

  ```
  $ kubectl get pods -l app=etcd
  NAME                     READY     STATUS    RESTARTS   AGE
  example-etcd-cluster-0   1/1       Running   0          27s
  example-etcd-cluster-1   1/1       Running   0          21s
  example-etcd-cluster-2   1/1       Running   0          18s
  ```
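Optionally, inspect the EtcdCluster resource itself to see what the operator is reconciling (the name `example-etcd-cluster` comes from `deploy/cr.yaml`):

```sh
kubectl get etcdcluster example-etcd-cluster -o yaml
```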
If you are using minikube:

- Create a service to access the etcd cluster from outside the Kubernetes cluster:

  ```sh
  kubectl create -f https://raw.githubusercontent.com/coreos/etcd-operator/master/example/example-etcd-cluster-nodeport-service.json
  ```

- Install `etcdctl`.
- Set the etcdctl API version:

  ```sh
  export ETCDCTL_API=3
  ```

- Set the etcd endpoint:

  ```sh
  export ETCDCTL_ENDPOINTS=$(minikube service example-etcd-cluster-client-service --url)
  ```

- Set a key in etcd:

  ```sh
  etcdctl put hello world
  ```

If you are inside the cluster, set the etcd endpoint to `http://<cluster-name>-client.<namespace>.svc:2379` and it should work. If you are using a secure client, use the `https` protocol for the endpoint.
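For example, from a pod in the same cluster, with the cluster name `example-etcd-cluster` in the `default` namespace, the following should work:

```sh
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=http://example-etcd-cluster-client.default.svc:2379

etcdctl put hello world
etcdctl get hello
```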
Recovering from the loss of all pods is the key purpose behind using a stateful set to deploy etcd. Here are the steps to check it out:

- Bring an etcd cluster up.
- Insert some data into the etcd cluster:

  ```sh
  $ etcdctl put hello world
  ```

- Watch the members of the etcd cluster by running `watch etcdctl member list` in a separate terminal. You need to export the environment variables (`ETCDCTL_API`, `ETCDCTL_ENDPOINTS`) in that terminal as well.
- Delete all the pods to simulate failure recovery:

  ```sh
  $ kubectl delete pod -l app=etcd
  ```
- Within a short time, you should see all the pods going away and being replaced by new pods, something like the output shown below.
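Because a stateful set recreates its pods with stable identities, the intermediate state will look roughly like this (illustrative output; exact statuses, ages, and ordering will differ):

```
$ kubectl get pods -l app=etcd
NAME                     READY     STATUS              RESTARTS   AGE
example-etcd-cluster-0   1/1       Running             0          8s
example-etcd-cluster-1   0/1       ContainerCreating   0          3s
example-etcd-cluster-2   0/1       Terminating         0          4m
```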
- After some time, the cluster will be available again.
- Check that the data still exists:

  ```
  $ etcdctl get hello
  hello
  world
  ```
- Bring a cluster up.
- Delete the cluster with `kubectl delete etcdcluster example-etcd-cluster`. This should delete all the pods and services created for this cluster.
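To confirm the teardown, the same label selector used earlier should now come back empty:

```sh
kubectl get pods -l app=etcd
# Expected: No resources found.
```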
This project only supports restoring a cluster from an S3 bucket right now. To restore, you will need the AWS config and credentials files. Place the config and credentials files in a directory and run the following commands to create a secret:
```sh
$ export AWS_DIR="/path/to/aws/credentials"
$ cat $AWS_DIR/credentials
[default]
aws_access_key_id = XXX
aws_secret_access_key = XXX
$ cat $AWS_DIR/config
[default]
region = <region>
$ kubectl create secret generic aws --from-file=$AWS_DIR/credentials --from-file=$AWS_DIR/config
```
Run the following commands to create the EtcdRestore CR, replacing `mybucket/etcd.backup` with the full path of the backup file:
```sh
$ wget https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/restore_cr.yaml
$ sed -e 's|<full-s3-path>|mybucket/etcd.backup|g' \
      -e 's|<aws-secret>|aws|g' \
      restore_cr.yaml \
      | kubectl create -f -
```
This will start the restore process. Wait until the `status.phase` of the EtcdRestore CR is `Complete`, which can be watched with the following command:

```sh
$ kubectl get -w etcdrestore example-etcd-cluster -o jsonpath='{.status.phase}'
```
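Once the phase reports `Complete`, the restored data should be readable through the client service just as before the failure; for example, assuming the minikube setup above and that the backup contained the `hello` key:

```sh
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=$(minikube service example-etcd-cluster-client-service --url)
etcdctl get hello
```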
This project only supports backing up to an S3 bucket right now. To back up, you will need the AWS config and credentials files. Place the config and credentials files in a directory and run the following commands to create a secret (skip this if you already created the `aws` secret in the restore section):
```sh
$ export AWS_DIR="/path/to/aws/credentials"
$ cat $AWS_DIR/credentials
[default]
aws_access_key_id = XXX
aws_secret_access_key = XXX
$ cat $AWS_DIR/config
[default]
region = <region>
$ kubectl create secret generic aws --from-file=$AWS_DIR/credentials --from-file=$AWS_DIR/config
```
Run the following commands to create the EtcdBackup CR, replacing `mybucket/etcd.backup` with the full path of the backup file, `<etcd-cluster-name>` with the EtcdCluster CR name, and `<namespace>` with the namespace of the EtcdCluster CR:
```sh
$ wget https://raw.githubusercontent.com/alaypatel07/etcd-sts-operator/master/deploy/backup_cr.yaml
$ sed -e 's|<full-s3-path>|mybucket/etcd.backup|g' \
      -e 's|<aws-secret>|aws|g' \
      -e 's|<etcd-cluster-endpoints>|"http://<etcd-cluster-name>-client.<namespace>.svc.cluster.local:2379"|g' \
      backup_cr.yaml \
      | kubectl create -f -
```
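To see whether the backup finished, inspect the status of the EtcdBackup CR; its name is whatever `backup_cr.yaml` defines, so listing by type avoids guessing it:

```sh
kubectl get etcdbackup -o yaml
```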
The operator assumes that the admin has created TLS certificates and corresponding secrets for the EtcdCluster CR. To try it out, run the following commands to create certificates for `example-etcd-cluster`:
```sh
$ docker run -it -e USERNAME=$(id -un) -e HOME=$HOME --entrypoint /bin/bash -u $(id -u):0 -w $(pwd) -v $HOME:$HOME:Z quay.io/operator-framework/ansible-operator -c 'echo "${USERNAME}:x:$(id -u):$(id -g)::${HOME}:/bin/bash" >>/etc/passwd && bash'
ansible-playbook tls_playbook.yaml
exit
```
Now, create an EtcdCluster CR with TLS enabled:

```sh
kubectl create -f deploy/cr_tls.yaml
```
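Against a TLS-enabled cluster, `etcdctl` needs the client certificate material and an `https` endpoint. A minimal sketch, assuming the playbook produced `ca.crt`, `client.crt`, and `client.key` in the working directory (the file names are assumptions; check the playbook output):

```sh
export ETCDCTL_API=3

# File names below are assumptions about the playbook's output
etcdctl --endpoints=https://example-etcd-cluster-client.default.svc:2379 \
    --cacert=ca.crt --cert=client.crt --key=client.key \
    endpoint health
```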