[BUG] etcd cluster VerticalScaling ops stays Running and leads to pod CrashLoopBackOff #5650

Closed
linghan-hub opened this issue Oct 27, 2023 · 0 comments · Fixed by #5272
Labels: kind/bug (Something isn't working), severity/major (Great chance user will encounter the same problem)

kbcli version
Kubernetes: v1.22.15-aliyun.1
KubeBlocks: 0.7.0-beta.12
kbcli: 0.7.0-beta.12

Quick reproduction

make test-e2e TEST_TYPE=etcd CONFIG_TYPE=oss

Manual reproduction

  1. create cluster
---
# Source: etcd-cluster/templates/cluster.yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: etcd-cluster
  labels:
    helm.sh/chart: etcd-cluster-0.1.0
    app.kubernetes.io/name: etcd-cluster
    app.kubernetes.io/instance: etcd-cluster
    app.kubernetes.io/version: "v3.5.6"
    app.kubernetes.io/managed-by: Helm
spec:
  clusterDefinitionRef: etcd
  clusterVersionRef: etcd-v3.5.6
  terminationPolicy: Halt
  affinity:
    topologyKeys: 
      - kubernetes.io/hostname
  componentSpecs:
    - name: etcd
      componentDefRef: etcd
      monitor: false
      replicas: 3
      serviceAccountName: kb-etcd-cluster
  2. create ops
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  generateName: etcd-cluster-vscale-
spec:
  clusterRef: etcd-cluster
  type: VerticalScaling
  verticalScaling:
    - componentName: etcd
      requests:
        cpu: "500m"
        memory: 500Mi

  3. see resources

k get cluster
NAME           CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS     AGE
etcd-cluster   etcd                 etcd-v3.5.6   Halt                 Updating   5m16s
k get pod
NAME                   READY   STATUS             RESTARTS       AGE
csi-attacher-s3-0      1/1     Running            0              5h51m
csi-provisioner-s3-0   2/2     Running            0              5h51m
csi-s3-hz7vs           2/2     Running            0              5h51m
csi-s3-ngtbn           2/2     Running            0              5h51m
csi-s3-rgxgl           2/2     Running            0              5h51m
etcd-cluster-etcd-0    1/2     CrashLoopBackOff   5 (100s ago)   4m33s
etcd-cluster-etcd-1    2/2     Running            0              5m19s
etcd-cluster-etcd-2    2/2     Running            0              5m19s
k get ops
NAME                        TYPE              CLUSTER        STATUS    PROGRESS   AGE
etcd-cluster-vscale-dql9n   VerticalScaling   etcd-cluster   Running   0/3        4m38s
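
When reproducing this, it can help to pick out the stuck pod automatically rather than reading the table by eye. Below is a minimal sketch, assuming the plain-text output of `k get pod` has been captured into a string. The helper names `parse_pod_table` and `unhealthy` are hypothetical and not part of kbcli or KubeBlocks, and `SAMPLE` is abbreviated from the listing above.

```python
def parse_pod_table(table: str) -> dict:
    """Map pod name -> (ready, total, status) from `kubectl get pod` text output."""
    pods = {}
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = (int(x) for x in ready.split("/"))
        pods[name] = (ready_count, total, status)
    return pods


def unhealthy(pods: dict) -> list:
    """Return the sorted names of pods that are not fully ready or not Running."""
    return sorted(
        name
        for name, (ready, total, status) in pods.items()
        if ready != total or status != "Running"
    )


# Abbreviated from the `k get pod` output in this report.
SAMPLE = """\
NAME                   READY   STATUS             RESTARTS       AGE
etcd-cluster-etcd-0    1/2     CrashLoopBackOff   5 (100s ago)   4m33s
etcd-cluster-etcd-1    2/2     Running            0              5m19s
etcd-cluster-etcd-2    2/2     Running            0              5m19s
"""

if __name__ == "__main__":
    print(unhealthy(parse_pod_table(SAMPLE)))
```

Only the first three columns are read, so the spaces inside the RESTARTS column do not affect the result.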
  4. see logs
k describe cluster etcd-cluster
Name:         etcd-cluster
Namespace:    default
Labels:       app.kubernetes.io/instance=etcd-cluster
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=etcd-cluster
              app.kubernetes.io/version=v3.5.6
              clusterdefinition.kubeblocks.io/name=etcd
              clusterversion.kubeblocks.io/name=etcd-v3.5.6
              helm.sh/chart=etcd-cluster-0.1.0
Annotations:  kubeblocks.io/ops-request: [{"name":"etcd-cluster-vscale-dql9n","type":"VerticalScaling"}]
              kubeblocks.io/reconcile: 2023-10-27T08:44:41.277459048Z
API Version:  apps.kubeblocks.io/v1alpha1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2023-10-27T08:38:50Z
  Finalizers:
    cluster.kubeblocks.io/finalizer
  Generation:        3
  Resource Version:  1271378603
  UID:               b66ce57c-cf93-42bd-b178-06bffc5f11a5
Spec:
  Affinity:
    Pod Anti Affinity:  Preferred
    Tenancy:            SharedNode
    Topology Keys:
      kubernetes.io/hostname
  Cluster Definition Ref:  etcd
  Cluster Version Ref:     etcd-v3.5.6
  Component Specs:
    Class Def Ref:
      Class:
    Component Def Ref:  etcd
    Monitor:            false
    Name:               etcd
    No Create PDB:      false
    Replicas:           3
    Resources:
      Requests:
        Cpu:               500m
        Memory:            500Mi
    Service Account Name:  kb-etcd-cluster
  Monitor:
  Resources:
    Cpu:     0
    Memory:  0
  Storage:
    Size:              0
  Termination Policy:  Halt
Status:
  Cluster Def Generation:  2
  Components:
    Etcd:
      Consensus Set Status:
        Followers:
          Access Mode:  ReadWrite
          Name:         follower
          Pod:          etcd-cluster-etcd-1
        Leader:
          Access Mode:  ReadWrite
          Name:         leader
          Pod:          etcd-cluster-etcd-2
      Members Status:
        Pod Name:  etcd-cluster-etcd-2
        Role:
          Access Mode:  ReadWrite
          Can Vote:     true
          Is Leader:    true
          Name:         leader
        Pod Name:       etcd-cluster-etcd-1
        Role:
          Access Mode:  ReadWrite
          Can Vote:     true
          Is Leader:    false
          Name:         follower
      Phase:            Updating
      Pods Ready:       false
      Pods Ready Time:  2023-10-27T08:39:23Z
  Conditions:
    Last Transition Time:  2023-10-27T08:38:50Z
    Message:               The operator has started the provisioning of Cluster: etcd-cluster
    Observed Generation:   3
    Reason:                PreCheckSucceed
    Status:                True
    Type:                  ProvisioningStarted
    Last Transition Time:  2023-10-27T08:39:34Z
    Message:               Successfully applied for resources
    Observed Generation:   3
    Reason:                ApplyResourcesSucceed
    Status:                True
    Type:                  ApplyResources
    Last Transition Time:  2023-10-27T08:39:34Z
    Message:               pods are not ready in Components: [etcd], refer to related component message in Cluster.status.components
    Reason:                ReplicasNotReady
    Status:                False
    Type:                  ReplicasReady
    Last Transition Time:  2023-10-27T08:39:34Z
    Message:               pods are unavailable in Components: [etcd], refer to related component message in Cluster.status.components
    Reason:                ComponentsNotReady
    Status:                False
    Type:                  Ready
  Observed Generation:     3
  Phase:                   Updating
Events:
  Type     Reason                    Age                    From                Message
  ----     ------                    ----                   ----                -------
  Normal   ComponentPhaseTransition  6m6s                   cluster-controller  Create a new component
  Warning  ApplyResourcesFailed      6m6s                   cluster-controller  Operation cannot be fulfilled on pods "etcd-cluster-etcd-0": the object has been modified; please apply your changes to the latest version and try again
  Normal   ClusterReady              5m33s                  cluster-controller  Cluster: etcd-cluster is ready, current phase is Running
  Normal   ComponentPhaseTransition  5m33s                  cluster-controller  component is Running
  Normal   AllReplicasReady          5m33s                  cluster-controller  all pods of components are ready, waiting for the probe detection successful
  Normal   Running                   5m33s                  cluster-controller  Cluster: etcd-cluster is ready, current phase is Running
  Normal   ApplyResourcesSucceed     5m22s (x5 over 6m6s)   cluster-controller  Successfully applied for resources
  Normal   PreCheckSucceed           5m22s (x3 over 6m6s)   cluster-controller  The operator has started the provisioning of Cluster: etcd-cluster
  Normal   ComponentPhaseTransition  5m22s (x4 over 5m22s)  cluster-controller  component is Updating
  Warning  ApplyResourcesFailed      5m22s (x3 over 5m22s)  cluster-controller  Operation cannot be fulfilled on replicatedstatemachines.workloads.kubeblocks.io "etcd-cluster-etcd": the object has been modified; please apply your changes to the latest version and try again
  Warning  ReplicasNotReady          5m22s                  cluster-controller  pods are not ready in Components: [etcd], refer to related component message in Cluster.status.components
  Warning  ComponentsNotReady        5m22s                  cluster-controller  pods are unavailable in Components: [etcd], refer to related component message in Cluster.status.components
  Warning  BackOff                   15s (x5 over 4m43s)    event-controller    Pod etcd-cluster-etcd-0: Back-off restarting failed container
k logs etcd-cluster-etcd-0
Defaulted container "etcd" out of: etcd, kb-checkrole
start etcd...
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_SERVICE_HOST=192.168.26.215"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_SERVICE_PORT_CLIENT=2379"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_PORT_2379_TCP_ADDR=192.168.26.215"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_PORT=tcp://192.168.26.215:2379"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_SERVICE_PORT=2379"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_PORT_2379_TCP_PORT=2379"}
{"level":"warn","ts":"2023-10-27T08:45:17.315Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_PORT_2379_TCP_PROTO=tcp"}
{"level":"warn","ts":"2023-10-27T08:45:17.316Z","caller":"flags/flag.go:93","msg":"unrecognized environment variable","environment-variable":"ETCD_CLUSTER_ETCD_PORT_2379_TCP=tcp://192.168.26.215:2379"}
{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","etcd-cluster-etcd-0","--listen-peer-urls","http://0.0.0.0:2380","--listen-client-urls","http://0.0.0.0:2379","--advertise-client-urls","http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2379","--initial-advertise-peer-urls","http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380","--initial-cluster-token","etcd-cluster-1","--initial-cluster","etcd-cluster-etcd-0=http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380,etcd-cluster-etcd-1=http://etcd-cluster-etcd-1.etcd-cluster-etcd-headless.default.svc.cluster.local:2380,etcd-cluster-etcd-2=http://etcd-cluster-etcd-2.etcd-cluster-etcd-headless.default.svc.cluster.local:2380","--initial-cluster-state","new","--data-dir","/var/run/etcd/default.etcd"]}
{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/run/etcd/default.etcd","dir-type":"member"}
{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"embed/etcd.go:124","msg":"configuring peer listeners","listen-peer-urls":["http://0.0.0.0:2380"]}
{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"embed/etcd.go:132","msg":"configuring client listeners","listen-client-urls":["http://0.0.0.0:2379"]}
{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"embed/etcd.go:306","msg":"starting an etcd server","etcd-version":"3.5.6","git-sha":"cecbe35ce","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":8,"max-cpu-available":8,"member-initialized":false,"name":"etcd-cluster-etcd-0","data-dir":"/var/run/etcd/default.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/run/etcd/default.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":100000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380"],"listen-peer-urls":["http://0.0.0.0:2380"],"advertise-client-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2379"],"listen-client-urls":["http://0.0.0.0:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"etcd-cluster-etcd-0=http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380,etcd-cluster-etcd-1=http://etcd-cluster-etcd-1.etcd-cluster-etcd-headless.default.svc.cluster.local:2380,etcd-cluster-etcd-2=http://etcd-cluster-etcd-2.etcd-cluster-etcd-headless.default.svc.cluster.local:2380","initial-cluster-state":"new","initial-cluster-token":"etcd-cluster-1","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2023-10-27T08:45:17.317Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/run/etcd/default.etcd/member/snap/db","took":"123.706µs"}
{"level":"info","ts":"2023-10-27T08:45:17.320Z","caller":"embed/etcd.go:373","msg":"closing etcd server","name":"etcd-cluster-etcd-0","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380"],"advertise-client-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2379"]}
{"level":"info","ts":"2023-10-27T08:45:17.320Z","caller":"embed/etcd.go:375","msg":"closed etcd server","name":"etcd-cluster-etcd-0","data-dir":"/var/run/etcd/default.etcd","advertise-peer-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2380"],"advertise-client-urls":["http://etcd-cluster-etcd-0.etcd-cluster-etcd-headless.default.svc.cluster.local:2379"]}
{"level":"fatal","ts":"2023-10-27T08:45:17.320Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"member 937d5c010da698b4 has already been bootstrapped","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:32\nruntime.main\n\truntime/proc.go:225"}
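
The etcd log above is one JSON object per line, so the relevant entries can be pulled out mechanically. The following is a rough sketch, assuming the output of `k logs etcd-cluster-etcd-0` is available as a list of lines. The function name `summarize_etcd_log` is hypothetical, and `SAMPLE_LINES` is abbreviated from the log above. It reports the fatal entries together with the `--initial-cluster-state` value the container was started with. The log shows the pod restarting with `--initial-cluster-state new` against a data directory reported as already initialized. Whether that combination is the root cause is an assumption this report does not confirm.

```python
import json


def summarize_etcd_log(lines):
    """Return (fatal_entries, initial_cluster_state) from etcd JSON log lines."""
    fatal = []
    state = None
    for raw in lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip non-JSON lines such as "start etcd..."
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate truncated or malformed lines
        if entry.get("level") == "fatal":
            fatal.append((entry.get("msg"), entry.get("error")))
        args = entry.get("args")
        if isinstance(args, list) and "--initial-cluster-state" in args:
            idx = args.index("--initial-cluster-state")
            if idx + 1 < len(args):
                state = args[idx + 1]
    return fatal, state


# Abbreviated from the log in this report; most fields and args are omitted.
SAMPLE_LINES = [
    'start etcd...',
    '{"level":"info","ts":"2023-10-27T08:45:17.316Z","caller":"etcdmain/etcd.go:73",'
    '"msg":"Running: ","args":["etcd","--name","etcd-cluster-etcd-0",'
    '"--initial-cluster-state","new","--data-dir","/var/run/etcd/default.etcd"]}',
    '{"level":"fatal","ts":"2023-10-27T08:45:17.320Z","caller":"etcdmain/etcd.go:204",'
    '"msg":"discovery failed","error":"member 937d5c010da698b4 has already been bootstrapped"}',
]

if __name__ == "__main__":
    fatal_entries, cluster_state = summarize_etcd_log(SAMPLE_LINES)
    print(fatal_entries, cluster_state)
```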
linghan-hub added the kind/bug (Something isn't working) and severity/major (Great chance user will encounter the same problem) labels on Oct 27, 2023
linghan-hub added this to the Release 0.7.0 milestone on Oct 27, 2023