Corrupt ETCD node after underlying ESXI storage failure #16030

sternma · 2023-06-07T19:17:26Z

sternma
Jun 7, 2023

Has anyone seen this failure in etcd in a Kubernetes environment?

Let me break down our setup:
We have a stacked K8s cluster running "baremetal" on VMs. These VMs are distributed across several ESXi hosts that share a PureStorage array.

In one of our clusters, the link from one ESXi host to the PureStorage array failed. All VMs on the affected host failed immediately because they no longer had access to their mounted disks.

When the VMs were migrated to a different ESXi host, they all rejoined the cluster successfully. However, the affected control plane node, which in our stacked setup is also an ETCD node, had constant crashes from ETCD, with symptoms that pointed to a corrupt mount. Is there any bug or improvement open for this currently, or have others had this issue?

ahrtr · 2023-06-08T01:15:01Z

ahrtr
Jun 8, 2023
Maintainer

> an ETCD node, had constant crashes from ETCD, with symptoms that pointed to a corrupt mount

Could you provide more details, e.g. the call stack and logs?
