Replies: 2 comments
-
I don't think the CSI driver has such functionality to auto-recover a broken mount. For blobfuse mounts it depends on the blobfuse driver; I'm not sure whether the blobfuse driver has such auto-recovery functionality. cc @cvvz
-
We've realized that a liveness check that confirms the mounts are working inside the pod solves the 'alerting' portion of my question. However, it appears that a pod restart due to a liveness check failure does not cleanly remount blobfuse. Does anyone know if there's a way to hook into the pod lifecycle to issue a restart that actually re-mounts the blobfuse-based PVC?
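For reference, a minimal sketch of the kind of exec-based liveness probe that covers the alerting half, assuming the blobfuse-backed PVC is mounted at `/mnt/blob` (the pod, image, and PVC names here are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: blobfuse-liveness-demo          # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "while true; do sleep 3600; done"]
      volumeMounts:
        - name: blob
          mountPath: /mnt/blob
      # Once the blobfuse2 process behind the mount dies, stat on the
      # mount point fails ("Transport endpoint is not connected"),
      # so the probe fails and the kubelet restarts the container.
      livenessProbe:
        exec:
          command: ["sh", "-c", "stat /mnt/blob"]
        initialDelaySeconds: 10
        periodSeconds: 30
        failureThreshold: 3
  volumes:
    - name: blob
      persistentVolumeClaim:
        claimName: blob-pvc               # hypothetical blobfuse-backed PVC
```

As noted above, the restart this probe triggers may not cleanly remount blobfuse, so it detects the failure rather than fixing it.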
-
Hi blob-csi-developers,
Is there a way to have the CSI driver periodically check that all blobfuse mounts are still functioning? I am using the Azure-managed blob CSI driver on an AKS 1.25 cluster. I ran an experiment to see what happens if a blobfuse2 process is killed on the node, and it appears that neither the `csi-blob-node` daemonset nor the `blobfuse-proxy` systemd service on the node detects that the process has been killed. Within the pod the `mount` command still shows the mount as active, and the only indication that the mount has failed is when you try to do anything in the mount point, which returns `cannot open directory '.': Transport endpoint is not connected`.

To reproduce: run `sudo pkill -9 blobfuse2` on the node to kill the blobfuse2 mount processes.

This is an admittedly contrived example, but I've run into an actual manifestation of this problem before. An AKS node temporarily lost internet connectivity, causing all blobfuse2 processes to fail. The only way to force a reconnect was to restart the pod, which unmounts and re-mounts the PVC.
While it would be nice for the CSI driver to actually recover a failed mount, I am more concerned about proactively detecting the failure. Right now I can either periodically `ls` the mount point to see if I get an error, or check whether the number of blobfuse2 processes on a node matches the number of expected PVCs, but both approaches feel very janky. It would be a lot more convenient if the driver could detect a failed or missing process and raise an event on the pod. A rough sketch of the node-side process check is below.
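For illustration only, here is one way the node-side process check could be run as a DaemonSet. All names are hypothetical, it assumes blobfuse2 mounts appear with `blobfuse2` as the mount source in the host mount table (verify on your nodes), and it merely logs a warning rather than raising the pod event I'd actually want:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: blobfuse-mount-check            # hypothetical name
spec:
  selector:
    matchLabels:
      app: blobfuse-mount-check
  template:
    metadata:
      labels:
        app: blobfuse-mount-check
    spec:
      hostPID: true                     # share the node's PID namespace
      containers:
        - name: check
          image: busybox
          command:
            - sh
            - -c
            - |
              # With hostPID, /proc/1/mounts is the host's mount table and
              # pgrep sees host processes. Warn whenever there are fewer
              # blobfuse2 processes than blobfuse2 mount entries.
              while true; do
                mounts=$(grep -c '^blobfuse2 ' /proc/1/mounts)
                procs=$(pgrep -x blobfuse2 | wc -l)
                if [ "$procs" -lt "$mounts" ]; then
                  echo "WARNING: $mounts blobfuse2 mounts but only $procs blobfuse2 processes"
                fi
                sleep 60
              done
```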