PWX-32881: [CSI] Add CSI socket auto-recover when deleted #2322

Merged: ggriffiths merged 2 commits into libopenstorage:master from the csisocket_autorecover branch on Sep 14, 2023


@ggriffiths ggriffiths commented Aug 12, 2023

What this PR does / why we need it:
Automatically re-create the CSI socket and restart the CSI server when the socket is deleted.

Which issue(s) this PR fixes (optional)
Closes #

Special notes for your reviewer:

  • I first investigated using fsnotify, but it added a lot of complexity, seemed like overkill, and was less reliable than checking for the socket periodically.
  • This only covers the CSI Driver socket at /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock. The kubelet registration system creates a second Unix domain socket (UDS), used to register the CSI node, at /var/lib/kubelet/plugins_registry/pxd.portworx.com-reg.sock. Getting that one to re-register still requires a PX Pod restart; I am looking into how that can be automated too. This PR only removes the need to restart Portworx to get the CSI Driver socket re-created.

Test Results:

Delete socket on server:

[root@ggriffiths-k8s1-node1 pxd.portworx.com]# rm -rf csi.sock

CSI Server restarted:

@ggriffiths-k8s1-node1 portworx[32494]: time="2023-08-12T00:23:21Z" level=info msg="csi.NodePublishVolume request received. VolumeID: 327234080797915498, TargetPath: /var/lib/kubelet/pods/dbc4f54c-e31c-43e9-bcb2-1966ff2ac44f/volumes/kubernetes.io~csi/pvc-824951cd-ca92-414e-9b7d-b3d3ca8b97a3/mount" component=csi-driver correlation-id=cb2f0116-2ea6-42d9-b824-3a6e52b6de53 origin=csi-driver
@ggriffiths-k8s1-node1 portworx[32494]: time="2023-08-12T00:23:30Z" level=info msg="Detected CSI socket deleted at path /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock. Stopping CSI server" file="csi.go:337" component=openstorage/csi
@ggriffiths-k8s1-node1 portworx[32494]: time="2023-08-12T00:23:30Z" level=info msg="CSI K8s filter being added for kubernetes scheduler" file="csi.go:299" component=openstorage/csi
@ggriffiths-k8s1-node1 portworx[32494]: time="2023-08-12T00:23:30Z" level=info msg="Restarting CSI gRPC server at /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="csi.go:348" component=openstorage/csi
@ggriffiths-k8s1-node1 portworx[32494]: time="2023-08-12T00:23:30Z" level=info msg="CSI 1.7 gRPC Server ready on /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="grpcserver.go:119" component=openstorage/pkg/grpcserver

Socket re-created:

[root@ggriffiths-k8s1-node1 pxd.portworx.com]# ls
csi.sock

csi/csi.go Outdated
}

logrus.Infof("Detected CSI socket deleted at path %s. Stopping CSI server", socketPath)
s.Stop()

Even though the socket path was deleted, the process is still running. We need to stop the server and re-create it.

s.Stop() handles killing the net.Listener goroutine.


@zoxpx zoxpx left a comment


I think the change is OK, but I'm confused as to why we need it (did a customer request it?)

@ggriffiths

Please hold off on reviews; I am re-working this PR.

@ggriffiths ggriffiths changed the title PWX-32881: [CSI] Add CSI socket auto-recover when deleted WIP - PWX-32881: [CSI] Add CSI socket auto-recover when deleted Aug 14, 2023
@ggriffiths ggriffiths force-pushed the csisocket_autorecover branch from cd65528 to 40512cf Compare September 12, 2023 20:40
@ggriffiths ggriffiths requested a review from lpabon September 12, 2023 20:44
@ggriffiths ggriffiths changed the title WIP - PWX-32881: [CSI] Add CSI socket auto-recover when deleted PWX-32881: [CSI] Add CSI socket auto-recover when deleted Sep 12, 2023
@ggriffiths ggriffiths force-pushed the csisocket_autorecover branch 2 times, most recently from 827c136 to 598890d Compare September 12, 2023 22:43

// Start server
logrus.Infof("Restarting CSI gRPC server at %s", socketPath)
if err := s.Start(); err != nil {

Instead of failing fast, can we continuously retry?

@ggriffiths ggriffiths force-pushed the csisocket_autorecover branch from efc4836 to 36d4a96 Compare September 13, 2023 20:49
@ggriffiths ggriffiths force-pushed the csisocket_autorecover branch from 36d4a96 to 636f7ce Compare September 13, 2023 20:58
@ggriffiths ggriffiths requested a review from lpabon September 13, 2023 21:50
@ggriffiths

Tested again after addressing the PR feedback; deleting the socket still restarts the CSI server:

Sep 13 22:22:33 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:22:33Z" level=info msg="Detected CSI socket deleted at path /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock. Stopping CSI gRPC server" file="csi.go:319" component=openstorage/csi
Sep 13 22:22:33 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:22:33Z" level=info msg="CSI K8s filter being added for kubernetes scheduler" file="csi.go:276" component=openstorage/csi
Sep 13 22:22:33 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:22:33Z" level=info msg="Restarting CSI gRPC server at /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="csi.go:330" component=openstorage/csi
Sep 13 22:22:33 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:22:33Z" level=info msg="CSI 1.7 gRPC Server ready on /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="grpcserver.go:119" component=openstorage/pkg/grpcserver
Sep 13 22:23:03 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:23:03Z" level=info msg="Detected CSI socket deleted at path /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock. Stopping CSI gRPC server" file="csi.go:319" component=openstorage/csi
Sep 13 22:23:03 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:23:03Z" level=info msg="CSI K8s filter being added for kubernetes scheduler" file="csi.go:276" component=openstorage/csi
Sep 13 22:23:03 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:23:03Z" level=info msg="Restarting CSI gRPC server at /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="csi.go:330" component=openstorage/csi
Sep 13 22:23:03 ggriffiths-k8s1-node0 portworx[2543962]: time="2023-09-13T22:23:03Z" level=info msg="CSI 1.7 gRPC Server ready on /var/lib/kubelet/plugins/pxd.portworx.com/csi.sock" file="grpcserver.go:119" component=openstorage/pkg/grpcserver

Signed-off-by: Grant Griffiths <[email protected]>
@ggriffiths ggriffiths force-pushed the csisocket_autorecover branch from 636f7ce to 61374da Compare September 13, 2023 22:36
@ggriffiths ggriffiths merged commit 6b9d48d into libopenstorage:master Sep 14, 2023
@ggriffiths ggriffiths deleted the csisocket_autorecover branch September 14, 2023 15:38
@github-actions

This pull request cannot be automatically cherry-picked to the target release branch.
This is likely due to a merge conflict. Please cherry-pick this change yourself and handle the merge conflict.

ggriffiths pushed a commit to ggriffiths/openstorage that referenced this pull request Sep 14, 2023
ggriffiths pushed a commit to ggriffiths/openstorage that referenced this pull request Sep 14, 2023
ggriffiths pushed a commit that referenced this pull request Sep 14, 2023
3 participants