Don't fail when notifying a server that is shutting down when using NotifyAlive #13811

Closed
tomponline opened this issue Jul 24, 2024 · 0 comments · Fixed by #13883

Comments

@tomponline
Member

If a cluster notification is sent to all alive members while a LXD server is shutting down, the whole notification process fails.

If the notifier could detect that a request failed because the server is shutting down, rather than because the server failed to handle the specific notification, then we could treat that server the same as one that is not alive and not fail the entire notification process.

This was originally reported by the Anbox team, when one of their cluster members was snap refreshed while a certificate was being added:

2024-06-27T04:57:35Z ams.ams[84977]: I0627 04:57:35.313418   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T04:57:48Z ams.ams[84977]: E0627 04:57:48.131973   84977 nodes.go:362] Backend: Failed to add node lxd12: failed to add client cert to cluster: failed to notify peer 172.20.30.182:8443: LXD is shutting down
2024-06-27T04:57:58Z ams.ams[84977]: I0627 04:57:58.053144   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T04:58:03Z ams.ams[84977]: E0627 04:58:03.921723   84977 nodes.go:362] Backend: Failed to add node lxd12: Failed to update cluster trust: Failed getting existing certificate: not authorized
2024-06-27T04:58:18Z ams.ams[84977]: I0627 04:58:18.368382   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T04:58:24Z ams.ams[84977]: E0627 04:58:24.396656   84977 nodes.go:362] Backend: Failed to add node lxd12: Failed to update cluster trust: Failed getting existing certificate: not authorized
2024-06-27T04:58:48Z ams.ams[84977]: I0627 04:58:48.175689   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T04:58:54Z ams.ams[84977]: E0627 04:58:54.089149   84977 nodes.go:362] Backend: Failed to add node lxd12: Failed to update cluster trust: Failed getting existing certificate: not authorized
2024-06-27T04:59:37Z ams.ams[84977]: I0627 04:59:37.572317   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T04:59:43Z ams.ams[84977]: E0627 04:59:43.514248   84977 nodes.go:362] Backend: Failed to add node lxd12: Failed to update cluster trust: Failed getting existing certificate: not authorized
2024-06-27T05:01:08Z ams.ams[84977]: I0627 05:01:08.055997   84977 nodes.go:301] Done configuring remote LXD instance
2024-06-27T05:01:13Z ams.ams[84977]: E0627 05:01:13.862790   84977 nodes.go:362] Backend: Failed to add node lxd12: Failed to update cluster trust: Failed getting existing certificate: not authorized
@tomponline tomponline added the Bug Confirmed to be a bug label Jul 24, 2024
@tomponline tomponline added this to the lxd-6.2 milestone Jul 24, 2024
MggMuggins added a commit to MggMuggins/lxd that referenced this issue Aug 6, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>
MggMuggins added a commit to MggMuggins/lxd that referenced this issue Aug 7, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>
tomponline added a commit that referenced this issue Aug 19, 2024
…otifyAlive (#13883)

Fixes #13811 

There doesn't seem to be a well-established pattern for differentiating
between error types other than `strings.Contains`. I'd be happy to drop
something like [`IsConnectionError`](https://github.com/canonical/lxd/blob/main/shared/network.go#L53) in a
new file `shared/errors.go` if you're concerned about fragility when
comparing error messages.

LXD-1371
tomponline pushed a commit to tomponline/lxd that referenced this issue Aug 20, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>
kadinsayani pushed a commit to kadinsayani/lxd that referenced this issue Aug 28, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>
hamistao pushed a commit to hamistao/lxd that referenced this issue Sep 4, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>
tomponline pushed a commit to tomponline/lxd that referenced this issue Sep 13, 2024
...that is shutting down when using NotifyAlive.

Fixes canonical#13811

Signed-off-by: Wesley Hershberger <[email protected]>