Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Container: Match container disconnection with VMs #14075

Closed
wants to merge 2 commits into from

Conversation

hamistao
Copy link
Contributor

@hamistao hamistao commented Sep 9, 2024

Match container disconnection with VMs on lxc exec, making it return a 129 independentely on how the container disconnected.
The previous behavior was getting a 137 if the container was stopped forcefully and either a 129 or 143 if it was stopped gracefully.

An alternative would be changing VMs to return a 137 when stopped forcefully as well and stardardize the clean stop case to 129.

@hamistao hamistao force-pushed the match_container_disconnect branch 2 times, most recently from 96cbab1 to 7e54870 Compare September 9, 2024 14:49
@hamistao hamistao force-pushed the match_container_disconnect branch from 7e54870 to e0846d5 Compare September 9, 2024 15:18
@hamistao hamistao marked this pull request as ready for review September 9, 2024 16:26
@hamistao hamistao requested a review from tomponline September 9, 2024 16:26
@simondeziel
Copy link
Member

Match container disconnection with VMs on lxc exec, making it return a 129 independentely on how the container disconnected. The previous behavior was getting a 137 if the container was stopped forcefully and either a 129 or 143 if it was stopped gracefully.

If I understand the meaning of those code, 129 would indicate a swift and clean shutdown (129 = 128 + SIGHUP) while a 143 would in have been lxc stop hitting a timeout and sending a termination signal (143 = 128 + SIGTERM). Lastly 137 would have been a lxc stop -f (137 = 128 + SIGKILL) where an abrupt termination was requested without prior notice.

As such, I think this kind of information is valuable.

An alternative would be changing VMs to return a 137 when stopped forcefully as well and stardardize the clean stop case to 129.

I think that's be preferable, indeed.

@hamistao
Copy link
Contributor Author

hamistao commented Sep 9, 2024

As such, I think this kind of information is valuable.

@simondeziel may I assume your point being that I should add some of this info as comment on the code?

I think that's be preferable, indeed.

I would agree if not by the fact that we do not have direct access to the exit codes from VMs, so the best we could do is derive that the process was sigkilled based on the connection reset by peer error (if the VM is stopped cleanly we get a Unexpected EOF instead). Thus, this error could also be indication of a different problem and so we can not be sure of the signal that was used internally.

I should have mentioned this in the PR description as well, sorry about that.

@simondeziel
Copy link
Member

As such, I think this kind of information is valuable.

@simondeziel may I assume your point being that I should add some of this info as comment on the code?

My interpretation of the exit code numbers is not based on reading the code, I'm just making an educated guess based on the information I gathered here an there. So I might be way off.

Assuming my interpretation of those 3 different exit codes for containers is right, I think it's nice to have them to know what actually happened to the instance.

I think that's be preferable, indeed.

I would agree if not by the fact that we do not have direct access to the exit codes from VMs, so the best we could do is derive that the process was sigkilled based on the connection reset by peer error (if the VM is stopped cleanly we get a Unexpected EOF instead). Thus, this error could also be indication of a different problem and so we can not be sure of the signal that was used internally.

For the VM case, I see a few possible scenario:

  • User initiated clean shutdown
  • User initiated abrupt shutdown
  • LXD initiated stop clean shutdown (143)
  • LXD initiated stop hitting a timeout leading to SIGKILL'ing QEMU? (137)
  • LXD initiated forced stop, immediately SIGKILL'ing QEMU? (137)

For the user initiated ones, is there a different error being received (maybe connection reset by peer vs Unexpected EOF) that could let it pick a different exit code?

For the LXD initiated ones, I think a clean shutdown sould be 143 and if SIGKILL is being used then a 137 would make sense.

@hamistao
Copy link
Contributor Author

hamistao commented Sep 9, 2024

I'm just making an educated guess based on the information I gathered here an there. So I might be way off.

I think you are spot on!

For the user initiated ones, is there a different error being received (maybe connection reset by peer vs Unexpected EOF) that could let it pick a different exit code?

This is the case, connection reset by peer for force stop and Unexpected EOF for clean stop. But both of them feel too vague to assume it always implies a certain signal.

For example, I believe (not tested) that both a SIGHUP and SIGTERM cause a Unexpected EOF. If needed I can do more digging to make sure this is the case.

@hamistao
Copy link
Contributor Author

@tomponline If we are just going to keep the differences between VMs and containers, this PR can be closed.

Copy link
Member

@tomponline tomponline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, yeah lets keep the extra container exit codes

@tomponline tomponline closed this Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants