Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow proper shutdown in Linux #25746

Merged
merged 1 commit into from
Apr 5, 2024
Merged

Allow proper shutdown in Linux #25746

merged 1 commit into from
Apr 5, 2024

Conversation

landswellsong
Copy link
Contributor

This addresses 2 problems for Linux builds:

  1. If the code hangs during initializations in HAL_Linux::run (e.g. missing a critical sensor) the whole process would lock up and not respond to SIGINT / SIGTERM, which makes it inconvenient to work with. The solution is suboptimal because in such a shutdown scenario the teardown would be skipped, however otherwise callbacks->setup() in particular needs to be made interruptible instead.
  2. Signal handler runs in a separate context, so the compiler may choose to optimize the loop condition out. Marking it as volatile explicitly prevents this.

@rmackay9
Copy link
Contributor

Hi @landswellsong,

I'm not qualified to review this but I've tagged @lucasdemarchi and maybe @Williangalvani has some input too.

BTW, could you modify the commit so that it is prefixed with, "AP_HAL_Linux:"? To be clear I mean the commit, not the PR description.

@khancyr
Copy link
Contributor

khancyr commented Dec 12, 2023

No problem for 2.

For 1. I still don't understand the point. signal handler should be settled early , if this doesn't works then we got an issue that need to be fixed. Moving the signal setting isn't solving the issue , so that isn't great move IMHO

// NOTE: signal handlers are only set before the main loop, so
// that if anything before the loops hangs, the default signals
// can still stop the process proprely, although without proper
// teardown
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this... what is hanging on the init calls?

@landswellsong
Copy link
Contributor Author

@lucasdemarchi @khancyr hello

Thanks for the comments. My rationale for #1 is the following:

  1. The signal handler prevents normal termination on typical signals and defers this to the main loop
  2. The code before the loop may hang (e.g. set up a linux board with an onboard baro, boot without the baro connected)
  3. If that happens the process can not be terminated withouts SIGKILL which essentially prevents the teardown part of the code from being run either way
  4. It also hangs systemd and/or the system shutdown overall

Of course this is very niche, but I just thought we only want to set the signal handler when we are able to respond to it. Maybe some other logic is due here, e.g. set some watchdog type timer whenever we hit the signal handler before the loop and hard-exit if we hang in initialization for X time.

As for #2: My impression is that non-volatile accesses may be optimized out by the compiler, is that not the case?

@lucasdemarchi
Copy link
Contributor

Ideally the init part should be non-blocking. Any part that waits indefinitely should be in a mainloop-like that would check and process signals, which would allow terminating the program normally. We can improve on that so it's cleaner, but I'm also ok with keeping your simpler solution for now.

@lucasdemarchi
Copy link
Contributor

As for the volatile... I doubt the compiler will optimize that out as the code that updates it is reachable from inside the main loop since we are using an fd to deliver signals.

@landswellsong
Copy link
Contributor Author

@lucasdemarchi the issue is that the code that can lock up isn't inside Linux HAL so it knows nothing about the flag variable used to exit. Again a easily verifiable case is a system with disconnected baro. I honestly don't like my solution either, but it's a little better than having an unkillable process.

As to the flag variable itself, I tried cause the compiler to do it on a small snipped, but couldn't. However all the reference material points out that the type of globals that the signal handler can access without UB must be either volatile sig_atomic_t or guaranteed to be atomic otherwise.

@lucasdemarchi
Copy link
Contributor

Sounds good. It's definitely better than the behavior we currently have. Just please use a "AP_HAL_Linux: " prefix in the commit message so it follows the commit style in this repo. With that this LGTM.

@landswellsong
Copy link
Contributor Author

Fixed the commit message, changed to sig_atomic_t and expanded the comment.

Allow default signals before full initialization in Linux, this makes sure we don't get an unkillable process if it hangs on initialization

Exit flag marked volatile to counteract possible compiler optimization due to the handler code running in a different context
@peterbarker
Copy link
Contributor

Merging this based on @lucasdemarchi 's LGTM above.

@peterbarker peterbarker merged commit 364e6f0 into ArduPilot:master Apr 5, 2024
60 checks passed
@peterbarker
Copy link
Contributor

Merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants