Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add prearm check surrounding presence of crashdump data #26710

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

peterbarker
Copy link
Contributor

Under certain circumstances ArduPilot writes "CrashDump" data to flash memory, past where the ArduPilot code resides. It is cleared when the flash is zeroed (e.g. when flashing a new firmware).

When a log file is opened, we log this data into that log file and warn the user that we've had to do that.

At boot time we attempt to save the dump to the SD card, and warn the user we've done that.

Those two things were insufficient to make users aware that the firmware present on the board had crashed at some point in the past. This PR makes the presence of crashdump data impossible to miss by adding a pre-arm check which fails if the crashdump data is present.

It is possible to set a new parameter BRD_CRSHDMP_ACK to 1 to avoid the pre-arm failure. In that case we still emit a warning as part of the prearm logic. This parameter is set to zero any time the board boots and no crash dump data is present, so the user must acknowledge the presence of any new crashdump data which gets written.

In the following screenshot the "acknowledgement" parameter was written, allowing the vehicle to then arm:
image

I have checked that re-flashing the firmware clears the ack flag, forcing re-ack of further crashes.

@peterbarker peterbarker added DevCallTopic WikiNeeded needs wiki update labels Apr 7, 2024
@peterbarker peterbarker force-pushed the pr/crashdump-prearm-ack branch 3 times, most recently from 296ae40 to 79d5860 Compare April 7, 2024 07:43
libraries/AP_Arming/AP_Arming.cpp Outdated Show resolved Hide resolved
libraries/AP_BoardConfig/AP_BoardConfig.cpp Outdated Show resolved Hide resolved
libraries/AP_BoardConfig/AP_BoardConfig.cpp Outdated Show resolved Hide resolved
@peterbarker peterbarker force-pushed the pr/crashdump-prearm-ack branch from 79d5860 to 2c1693b Compare April 8, 2024 04:25
@peterbarker
Copy link
Contributor Author

I've retested this after applying @tridge 's suggestions.

@peterbarker peterbarker force-pushed the pr/crashdump-prearm-ack branch from 2c1693b to 6ce1028 Compare April 8, 2024 06:04
@MichelleRos
Copy link
Contributor

Just one suggestion, you could add a few words to the parameter description explaining what a watchdog means - eg "the firmware rebooted, which may happen in-flight, causing the vehicle to crash" or something if you want. Otherwise people might just set the parameter anyway.
Of course, it looks like they'd have to search for the documentation to find out what parameter they need to set.

The other thing I'm wondering, is what happens with a watchdog on a Plane, for example - would this mean that when it comes back up you can't arm with the arm switch/rudder arming (eg people who don't have a GCS), or are there usually some pre-arms anyway? (I remember something about a there being a "fast boot" after a watchdog that skips a lot of the checks?)

@peterbarker
Copy link
Contributor Author

Just one suggestion, you could add a few words to the parameter description explaining what a watchdog means - eg "the firmware rebooted, which may happen in-flight, causing the vehicle to crash" or something if you want. Otherwise people might just set the parameter anyway. Of course, it looks like they'd have to search for the documentation to find out what parameter they need to set.

The other thing I'm wondering, is what happens with a watchdog on a Plane, for example - would this mean that when it comes back up you can't arm with the arm switch/rudder arming (eg people who don't have a GCS), or are there usually some pre-arms anyway? (I remember something about a there being a "fast boot" after a watchdog that skips a lot of the checks?)

Yep, you're quite right, you can now force-arm past this.

@tridge
Copy link
Contributor

tridge commented Apr 9, 2024

ARMING_CRASHACK ? ARMING_CRASHDUMP? ARMING_CRASH_IGN ?

#if AP_CRASHDUMP_ACK_ENABLED
// @Param: CRSHDMP_ACK
// @DisplayName: CrashDump Acknowledgement Required
// @Description: Must have value "1" if a crashdump is present on the system, or prearm failure will be raised. Do not set this parameter unless the risks of doing so are fully understood. The presence of a crash dump means that the firmware currently installed has crashed with a hardfault or watchdog. The crash dump file gives diagnostic information which can help in finding the issue. Please check the ArduPilot documentation for more details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arming check enable/disable when CrashDumps files are found on the SD Card. CrashDumps are saved after critical software failures resulting in the autopilot immediately rebooting. If these are present the vehicle is likely unsafe to fly. Please report to the ArduPilot support team.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arming check enable/disable when CrashDumps files are found on the SD Card. CrashDumps are saved after critical software failures resulting in the autopilot immediately rebooting. If these are present the vehicle is likely unsafe to fly. Please report to the ArduPilot support team.

We do put crashdump data onto SD cards - but that's not the data this PR is checking. CrashDump flashes it into the STM32 flash area. We copy it from there onto the SD card on subsequent boots.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted the wording here. It's turning into a bit of an essay.

@MichelleRos
Copy link
Contributor

Yep, you're quite right, you can now force-arm past this.

I'm not sure what you mean? I mentioned the example of people who don't always have a groundstation connected. They won't be able to force-arm at all, right?

So my question was whether this check would mean that they now can't arm at all, or whether there's usually other pre-arms active anyway, meaning that people without a GCS will already not be able to re-arm?

@andyp1per
Copy link
Collaborator

Yeah, lots of people fly without GCS using yaapu. This feels like it could be a usability impediment with people unable to arm but unsure why and lots of complaints on the forums. I'm not convinced making this the default is the right thing to do...

@MichelleRos
Copy link
Contributor

I'm not convinced making this the default is the right thing to do...

Given how frequently people just miss the current notification about there being a crash dump, I think turning it into a pre-arm is necessary.

But if we can make it not be a pre-arm on the first bootup after the watchdog (when they might still be in the air) then that would really help...
Not sure how hard that would be code-wise?

@rmackay9
Copy link
Contributor

rmackay9 commented Apr 9, 2024

@andyp1per

CrashDumps should be very rare and I'm not sure we want users just going off and flying again after one of these.

@IamPete1
Copy link
Member

IamPete1 commented Apr 9, 2024

ArduPilot/WebTools#137 adds a alert to the Hardware Report WebTool. We just need to decide where were going to send users to make a report (both that and this arming check should send users to the same place)

@timtuxworth
Copy link
Contributor

It is possible to set a new parameter BRD_CRSHDMP_ACK to 1 to avoid the pre-arm failure. In that case we still emit a warning as part of the prearm logic. This parameter is set to zero any time the board boots and no crash dump data is present, so the user must acknowledge the presence of any new crashdump data which gets written.

Why a new parameter? Why not use ARMING_OPTIONS?

@peterbarker
Copy link
Contributor Author

It is possible to set a new parameter BRD_CRSHDMP_ACK to 1 to avoid the pre-arm failure. In that case we still emit a warning as part of the prearm logic. This parameter is set to zero any time the board boots and no crash dump data is present, so the user must acknowledge the presence of any new crashdump data which gets written.

Why a new parameter? Why not use ARMING_OPTIONS?

If we use a bit in ARMING_OPTIONS then we will set that parameter in storage. That means it will no longer change when we update the defaults for that parameter, and given there are many bits in there, that could be bad. It could really surprise someone running an OEM setup with a defaults.parm, for example - remember that parameter being set-in-storage will persist after they've corrected and re-flashed the firmware.

@peterbarker
Copy link
Contributor Author

I'm not convinced making this the default is the right thing to do...

Given how frequently people just miss the current notification about there being a crash dump, I think turning it into a pre-arm is necessary.

But if we can make it not be a pre-arm on the first bootup after the watchdog (when they might still be in the air) then that would really help... Not sure how hard that would be code-wise?

Yep, good call. We bypass this check - Plane Just Does That when we've detected a watchdog-when-armed event.

My initial formulation PR'd didn't have this behaviour, I changed it to not be a mandatory check to allow for mid-air-rearm.

@peterbarker peterbarker force-pushed the pr/crashdump-prearm-ack branch from 6ce1028 to 4e9e50a Compare April 10, 2024 02:11
@peterbarker
Copy link
Contributor Author

I've force-pushed this branch with my re-tested changes.

It is now wholly contained within AP_Arming which is nice.

I think I've addressed all comments here, with the exception of Andy's musings on people that don't have a GCS being unable to get themselves out of this situation. Randy's rebutted that saying they probably shouldn't be flying anyway - but OTOH we are offering this parameter to anybody with a GCS!

Fundamentally we need the people running yaapu to acknowledge the problem. I've often thought it would be nice to be able to set an arbitrary parameter with yaapu, even if it is a little painful interface-wise (less so with touch-screen input!).

I don't think we should delay merging this based on an inability of non-GCS-users to work around it in the field. I do acknowledge there's a UI problem there, but at least with yaapu GCS there should be a way forward.

We've seen direct evidence that people are flying very large, heavy, expensive, and dangerous vehicles being armed with firmware which has created crash-dumps, and we need something in to stop that happening.

@peterbarker peterbarker merged commit f83cde7 into ArduPilot:master Apr 10, 2024
91 checks passed
@peterbarker peterbarker deleted the pr/crashdump-prearm-ack branch April 10, 2024 12:10
@rmackay9
Copy link
Contributor

I did not add this to 4.5.2-beta1 due to a merge conflict

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: 4.5.2-beta1
Development

Successfully merging this pull request may close these issues.

8 participants