Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refinement] boot_control.grub: improve robustness against boot load/init, split out ota_status control and slot mount logic #257

Merged
merged 48 commits into from
Sep 22, 2023

Conversation

Bodong-Yang
Copy link
Member

@Bodong-Yang Bodong-Yang commented Sep 6, 2023

Description

This PR introduces several internal refinements to grub boot_control implementation as follow. With this PR, grub boot_control implementation aligns with grub boot control design documentation.

Robustness improved over boot control load/init

This PR refines the grub boot_control procedures, aligns the behavior with the design document. Comparing to the previous implementation, it handles the condition of ota-partition symlink broken and migration from non-ota-partition booted system properly during boot_control loading, please check Behavior changes for more details.

Also, during finalizing switch boot, previously otaclient will do the ota-partition symlink switching before doing grub-update, this potentially breaks the atomic switching boot. This PR fixes this issue.

OTAStatusFilesControl and SlotMountHelper

These two classes are common shared helper classes that implement ota_status files control and slots preparing/mounting, which can be used by any detailed boot_control implementation(previously only used by rpi_boot, this PR make grub_boot also use these classes). They are introduced since #182 .

The goal for introducing these classes is to abstract ota_status files control and slots preparing/mounting from detailed boot_control implementation, improve boot_control implementation's cohesive, reduce code duplication and ease maintaining.
Especially, the OTAStatusFilesControl class aligns with ota_status control design.

Split out ota_status files control and slot preparing/mounting from boot_control implementation

This PR splits and removes the ota_status files control logic from grub boot_control implementation, using OTAStatusFilesControl and SlotMountHelper instead of mixins combination (OTAStatusMixin, VersionControlMixin, PrepareMountMixin, these mixins are deprecated in flavor of OTAStatusFilesControl and SlotMountHelper classes).

Now grub boot_control doesn't implement the ota_status files control and slot preparing/mounting, but using APIs from OTAStatusFilesControl and SlotMountHelper, making grub boot_control implementation more focuses on grub control logic, reduce code duplication and ease the maintaining.

Related documents

  1. ota_status control design
  2. grub boot control design

Check list

  • related documents are provided.
  • test file(s) that covers the change(s) is implemented.
  • local test is passed.
  • VM test: boot migrate from non-ota-partition booted system test.
  • VM test: try to fix ota-partition symlink missing/corrupted.
  • VM test: normal OTA.

Changes

  1. common.OTAStatusFilesControl: split loading and parsing status and slot_in_use file into two methods,
  2. common.OTAStatusFilesControl: takes force_initialize param in __init__, ota_status will initialize if boot_control is initialized,
  3. common.SlotMountHelper: standby_boot_dir is always <standby_slot_mp>/boot, standby_slot creator no-longer needs to care about where the kernel/initrd.img should be placed, this will be handled by boot_controller,
  4. protocol: rename get_ota_status to get_booted_ota_status to distinguish this ota_status with live_ota_status maintained by otaclient,
  5. grub_boot: remove _SymlinkABPartitionDetecter, only detect AB partition by partition information(GrubABPartitionDetecter),
  6. grub_boot: use OTAStatusFilesControl to handle ota_status files control, switch boot detecting, etc, remove superseded/overridden codes,
  7. grub_boot: use SlotMountHelper to handle slots preparing/mounting in each update phases, remove superseded/overridden codes,
  8. grub_boot: refine boot loading procedures, checking the present of ota-partition files during loading, handles the case of ota-partition symlink corrupted, in such case a full boot_control initializing will engage and all ota-partition files will be recreated,
  9. grub_boot: indicates OTAStatusFilesControl to initialize unconditionally if boot_control is initialized,
  10. grub_boot: _get_current_booted_kernel_and_initrd now use standard way to detect booted initrd.img,
  11. grub_boot: _prepare_kernel_initrd_links now uses first found kernel, and try to find the corresponding initrd.img with the same version string,
  12. grub_boot: split ensure_ota_partition_symlinks into two new methods _ensure_ota_partition_symlinks and _ensure_standby_slot_boot_files_symlinks,
  13. grub_boot: remove reprepare_standby_ota_partition_file and reprepare_active_ota_partition_file methods, now grub_reboot_to_standby and finalizing_switch_boot implements boot control with helper methods on their own,
  14. grub_boot: grub_reboot_to_standby and finalizing_switch_boot now ensure the present of all ota-partition symlinks before grub-update,
  15. grub_boot: finalizing_switch_boot now does switch boot after grub-update, before grub-update all ota-partition symlinks will be ensured,
  16. rpi_boot: change according to boot_control.ota_status -> boot_control.booted_ota_status rename,
  17. config: add DEFAULT_VERSION_STR, value is empty string to align with previous implementation,
  18. ota_client: change according to boot_control.get_ota_status -> boot_control.get_booted_ota_status API rename,
  19. fix up test_grub to make it work again,
  20. add new test_ota_status_control,
  21. minor fix up to test_rpi_boot and test_ota_client.

Behavior changes

Does this PR introduce behavior change(s)?

  • Yes, internal behaivor (will not impact user experience).

Previous behavior

  1. otaclient will crash on ota-partition symlink missing/corrupted, even the system is booted normally.
  2. otaclient cannot correctly migrates non-ota-partition booted system in some cases.
  3. grub boot_control implements the detailed ota_status files control, including reading/writing/updating ota_status files during each OTA stages, switch boot mode detection, ota_status transition, etc.
  4. grub boot_control prepares slots devices and mounting slots directly using commands provided by CMDHelperFuncs.
  5. during finalizing switch boot, otaclient does ota-partition symlink switch before doing grub-update.
  6. in _prepare_kernel_initrd_links, grub boot_control will use the first bound kernel and first found initrd.img, but these two files might have different version.

Behavior with this PR

  1. otaclient now will not crash on ota-partition symlink missing, but instead try to fix it first or migrate.
  2. otaclient now can properly migrate from non-ota-partition booted system.
  3. grub boot_control no-longer directly managing ota_status files, but calling corresponding APIs fromOTAStatusFilesControl during each OTA stages.
  4. grub boot_control no-longer directly managing slots mounting/preparing, but calling corresponding APIs from SlotMountHelper during each OTA stages.
  5. behavior of finalizing switch boot now aligns with documentation, which otaclient will do grub-update before switching ota-partition symlink.
  6. in _prepare_kernel_initrd_links, grub boot control will use the first found kernel, and use its version string to find corresponding initrd.img.

Breaking change

Does this PR introduce breaking change?

  • No.

@Bodong-Yang Bodong-Yang added the refinement Improve the performance, code quality(like improving code structure/readability/error handling/robus label Sep 6, 2023
@Bodong-Yang Bodong-Yang marked this pull request as draft September 13, 2023 08:12
@Bodong-Yang
Copy link
Member Author

@obi-t4 Some changes are required due to update to the documentation, so I will turn the PR to draft for now.

@Bodong-Yang Bodong-Yang marked this pull request as ready for review September 19, 2023 02:46
Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm yet reviewing but did you confirm that this is working on the USB installer environment?

Comment on lines +58 to 59
GRUB_CFG_FNAME: str = "grub.cfg"
GRUB_CFG_PATH: str = "/boot/grub/grub.cfg"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GRUB_CFG_FNAME: str = "grub.cfg"
GRUB_CFG_PATH: str = "/boot/grub/grub.cfg"
GRUB_CFG_FNAME: str = "grub.cfg"
GRUB_CFG_PATH: str = f"/boot/grub/{GRUB_CFG_FNAME}"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And fix spelling inconsistencies: FNAME and FILE

GRUB_CFG_FNAME
..
BOOT_OTA_PARTITION_FILE

Copy link
Member Author

@Bodong-Yang Bodong-Yang Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obi-t4 Ah about the constant naming, currently I am using the following rules:

  1. FNAME means the constant only contains the file name,
  2. FPATH means the constant contains the file full path,

A single FILE is not good, I will prepare another PR to update the configs constant naming.

@Bodong-Yang
Copy link
Member Author

Bodong-Yang commented Sep 21, 2023

I'm yet reviewing but did you confirm that this is working on the USB installer environment?

@obi-t4 Yes, otaclient grub boot controller supports automatically migrating a normal ubuntu installation to use ota-partition boot mechanism(and it will take effect starts from next reboot).

@Bodong-Yang
Copy link
Member Author

Bodong-Yang commented Sep 21, 2023

( More precisely is, otaclient grub boot controller supports automatically migrating any system booted with general grub installation(booted with /boot/grub/grub.cfg, /boot/vmlinuz and /boot/initrd.img) with A/B partition presented to use ota-partition boot mechanism.

Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, please have a check.

def get_active_slot(self) -> str:
return self.active_slot
def __init__(self) -> None:
ab_detecter = GrubABPartitionDetecter()
Copy link
Contributor

@obi-t4 obi-t4 Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the correct spelling is detector.

Suggested change
ab_detecter = GrubABPartitionDetecter()
ab_detector = GrubABPartitionDetector()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Bodong-Yang Do you have a plan to fix this spelling in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obi-t4 Ah sorry I am waiting for the test result from your side. After test is completed I will resolve all the comments at once.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated and fixed at 48c06ac.

Comment on lines +669 to +670
self._prepare_kernel_initrd_links(self.active_ota_partition_folder)
self._ensure_ota_partition_symlinks(active_slot=self.active_slot)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to know why these two lines are needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two methods are to ensure the required symlinks are presented at the time just before grub-reboot is called.

If required symlinks are presented, these two methods won't do others things, otherwise they will re-generate the missing symlinks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, the re_symlink_atomic method used by these two methods will check the symlink before actually creating symlink again. So if the symlink is presented and already points to correct target, this method will do nothing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional comments for this method are added at ebb0c4c.

@obi-t4
Copy link
Contributor

obi-t4 commented Sep 21, 2023

@Bodong-Yang
I'll approve this PR if USB installer is OK. Image building w/ this PR has been started.

@obi-t4
Copy link
Contributor

obi-t4 commented Sep 22, 2023

NOTE:
20230921-otaclient-grub-robust (USB install)
-> 20220621083828
-> 20221007121134-20.04-otaclient-v3.1.6
-> 20230921-otaclient-grub-robust

@obi-t4
Copy link
Contributor

obi-t4 commented Sep 22, 2023

@Bodong-Yang Test is working as expected!

@Bodong-Yang
Copy link
Member Author

Bodong-Yang commented Sep 22, 2023

@Bodong-Yang Test is working as expected!

@obi-t4 I will resolve all the pending PR comments now, please wait a moment.

Updated: Done.

@Bodong-Yang Bodong-Yang enabled auto-merge (squash) September 22, 2023 07:37
Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@Bodong-Yang Bodong-Yang merged commit 1403768 into main Sep 22, 2023
3 checks passed
@Bodong-Yang Bodong-Yang deleted the refine/grub_robust branch September 22, 2023 08:04
@yn-mrse
Copy link

yn-mrse commented Dec 19, 2023

https://github.com/tier4/autoware_ecu_system_setup/pull/817 発行のタイミングでOTA image作成が実施されており、正常動作確認はパスしている。
このPRとしてのパーティション等復旧機能自体はユニットテストを全てパスしている。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refinement Improve the performance, code quality(like improving code structure/readability/error handling/robus
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants