Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix] grub: workaround for OTA failure/otaclient crashing if rootfs is not sda #258

Merged
merged 9 commits into from
Nov 6, 2023

Conversation

Bodong-Yang
Copy link
Member

@Bodong-Yang Bodong-Yang commented Oct 27, 2023

Description

This PR applies a workaround to deal with otaclient crashing and OTA failed when rootfs device is not assigned name as sda.

With this PR, the slot naming is decoupled with the actual device name, the slot name will be fixed to sda<partition_id>, and otaclient will search for ota-partition folder with the slot name.
For example, if rootfs device is nvme0n1, and active dev: nvme0n1p3, standby dev: nvme0n1p4, then the slot_id for active slot is still sda3, for standby slot is still sda4.

WIth workaround implemented this PR, OTA will work if no matter when the rootfs device's name changed and what name is assigned to rootfs device.

Warning

Deprecation Warning: the current slot managing/detecting mechanism design is proved to be not so robust, with the inaccurate assumption that rootfs device will always be assigned name as sda.

This PR is served as a quick workaround to make otaclient properly functioning if the above assumption is broken due to rootfs device is not name as sda. This PR doesn't mean to solve the problem fundamentally, but just for maintaining backward compatibility.

Important

A new mechanism to managing/detecting slot will be introduced in the future to replace the current mechanism.

Check list

  • local test files are passed.
  • normal OTA on VM works.
  • OTA when rootfs is not sda works.
  • rootfs naming changed between OTA reboot works.
  • extra ota-partition folder with name other than ota-partition.sda doesn't impact otaclient's functionality.

Check issue reproduce and testing report for more details.

Bug fix

Current behavior

  1. otaclient will crash during OTA finalizing switching boot if ota-partition folders other than ota-partition.sda* presented,
  2. OTA will fail if rootfs device's name changed before and after OTA reboot,
  3. OTA status will be lost if rootfs device's name changed, corresponding extra ota-partition folders will be created unexpectedly(i.e., ota-partition.sdb*, ota-partition.nvme0n1p*, etc).

Behaivor after fix

  1. slot naming is fixed to sda<partition_id>,
  2. OTA will work and otaclient will not crash even if rootfs device's name changed at any time, otaclient will always locate the correct ota-partition folder.

Related links & ticket

RT4-7472

TODO

(unrelated) the keys and certs for testing metadata are expired, in the future generate these keys and certs on every test.

@Bodong-Yang Bodong-Yang requested a review from obi-t4 October 27, 2023 04:08
@Bodong-Yang Bodong-Yang self-assigned this Oct 27, 2023
@Bodong-Yang Bodong-Yang marked this pull request as ready for review October 27, 2023 07:00
@github-actions
Copy link
Contributor

github-actions bot commented Oct 27, 2023

Coverage

Coverage Report
FileStmtsMissCoverMissing
otaclient/app
   __init__.py00100% 
   __main__.py330%16–19
   common.py2681495%55, 185, 200, 275–277, 287, 296–298, 404, 416, 535–539
   configs.py750100% 
   copy_tree.py81396%45, 96, 125
   downloader.py2714484%70, 83–84, 299, 304, 308, 326–327, 377–381, 400–410, 431–449, 458, 533–535, 551, 571–589
   ecu_info.py59788%78–79, 92, 97, 120–121, 130
   errors.py179597%55, 58, 116, 119, 139
   interface.py17476%32, 43, 47, 51
   log_setting.py26773%27–28, 55–65
   main.py31294%43–44
   ota_client.py3188773%69–70, 92, 198–222, 259–261, 271–276, 286–288, 331–332, 337–344, 357–371, 377–380, 405–429, 443–448, 457, 460–468, 549, 554–572
   ota_client_call.py38684%42–44, 80–82
   ota_client_service.py29390%58–60
   ota_client_stub.py40511472%79–84, 92–102, 106–128, 133–142, 145–152, 160–165, 207–209, 214, 249, 274, 277, 280, 384, 410, 412, 438, 488, 550, 620–621, 660, 680–691, 695–709, 713–716, 775, 862–871, 903–953, 956
   ota_metadata.py3153190%147, 152, 188–189, 199–203, 215, 273, 306–308, 325–328, 408, 411, 419–421, 434, 443–448, 721–732
   ota_status.py15287%36, 44
   proxy_info.py51786%82–87, 122–129, 137–138
   update_stats.py105298%162, 172
otaclient/app/boot_control
   __init__.py40100% 
   _cboot.py28613951%91–99, 108–121, 127–130, 134, 141, 145, 149, 153–157, 161–165, 170–189, 192–235, 241, 244, 247, 250, 253, 256, 259, 262–263, 266–267, 270–274, 277–278, 293, 296, 329, 356–359, 369–372, 380, 419, 436–439, 448–451, 456–458, 467–473, 476, 483, 513–515, 552–554, 557–567, 570–575
   _common.py36914361%65–74, 83–84, 88–89, 97–98, 109–110, 115–120, 125–130, 139, 148, 152, 156, 160, 174–177, 187–191, 196, 200, 210–214, 222–234, 245–249, 253–257, 269–271, 283–285, 296–310, 320–341, 360–376, 393–404, 412–418, 513–523, 582–583, 637, 641–642, 645, 653–656, 712–713, 716, 730–738, 753–754, 828–831, 852–855, 868–869, 878
   _errors.py471960%43–69, 89–90
   _grub.py3809376%221, 269–272, 278–282, 319–320, 327–347, 358–364, 373–379, 458–464, 515, 521, 547, 569–574, 589, 618–627, 682–687, 745–747, 789, 795, 815–818, 830, 833, 836, 839, 843–845, 863–865, 891–893, 896–903, 906–913
   _rpi_boot.py26012153%89–91, 97–152, 176–178, 184–186, 199–201, 207–209, 222–243, 248, 252, 256, 260, 294, 324, 327–329, 337, 346–348, 358–361, 365–372, 451–455, 474–477, 482, 485, 506–511, 514–522, 525–533, 545–548, 552–554, 557
   configs.py58297%46–47
   firmware.py32584%63, 65, 76–78
   protocol.py28582%53, 57, 61, 65, 73
   selecter.py392731%45–69, 77–94
otaclient/app/create_standby
   __init__.py12558%30–36
   common.py2164579%74, 77–78, 82–95, 141, 189–197, 200–203, 207, 218, 292–300, 312, 356–361, 377–378, 392–396, 421–422
   interface.py19384%55, 59, 63
   rebuild_mode.py99991%83–101, 128
otaclient/app/proto
   __init__.py31390%37, 44–45
   _common.py4194988%85, 163, 170, 182–184, 203, 208, 219, 256, 262, 267, 298, 302, 306, 363, 380, 403, 464, 471, 474, 494, 501, 503, 529, 535, 538, 540, 565, 571, 574, 576, 605, 609, 611, 625, 642, 670, 673, 677, 680, 708, 714, 761–766, 797
   _ota_metafiles_wrapper.py841385%37, 40–42, 112–116, 122–125
   _otaclient_v2_pb2_wrapper.py2763687%87, 90–93, 175, 183, 197, 207, 210–213, 258, 261, 264–265, 285, 305, 385, 452, 505, 513–515, 519–522, 525–530, 551, 559, 573, 581, 595
   streamer.py43881%32, 47, 65–66, 71, 80–81, 99
   wrapper.py40100% 
otaclient/ota_proxy
   __init__.py46393%86, 90, 94
   __main__.py22220%16–79
   _consts.py150100% 
   cache_control.py68494%71, 91, 113, 121
   config.py180100% 
   db.py1461590%75, 81, 103, 113–116, 145–147, 166, 199, 208–209, 229, 258, 300
   errors.py100100% 
   orm.py1151190%37, 91, 96, 101, 107, 113, 140–141, 154, 231, 235
   ota_cache.py4047581%95–96, 215, 226, 253–255, 275, 291–294, 317–330, 359–363, 379, 440–441, 483–484, 554, 567–570, 620, 639–640, 672–673, 684, 721–728, 735–736, 741–743, 747, 756, 803, 811–813, 892–895, 899–903, 917–934, 965, 971, 998, 1027–1029
   server_app.py1383972%75, 78, 84, 100, 102, 161–170, 212–243, 256–266, 292–298, 312–314, 320–322
   utils.py23196%31
TOTAL5997123679% 

@Bodong-Yang
Copy link
Member Author

TODO: generate the keys/certs for test use on every test run.

Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to ask one question.

tests/keys/interm.pem Outdated Show resolved Hide resolved
Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment, please have a check!

slot_name = dev_path.lstrip("/dev/")
_dev_path_ma = self.DEV_PATH_PA.match(dev_path)
assert _dev_path_ma and (
_pid := _dev_path_ma.group("partition_id")
Copy link
Contributor

@obi-t4 obi-t4 Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure that the operation with side-effects(=assignment) should not be used with the assert. assert can be removed by the option.
It is OK if _pid is assigned out side of the assert expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated at 74d4dd3.

slot_name = dev_path.lstrip("/dev/")
_dev_path_ma = self.DEV_PATH_PA.match(dev_path)
assert _dev_path_ma and (
_pid := _dev_path_ma.group("partition_id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same assignment in the assert expression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated at a92b60a.

Copy link
Contributor

@obi-t4 obi-t4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@Bodong-Yang Bodong-Yang merged commit c000645 into main Nov 6, 2023
3 checks passed
@Bodong-Yang Bodong-Yang deleted the fix/grub_slot branch November 6, 2023 02:11
@obi-t4
Copy link
Contributor

obi-t4 commented Nov 6, 2023

FYI: 2Mbps test has been passed.

Bodong-Yang added a commit that referenced this pull request Nov 8, 2023
Commits from v3.6.0 to v3.6.1

* Bug fix
1. [Fix] grub: fix unintended behavior when updating grub_default #253
2. (Major) [Fix] grub: workaround for OTA failure/otaclient crashing if rootfs is not sda #258

* Refinement
1. [Chore] minor update and fix to tools.status_monitor #256
2. (Major) [Refinement] refine error handling, error report on status API now covers boot controller startup failure #259

* Chore
1. [Chore] minor update and fix to tools.status_monitor #256
@yn-mrse
Copy link

yn-mrse commented Dec 19, 2023

Check issue reproduce and testing report for more details.
より問題ないと判断しました。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants