New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Add suspend stats related check (Bugfix) #1700

Open

hanhsuan wants to merge 4 commits into main from add_suspend_stats_check

+329 −0

Contributor

hanhsuan commented Jan 24, 2025 •

edited

Loading

Description

WARNING: This modifies com.canonical.certification::sru-server

According to the suspend_advanced_auto is the dependency for all after suspend jobs, it would be better to do more tests for this concept to fix #1579.
Therefore, this PR is the first step to collect information by:

valid_suspend_status is used to check success should be non zero
any_fail_during_supend is used to check fail should be zero

If everything woks well, valid_suspend_status will be merge into suspend_advanced_auto in the future.

Resolved issues

Documentation

Tests

succ 202001-27667
fail 202412-36131

hanhsuan added 2 commits

January 24, 2025 10:45


          This is the first step to fix #1579.

9b1cd14

1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail


          fix black error

c0c2536

codecov bot commented Jan 24, 2025 •

edited

Loading

Codecov Report

Attention: Patch coverage is 95.45455% with 2 lines in your changes missing coverage. Please review.

Project coverage is 49.06%. Comparing base (e5b7c6f) to head (3684954).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
providers/base/bin/suspend_stats.py	95.45%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   49.01%   49.06%   +0.05%     
==========================================
  Files         372      373       +1     
  Lines       40348    40392      +44     
  Branches     6817     6827      +10     
==========================================
+ Hits        19777    19819      +42     
- Misses      19849    19851       +2     
  Partials      722      722

Flag	Coverage Δ
provider-base	`24.99% <95.45%> (+0.19%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

hanhsuan added 2 commits

January 24, 2025 13:30


          Change to check fail only

10872f3


          fix black error

hanhsuan marked this pull request as ready for review

January 24, 2025 05:38

hanhsuan changed the title ~~Add suspend stats releated check (Bugfix)~~ Add suspend stats related check (Bugfix)

pieqq assigned Hook25

Hook25 requested changes

View reviewed changes

Collaborator

Hook25 left a comment •

edited

Loading

Well done, I really like what I see and I wouldn't be against merging (the half that checks we did actually suspend) in suspend advanced auto.

I have 2 big worries:

Is this supported on any kernel we have to support? Even xenial? Can you check? If not, what do we do about it?
Does this work from a strict snap? Can you check?

providers/base/bin/suspend_stats.py

Comment on lines +46 to +53

+                      for root, dirs, files in os.walk(search_directory):
+                          for file_name in files:
+                              file_path = os.path.join(root, file_name)
+                              with open(
+                                  file_path, "r", encoding="utf-8", errors="ignore"
+                              ) as file:
+                                  content[file_name] = file.read().splitlines()[0]
+                      return content

Collaborator

Hook25 Jan 29, 2025

Slightly easier to read imo

Suggested change

      
                    for root, dirs, files in os.walk(search_directory):
          
                        for file_name in files:
          
                            file_path = os.path.join(root, file_name)
          
                            with open(
          
                                file_path, "r", encoding="utf-8", errors="ignore"
          
                            ) as file:
          
                                content[file_name] = file.read().splitlines()[0]
          
                    return content
          
                    search_directory = Path(search_directory)
          
                    for p in filter(lambda x: x.is_file(), search_directory.iterdir()):
          
                        content[p.name], *_ = p.read_text().splitlines(1)
          
                    return content

providers/base/bin/suspend_stats.py

Comment on lines +60 to +61

		for c in self.contents:
		print("{}:{}".format(c, self.contents[c]))

Collaborator

Hook25 Jan 29, 2025

Suggested change

      
                    for c in self.contents:
          
                        print("{}:{}".format(c, self.contents[c]))
          
                    for c, v in self.contents.items():
          
                        print("{}:{}".format(c, v))

providers/base/bin/suspend_stats.py

Comment on lines +69 to +74

+                      return (
+                          self.contents["success"] != "0"
+                          and self.contents["failed_prepare"] == "0"
+                          and self.contents["failed_suspend"] == "0"
+                          and self.contents["failed_resume"] == "0"
+                      )

Collaborator

Hook25 Jan 29, 2025

Big questoion is, why are we comparing these values as strings? Is it even possible that these are not valid integers? if that happens I suppose its a big deal, for this reason I've also suggested to remove the utf8/ignore errors above, if sysfs is broken we want to know

Another big question is what do we do if one or more of these files is missing? is it possible? if it is, when does it happen? should we have a default for those situations?

providers/base/bin/suspend_stats.py

Comment on lines +116 to +122

+                      parser_any.add_argument(
+                          "-p",
+                          "--print",
+                          dest="print",
+                          action="store_true",
+                          help="Print content",
+                      )

Collaborator

Hook25 Jan 29, 2025

Given how this will be used/how this works, I would like it to always print, remove this parameter

providers/base/bin/suspend_stats.py

Comment on lines +104 to +110

+                      parser_valid.add_argument(
+                          "-p",
+                          "--print",
+                          dest="print",
+                          action="store_true",
+                          help="Print content",
+                      )

Collaborator

Hook25 Jan 29, 2025

ditto, remove this

providers/base/bin/suspend_stats.py

+                      subparsers.required = True
+                      # Add parser for validating the system is after suspend or not
+                      parser_valid = subparsers.add_parser(

Collaborator

Hook25 Jan 29, 2025

This is a bit of a weird usage of subparsers, I would use an argument (check_kind or something) that is a choice between:

after_suspend: Check that the system is after suspend or fail
any_failure: Check that no suspend failure is reported in sysfs or fail

providers/base/bin/suspend_stats.py

Comment on lines +126 to +139

+                  def main(self):
+                      args = self.parse_args()
+                      if args.type == "valid":
+                          if args.print:
+                              self.print_all_content()
+                          if not self.is_after_suspend():
+                              raise SystemExit("System is not under after suspend status")
+                      elif args.type == "any":
+                          if args.print:
+                              self.print_all_content()
+                          if self.is_any_failed():
+                              raise SystemExit(
+                                  "There are [{}] failed".format(self.contents["fail"])
+                              )

Collaborator

Hook25 Jan 29, 2025

If the sysfs nodes may not be there please check here and... lets discuss what to do

Consider the following:

Suggested change

      
                def main(self):
          
                    args = self.parse_args()
          
                    if args.type == "valid":
          
                        if args.print:
          
                            self.print_all_content()
          
                        if not self.is_after_suspend():
          
                            raise SystemExit("System is not under after suspend status")
          
                    elif args.type == "any":
          
                        if args.print:
          
                            self.print_all_content()
          
                        if self.is_any_failed():
          
                            raise SystemExit(
          
                                "There are [{}] failed".format(self.contents["fail"])
          
                            )
          
                def main(self):
          
                    args = self.parse_args()
          
                    self.print_all_content()
          
                    if args.type == "valid":
          
                        if not self.is_after_suspend():
          
                            raise SystemExit("System is not after suspend (didn't suspend even once)")
          
                        print("System is after suspend")
          
                    elif args.type == "any":
          
                        if self.is_any_failed():
          
                            raise SystemExit(
          
                                "There are [{}] suspend failures reported in sysfs".format(self.contents["fail"])
          
                            )
          
                        print("No suspend failure reported in sysfs")

providers/base/units/suspend/suspend.pxu

Comment on lines +1720 to +1730

+              id: suspend/valid_suspend_status
+              plugin: shell
+              category_id: com.canonical.plainbox::suspend
+              depends: suspend/suspend_advanced_auto
+              estimated_duration: 5s
+              command: suspend_stats.py valid -p
+              summary:
+                  Test this machine suspend status is success or not
+              description:
+                  If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
+                  and the /sys/power/suspend_stats/failed_suspend will be zero

Collaborator

Hook25 Jan 29, 2025

(removed -p in line with above, see if you like the new interface)

Slightly changed wording:

Suggested change

      
            id: suspend/valid_suspend_status
          
            plugin: shell
          
            category_id: com.canonical.plainbox::suspend
          
            depends: suspend/suspend_advanced_auto
          
            estimated_duration: 5s
          
            command: suspend_stats.py valid -p
          
            summary:
          
                Test this machine suspend status is success or not
          
            description:
          
                If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
          
                and the /sys/power/suspend_stats/failed_suspend will be zero
          
            id: suspend/validate_suspend_status
          
            plugin: shell
          
            category_id: com.canonical.plainbox::suspend
          
            depends: suspend/suspend_advanced_auto
          
            estimated_duration: 5s
          
            command: suspend_stats.py after_suspend
          
            _summary:  Tests that the machine suspended correctly
          
            _description:
          
                Query sysfs for the amount of times the system was suspended. We expect the 
          
                count in /sys/power/suspend_stats/success to be non zero

providers/base/units/suspend/suspend.pxu

Comment on lines +1732 to +1743

+              id: suspend/any_fail_during_suspend
+              plugin: shell
+              category_id: com.canonical.plainbox::suspend
+              depends: suspend/suspend_advanced_auto
+              estimated_duration: 5s
+              command: suspend_stats.py any -p
+              summary:
+                  Test all devices in this machine suspend status is failed or not
+              description:
+                  If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
+                  and the /sys/power/suspend_stats/fail will be zero. If there is device failed,
+                  /sys/power/suspend_stats/last_failed_dev will show the failed device

Collaborator

Hook25 Jan 29, 2025

Suggested change

      
            id: suspend/any_fail_during_suspend
          
            plugin: shell
          
            category_id: com.canonical.plainbox::suspend
          
            depends: suspend/suspend_advanced_auto
          
            estimated_duration: 5s
          
            command: suspend_stats.py any -p
          
            summary:
          
                Test all devices in this machine suspend status is failed or not
          
            description:
          
                If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
          
                and the /sys/power/suspend_stats/fail will be zero. If there is device failed,
          
                /sys/power/suspend_stats/last_failed_dev will show the failed device
          
            id: suspend/any_suspend_failure
          
            plugin: shell
          
            category_id: com.canonical.plainbox::suspend
          
            depends: suspend/suspend_advanced_auto
          
            estimated_duration: 5s
          
            command: suspend_stats.py any_failure 
          
            summary:   Tests if any device in this machine reports a suspend failure
          
            description:
          
                If the DUT suspended successfully, /sys/power/suspend_stats/fail will be zero. 
          
                If a device failed /sys/power/suspend_stats/last_failed_dev will show the failed device

providers/base/units/suspend/test-plan.pxu

Comment on lines +49 to +50

		suspend/valid_suspend_status
		suspend/any_fail_during_suspend

Collaborator

Hook25 Jan 29, 2025

If you like the new ids:

Suggested change

      
                suspend/valid_suspend_status
          
                suspend/any_fail_during_suspend
          
                suspend/validate_suspend_status
          
                suspend/any_suspend_failure

Collaborator

Hook25 commented Jan 31, 2025

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet