Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add suspend stats related check (Bugfix) #1700

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hanhsuan
Copy link
Contributor

@hanhsuan hanhsuan commented Jan 24, 2025

Description

WARNING: This modifies com.canonical.certification::sru-server

According to the suspend_advanced_auto is the dependency for all after suspend jobs, it would be better to do more tests for this concept to fix #1579.
Therefore, this PR is the first step to collect information by:

  1. valid_suspend_status is used to check success should be non zero
  2. any_fail_during_supend is used to check fail should be zero

If everything woks well, valid_suspend_status will be merge into suspend_advanced_auto in the future.

Resolved issues

#1579

Documentation

Tests

succ 202001-27667
fail 202412-36131

1. test case is_suspend_success will be merged into
   suspend_advanced_auto
2. is_device_supend_success is used to check any fail
Copy link

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 95.45455% with 2 lines in your changes missing coverage. Please review.

Project coverage is 49.06%. Comparing base (e5b7c6f) to head (3684954).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
providers/base/bin/suspend_stats.py 95.45% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1700      +/-   ##
==========================================
+ Coverage   49.01%   49.06%   +0.05%     
==========================================
  Files         372      373       +1     
  Lines       40348    40392      +44     
  Branches     6817     6827      +10     
==========================================
+ Hits        19777    19819      +42     
- Misses      19849    19851       +2     
  Partials      722      722              
Flag Coverage Δ
provider-base 24.99% <95.45%> (+0.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hanhsuan hanhsuan marked this pull request as ready for review January 24, 2025 05:38
@hanhsuan hanhsuan changed the title Add suspend stats releated check (Bugfix) Add suspend stats related check (Bugfix) Jan 24, 2025
Copy link
Collaborator

@Hook25 Hook25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, I really like what I see and I wouldn't be against merging (the half that checks we did actually suspend) in suspend advanced auto.

I have 2 big worries:

  1. Is this supported on any kernel we have to support? Even xenial? Can you check? If not, what do we do about it?
  2. Does this work from a strict snap? Can you check?

Comment on lines +46 to +53
for root, dirs, files in os.walk(search_directory):
for file_name in files:
file_path = os.path.join(root, file_name)
with open(
file_path, "r", encoding="utf-8", errors="ignore"
) as file:
content[file_name] = file.read().splitlines()[0]
return content
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly easier to read imo

Suggested change
for root, dirs, files in os.walk(search_directory):
for file_name in files:
file_path = os.path.join(root, file_name)
with open(
file_path, "r", encoding="utf-8", errors="ignore"
) as file:
content[file_name] = file.read().splitlines()[0]
return content
search_directory = Path(search_directory)
for p in filter(lambda x: x.is_file(), search_directory.iterdir()):
content[p.name], *_ = p.read_text().splitlines(1)
return content

Comment on lines +60 to +61
for c in self.contents:
print("{}:{}".format(c, self.contents[c]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for c in self.contents:
print("{}:{}".format(c, self.contents[c]))
for c, v in self.contents.items():
print("{}:{}".format(c, v))

Comment on lines +69 to +74
return (
self.contents["success"] != "0"
and self.contents["failed_prepare"] == "0"
and self.contents["failed_suspend"] == "0"
and self.contents["failed_resume"] == "0"
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big questoion is, why are we comparing these values as strings? Is it even possible that these are not valid integers? if that happens I suppose its a big deal, for this reason I've also suggested to remove the utf8/ignore errors above, if sysfs is broken we want to know

Another big question is what do we do if one or more of these files is missing? is it possible? if it is, when does it happen? should we have a default for those situations?

Comment on lines +116 to +122
parser_any.add_argument(
"-p",
"--print",
dest="print",
action="store_true",
help="Print content",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how this will be used/how this works, I would like it to always print, remove this parameter

Comment on lines +104 to +110
parser_valid.add_argument(
"-p",
"--print",
dest="print",
action="store_true",
help="Print content",
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, remove this

subparsers.required = True

# Add parser for validating the system is after suspend or not
parser_valid = subparsers.add_parser(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a weird usage of subparsers, I would use an argument (check_kind or something) that is a choice between:

  • after_suspend: Check that the system is after suspend or fail
  • any_failure: Check that no suspend failure is reported in sysfs or fail

Comment on lines +126 to +139
def main(self):
args = self.parse_args()
if args.type == "valid":
if args.print:
self.print_all_content()
if not self.is_after_suspend():
raise SystemExit("System is not under after suspend status")
elif args.type == "any":
if args.print:
self.print_all_content()
if self.is_any_failed():
raise SystemExit(
"There are [{}] failed".format(self.contents["fail"])
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the sysfs nodes may not be there please check here and... lets discuss what to do

Consider the following:

Suggested change
def main(self):
args = self.parse_args()
if args.type == "valid":
if args.print:
self.print_all_content()
if not self.is_after_suspend():
raise SystemExit("System is not under after suspend status")
elif args.type == "any":
if args.print:
self.print_all_content()
if self.is_any_failed():
raise SystemExit(
"There are [{}] failed".format(self.contents["fail"])
)
def main(self):
args = self.parse_args()
self.print_all_content()
if args.type == "valid":
if not self.is_after_suspend():
raise SystemExit("System is not after suspend (didn't suspend even once)")
print("System is after suspend")
elif args.type == "any":
if self.is_any_failed():
raise SystemExit(
"There are [{}] suspend failures reported in sysfs".format(self.contents["fail"])
)
print("No suspend failure reported in sysfs")

Comment on lines +1720 to +1730
id: suspend/valid_suspend_status
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py valid -p
summary:
Test this machine suspend status is success or not
description:
If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
and the /sys/power/suspend_stats/failed_suspend will be zero
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(removed -p in line with above, see if you like the new interface)

Slightly changed wording:

Suggested change
id: suspend/valid_suspend_status
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py valid -p
summary:
Test this machine suspend status is success or not
description:
If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
and the /sys/power/suspend_stats/failed_suspend will be zero
id: suspend/validate_suspend_status
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py after_suspend
_summary: Tests that the machine suspended correctly
_description:
Query sysfs for the amount of times the system was suspended. We expect the
count in /sys/power/suspend_stats/success to be non zero

Comment on lines +1732 to +1743
id: suspend/any_fail_during_suspend
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py any -p
summary:
Test all devices in this machine suspend status is failed or not
description:
If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
and the /sys/power/suspend_stats/fail will be zero. If there is device failed,
/sys/power/suspend_stats/last_failed_dev will show the failed device
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
id: suspend/any_fail_during_suspend
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py any -p
summary:
Test all devices in this machine suspend status is failed or not
description:
If the suspend is successed, the count in /sys/power/suspend_stats/success will be non zero
and the /sys/power/suspend_stats/fail will be zero. If there is device failed,
/sys/power/suspend_stats/last_failed_dev will show the failed device
id: suspend/any_suspend_failure
plugin: shell
category_id: com.canonical.plainbox::suspend
depends: suspend/suspend_advanced_auto
estimated_duration: 5s
command: suspend_stats.py any_failure
summary: Tests if any device in this machine reports a suspend failure
description:
If the DUT suspended successfully, /sys/power/suspend_stats/fail will be zero.
If a device failed /sys/power/suspend_stats/last_failed_dev will show the failed device

Comment on lines +49 to +50
suspend/valid_suspend_status
suspend/any_fail_during_suspend
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you like the new ids:

Suggested change
suspend/valid_suspend_status
suspend/any_fail_during_suspend
suspend/validate_suspend_status
suspend/any_suspend_failure

@Hook25
Copy link
Collaborator

Hook25 commented Jan 31, 2025

Adding this I dug up from the kernel docs: https://github.com/torvalds/linux/blob/69e858e0b8b2ea07759e995aa383e8780d9d140c/Documentation/ABI/testing/sysfs-power#L317C9-L317C40

If this means what I think it means, then we need a fallback mechanism

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

suspend/suspend_advanced_auto always returns passed result
2 participants