Update baseline script to ignore emerg logs that irq handler is missing #496

Closed
musamaanjum opened this issue Nov 26, 2024 · 9 comments · Fixed by kernelci/kernelci-core#2751

@musamaanjum

musamaanjum commented Nov 26, 2024

An emerg-level log is printed during boot if no IRQ handler is found for a vector. These logs cause many false positives in the baseline test. First investigate whether they are safe to ignore, then update the baseline script to ignore them so that the results are cleaner. Example job:

kern  :emerg : call_irq_handler: 1.55 No irq handler for vector

kern  :emerg : call_irq_handler: 2.55 No irq handler for vector
kern  :emerg : call_irq_handler: 3.55 No irq handler for vector
[   12.324058] <LAVA_SIGNAL_TESTCASE TEST_CASE_ID=emerg RESULT=fail UNITS=lines MEASUREMENT=3>

The updates are needed in ./config/rootfs/debos/overlays/baseline/opt/kernelci/dmesg.sh.
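
The contents of dmesg.sh are not shown in this thread, so the following is only a minimal sketch of the kind of filter being proposed, not the actual script. The helper name `count_unexpected` and the variable `IGNORE_PATTERN` are hypothetical; the sample lines are the ones quoted above.

```sh
#!/bin/sh
# Hypothetical helper, NOT the real dmesg.sh code.
# Reads `dmesg -x`-style lines on stdin and prints how many remain
# after dropping the known-harmless message.
IGNORE_PATTERN='No irq handler for vector'

count_unexpected() {
    # grep -v drops the ignored lines; grep -c . counts the non-empty rest.
    # `|| true` keeps the exit status at zero when the count is 0.
    grep -v -e "$IGNORE_PATTERN" | grep -c . || true
}

# Sample input taken from the log excerpt in this issue.
sample='kern  :emerg : call_irq_handler: 1.55 No irq handler for vector
kern  :emerg : call_irq_handler: 2.55 No irq handler for vector
kern  :emerg : call_irq_handler: 3.55 No irq handler for vector'

remaining=$(printf '%s\n' "$sample" | count_unexpected)
echo "emerg lines after filtering: $remaining"
```

With a filter like this, a boot that only prints the IRQ handler message would report zero emerg lines, while any other emerg message would still be counted.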

@musamaanjum musamaanjum self-assigned this Nov 26, 2024
@musamaanjum musamaanjum converted this from a draft issue Nov 26, 2024
@musamaanjum
Author

The dmesg.sh file is copied into the rootfs at creation time, so we'll have to build the rootfs again.

I'm not sure why the dmesg.sh file is copied from the kernelci.org branch instead of master. It seems the kernelci.org branch was being kept to store updated data files. @nuclearcat Are we still using the kernelci.org branch, or should I update the dmesg.sh file in master?

@laura-nao

laura-nao commented Nov 27, 2024

@musamaanjum thanks for looking into this! There was a discussion about this when we first enabled baseline/baseline-nfs tests on Chromebooks (see kernelci/kernelci-core#2483). The errors are known and harmless on zork and grunt devices (and their derivatives). There's a way to exclude specific Tast tests that are known/expected to fail (see: kernelci/kernelci-core#2489) but I don't think we have a counterpart for non-ChromeOS tests at the moment.

I think it would be useful to auto-match these errors on the affected boards, so the tests will keep failing but the users will be able to see it's a known issue (and will be able to filter the results out as well).

@helen-fornazier I'm not up to speed on the status of auto-matching issues, is this something we can already add or is it still in the works?

@laura-nao

Duplicate of kernelci/kernelci-core#2483

@laura-nao laura-nao marked this as a duplicate of kernelci/kernelci-core#2483 Nov 27, 2024
@musamaanjum
Author

@laura-nao Thanks for letting me know about this; I wasn't aware of it. Let's wait for @helen-fornazier to comment. Let's try to expedite the solution, since it will let me open fewer failure reports and speed up my overall work.

@musamaanjum
Author

Let's close this ticket and continue the discussion in kernelci/kernelci-core#2483.

@musamaanjum
Author

As discussed in the KernelCI weekly, as we don't have the ability to ignore known issues at this time, let's update the dmesg.sh script to ignore these warnings.

@musamaanjum musamaanjum moved this from Todo to In Progress in KernelCI results data evaluation Nov 28, 2024
@laura-nao

As discussed in the KernelCI weekly, as we don't have the ability to ignore known issues at this time, let's update the dmesg.sh script to ignore these warnings.

How about auto-generated issues based on log matching, as mentioned in #496 (comment)? Has this option already been excluded?

I don't think completely hiding the warnings is a good idea: we know they're harmless on grunt and zork devices, but I'm not sure they won't be fatal on other platforms. And filtering out the messages in the dmesg.sh script itself only for specific boards looks a bit hacky.

@musamaanjum
Author

In the meeting, participants agreed that automatic log matching would be the best solution. But as we don't have any plan to add it, we can ignore these specific warnings temporarily. Additionally, I'll try to report them upstream to see whether their log level can be reduced if they aren't important errors.
cc: @nuclearcat @spbnick

@laura-nao

Another option that comes to mind is related to Grafana’s log_line field (in the All tests view), which uses regex to match specific error messages in the logs. Sometimes, the kern :emerg : call_irq_handler: 3.55 No irq handler for vector message is matched correctly, but in other cases, different errors are prioritized. If I remember correctly, these regex patterns were manually set up to catch recurring issues. Maybe it’s possible to adjust them so that the IRQ handler message gets matched first? This way we can easily identify these cases and filter them out.

It’s not an ideal solution, but since we’re considering temporary workarounds, this could be an alternative to modifying the dmesg.sh script, which was supposed to be generic.
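
The "matched first" idea can be illustrated with a small first-match-wins sketch. This is not the Grafana configuration, which is not shown in this thread. The function name `classify_log` is hypothetical, and the patterns other than the IRQ handler message are illustrative assumptions only.

```sh
#!/bin/sh
# Hypothetical sketch of ordered pattern matching; NOT the Grafana setup.
# Patterns are tried in order and the first one found in the log wins,
# so listing the IRQ handler message first gives it priority.
classify_log() {
    log=$1
    for pattern in 'No irq handler for vector' 'Kernel panic' 'BUG:'; do
        if printf '%s\n' "$log" | grep -q -e "$pattern"; then
            printf '%s\n' "$pattern"
            return 0
        fi
    done
    printf 'unmatched\n'
}

# A log that contains both an assumed "BUG:" line and the IRQ message.
log='kern  :err   : BUG: example of another error
kern  :emerg : call_irq_handler: 3.55 No irq handler for vector'

classify_log "$log"
```

Because the IRQ handler pattern comes first in the list, a log containing both messages is labelled with it, which is what would make such results easy to filter out.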

musamaanjum pushed a commit to musamaanjum/kernelci-core that referenced this issue Dec 4, 2024
Ignore "No irq handler for vector" errors in baseline tests [1]. These
errors are harmless. But because of these errors the baseline test gets
marked as failed which makes results dirty. It would have been better to
ignore these errors as known errors on the dashboard. But as we don't
have the functionality at this time. We have decided to ignore these
during the test until we have the functionality.

[1] https://kcidb.kernelci.org/d/test/test?orgId=1&var-datasource=default&var-build_architecture=x86_64&var-build_config_name=cros:%2F%2Fchromeos-6.6%2Fx86_64%2Fchromeos-amd-stoneyridge.flavour.config&var-id=maestro:67455b8c3be6da94b19fde34&from=now-100y&to=now&timezone=browser&var-origin=$__all&var-test_path=&var-issue_presence=$__all
Close kernelci/kernelci-project#496
Signed-off-by: Muhammad Usama Anjum <[email protected]>
github-merge-queue bot pushed a commit to kernelci/kernelci-core that referenced this issue Dec 9, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in KernelCI results data evaluation Dec 9, 2024