Send notification to core team if postrun.sh fails #297

Open
hpages opened this issue Jun 9, 2023 · 4 comments · Fixed by #308

@hpages
Contributor

hpages commented Jun 9, 2023

The most frequent cause of postrun.sh failures is a build node that didn't finish in time. This will typically:

  • cause BBS-make-PROPAGATION_STATUS_DB.py to fail if the node's products (i.e. package source tarballs or package binaries) need to be propagated,
  • or cause BBS-report.py to fail if the node's products don't need to be propagated.

One way to go is to modify these 2 Python scripts to send a notification email to an address of our choice (e.g. [email protected]) when they fail. The advantage of doing this at the Python level -- and not at the shell level (e.g. in postrun.sh) or at the R level (e.g. in BBS/utils/makePropagationStatusDb.R) -- is that we can use bbs.notify.sendtextmail() which already "knows" what SMTP server/user/password to use. Note that bbs.notify.sendtextmail() is what BBS-notify.py uses to send build/check failure notifications to package maintainers.

So we need to identify the function call in BBS-make-PROPAGATION_STATUS_DB.py and BBS-report.py that fails when a node didn't finish in time:

  • For BBS-make-PROPAGATION_STATUS_DB.py, which is basically a Python wrapper around BBS/utils/makePropagationStatusDb.R, it's the call to make_PROPAGATION_STATUS_DB() (this is the only function defined and called in the file).
  • For BBS-report.py, this would need some further assessment. I suspect that it's the call to BBSreportutils.set_NODES(), but please don't take my word for it.

Then I guess it would just be a matter of wrapping these calls in a try statement, catching the exception, and sending the notification.
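
A minimal sketch of that wrapping, to make the idea concrete. The helper name run_with_failure_notification and the notify callable are hypothetical: notify is assumed to take a subject and a body, and in BBS it would be a thin wrapper around bbs.notify.sendtextmail(), whose actual arguments are not shown here. The exception is re-raised so the script still exits with an error and postrun.sh still stops.

```python
import traceback


def run_with_failure_notification(step_name, func, notify):
    """Call func(); if it raises, send a notification and re-raise.

    `notify` is a hypothetical callable taking (subject, body). It stands
    in for a wrapper around bbs.notify.sendtextmail(); the real call and
    its arguments are an assumption and must be checked against bbs/notify.py.
    """
    try:
        return func()
    except Exception:
        subject = "[BBS] %s failed" % step_name
        body = traceback.format_exc()  # full traceback of the failure
        try:
            notify(subject, body)
        except Exception:
            # A broken mail setup must not hide the original failure,
            # so only report the notification error and carry on.
            traceback.print_exc()
        raise  # propagate the original exception to the caller
```

In BBS-make-PROPAGATION_STATUS_DB.py, the call to make_PROPAGATION_STATUS_DB() would be passed as func. The same helper could wrap whichever call in BBS-report.py turns out to be the one that fails.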

What do you think?

Let me know if you have questions.

@jwokaty
Collaborator

jwokaty commented Jun 14, 2023

I don't think I am familiar with the second scenario where BBS-report.py fails. I only remember seeing errors related to the propagation status db. Is the error the same? I don't think I understand very well how it happens.

@hpages
Contributor Author

hpages commented Jun 14, 2023

This is to cover the scenario where there are no build products for a node listed in BBS_REPORT_NODES. I don't know the details of how/where this would break things, but it would certainly break things. To learn about the how/where, you could simulate it (should not be hard).

But let's focus on the 1st scenario which is by far the most common.

@hpages
Contributor Author

hpages commented Jun 14, 2023

BTW the reason we usually don't see the 2nd scenario is that all our build nodes send back stuff that we propagate, so BBS-make-PROPAGATION_STATUS_DB.py will fail before anything else. The 2nd scenario would happen only if for some reason we disable propagation for some nodes, or if some nodes like kunpeng2 are not sending back package tarballs.

@jwokaty
Collaborator

jwokaty commented Jun 23, 2023

I'm just reopening for the second scenario.
