-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd: improve error reporting #336
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Files in cmd/ create turnstile.ConcurrentError instances with a mix of newConcurrentError calls and struct literals. Drop the helper in favor of consistently using struct literals.
Some of the turnstile.ConcurrentError notes are capitalized. That doesn't play well with postWorkNotice, which embeds the notes in "Details are %s".
postWorkNotice includes only the run identifier and notes when logging the ConcurrentError list. While the notes provide the overall context for the error, the underlying errors often contain additional details that are useful for troubleshooting.
If exec.LookPath returns an error, executeSGEJob aborts and returns a turnstile.ConcurrentError, but it has around twenty lines in between the call and error handling, none of which are related to the error. Return the error right after the LookPath call.
generateScript and generateBbiScript _replace_ a template parsing or execution error with a new one. Theses spots are unlikely to error given the templates are hard coded, but it's still useful to include the underlying error to make troubleshooting tweaks easier.
executeLocalJob checks whether running a command leads to an ExitError. If so, it logs the exit code and error string output. Including the error string just leads to a redundant message like Exit code was 127, details were exit status 127 Instead combine the exit code message with the following one that includes the combined stdout/stderr.
Like executeLocalJob, executeSGEJob runs a command with CombinedOutput and returns a turnstile.ConcurrentError if that results in an error. Unlike executeLocalJob, it doesn't relay the combined stdout/stderr when it hits an ExitError. The ConcurrentError doesn't include the stdout/stderr, and, in the error case, the output isn't written to disk, so the caller has no way to access these details. Update executeSGEJob to log ExitErrors in the same way executeLocalJob does.
executeLocalJob and executeSGEJob have nearly identical code for running a command and handling the errors. While the code is not too complicated, it's worth consolidating because 1) it ensures that all runners (local, SGE, and upcoming Slurm) behave in the same way (e.g., in terms of reporting errors and writing output files) and 2) the logic is going to get a bit more involved with the next commit.
The new runModelCommand was extracted from executeLocalJob and executeSGEJob and behaves as they did in terms of handling the output. The output is collected with CombinedOutput. If there's an error, the output is logged before returning a ConcurrentError. The output is written to _disk_ only if there was not an error. Capturing the output on disk is helpful for troubleshooting. Rework runModelCommand to send standard output and standard error directly to a file.
seth127
approved these changes
Oct 17, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes make sense to me.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This series focuses on improving errors reported by
bbi nonmem run {local,sge}
.The two main improvements make information more available for troubleshooting:
{model}.out