add PBS comments #340
Apologies: when I originally wrote this issue I used a comment from polaris, and that is what your commit reflects. I suggest using a comment from aurora instead. (I later updated the original issue, but perhaps you were never notified.)
Further, this morning I went back to aurora and found that adding the "w" (wide) option to the qstat command keeps the comment from being truncated, which improves readability. So I am refining my suggestion (check it out @weallcock):
qstat -xfw [JOBID] | grep comment
Select comments after executing "qstat -xfw" against all aurora jobs this morning:
comment = Not Running: Job is requesting an exclusive node and node is in use and terminated
comment = Not Running: Node is in an ineligible state: down and terminated
comment = Not Running: Node is in an ineligible state: offline and terminated
comment = Not Running: Job would conflict with reservation or top job and terminated
comment = Not Running: User has reached queue LustreApps running job limit. and terminated
comment = Not Running: Node is in an ineligible state: down
comment = Not Running: Insufficient amount of resource: at_queue and terminated
comment = Not Running: Insufficient amount of resource: tier0 and terminated
comment = Not Running: Insufficient amount of resource: ncpus and terminated
comment = Not Running: Queue not started. and terminated
comment = job held, too many failed attempts to run
comment = job held, too many failed attempts to run and terminated
comment = Job held by osborn on Tue Feb 6 05:20:00 2024 and terminated
comment = Can Never Run: Insufficient amount of resource: ncpus (R: 2209792 A: 2165904 T: 2165904)
Fyi I'm not suggesting you add all the comments - just providing a set for you to choose from. I bow to your documentation wording preferences :)
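In case it helps anyone reproduce a list like the one above, here is a minimal sketch of how the distinct comments could be pulled out of the wide output. The function name `pbs_comments` is made up for illustration and is not part of PBS, and the sketch assumes each comment sits on its own `comment = ...` line, optionally indented, which is how the untruncated "w" output appeared in the examples above.

```shell
# Hypothetical helper, not part of PBS: reads `qstat -xfw`-style text on stdin
# and prints each distinct comment once, without the "comment = " prefix.
# Assumes every comment is on a single line (the wide "w" output).
pbs_comments() {
    # -n suppresses default printing; the trailing p prints only lines
    # where the "comment = " prefix was found and stripped.
    sed -n 's/^[[:space:]]*comment = //p' | LC_ALL=C sort -u
}

# Usage on a system with PBS available:
#   qstat -xfw | pbs_comments              # all jobs
#   qstat -xfw [JOBID] | pbs_comments      # one job
```

Using `sed` rather than `grep comment` avoids matching other lines that merely contain the word "comment", though whether that matters depends on what else appears in the job attributes.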
@monkeystate How does this look?
Looks good to me, though I note this change includes two separate references to 'comment = job held, too many failed attempts to run', with slightly different advice text in response. (Imo it's fine as is if you don't want to tweak it further; just pointing it out in case you do.)
No description provided.