add PBS comments #340

Merged
merged 5 commits into from
Feb 18, 2024

Conversation

cjknight
Contributor

@cjknight cjknight commented Feb 9, 2024

No description provided.

@cjknight cjknight requested a review from monkeystate February 9, 2024 21:55
@cjknight cjknight linked an issue Feb 9, 2024 that may be closed by this pull request
Member

@monkeystate monkeystate left a comment

Apologies, but when I originally wrote this issue I used a comment from Polaris, which is reflected in your commit; I suggest using a comment from Aurora instead. (I later updated the original issue, but perhaps you were never notified.)

Further, this morning I went back to Aurora and found that adding the "w" (wide) option to the qstat command keeps the comment from being truncated, which improves readability. So I am refining my suggestion (check it out, @weallcock):

qstat -xfw [JOBID] | grep comment

Selected comments from running "qstat -xfw" against all Aurora jobs this morning:
comment = Not Running: Job is requesting an exclusive node and node is in use and terminated
comment = Not Running: Node is in an ineligible state: down and terminated
comment = Not Running: Node is in an ineligible state: offline and terminated
comment = Not Running: Job would conflict with reservation or top job and terminated
comment = Not Running: User has reached queue LustreApps running job limit. and terminated
comment = Not Running: Node is in an ineligible state: down
comment = Not Running: Insufficient amount of resource: at_queue and terminated
comment = Not Running: Insufficient amount of resource: tier0 and terminated
comment = Not Running: Insufficient amount of resource: ncpus and terminated
comment = Not Running: Queue not started. and terminated
comment = job held, too many failed attempts to run
comment = job held, too many failed attempts to run and terminated
comment = Job held by osborn on Tue Feb 6 05:20:00 2024 and terminated
comment = Can Never Run: Insufficient amount of resource: ncpus (R: 2209792 A: 2165904 T: 2165904)
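For anyone who wants to survey comments across many jobs, the grep step above can be sketched in Python. This is only an illustration, not part of the proposed documentation: the function names `extract_comments` and `tally_reasons` are hypothetical, the sketch assumes each comment sits on a single line (as the "w" option appears to give), and dropping the trailing " and terminated" suffix is just a grouping convenience, not a statement about what PBS means by it.

```python
import re
from collections import Counter

# Matches a "comment = ..." line as it appears in the listing above.
COMMENT_RE = re.compile(r"^\s*comment = (.*)$")


def extract_comments(qstat_output):
    """Return the text of every 'comment = ...' line in the given output."""
    comments = []
    for line in qstat_output.splitlines():
        match = COMMENT_RE.match(line)
        if match:
            comments.append(match.group(1).strip())
    return comments


def tally_reasons(comments):
    """Count comments, grouping those that differ only by ' and terminated'."""
    suffix = " and terminated"
    return Counter(
        c[: -len(suffix)] if c.endswith(suffix) else c for c in comments
    )


# Sample input: three comment lines copied from the list above, plus one
# placeholder non-comment line (hypothetical) to show that it is skipped.
sample = """\
    comment = Not Running: Node is in an ineligible state: down and terminated
    comment = Not Running: Node is in an ineligible state: down
    comment = job held, too many failed attempts to run
    some_other_attribute = placeholder
"""

if __name__ == "__main__":
    for reason, count in tally_reasons(extract_comments(sample)).most_common():
        print(count, reason)
```

On a system where qstat is available, the input string could come from something like `subprocess.run(["qstat", "-xfw", job_id], capture_output=True, text=True).stdout`, where `job_id` is a placeholder for a real job ID.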

@monkeystate
Member

FYI, I'm not suggesting you add all the comments - just providing a set for you to choose from. I bow to your documentation wording preferences :)

@cjknight cjknight changed the title from "add grep comment" to "add PBS comments" Feb 17, 2024
@cjknight
Contributor Author

@monkeystate How does this look?

Member

@monkeystate monkeystate left a comment

Looks good to me, though I note this change includes two separate references to 'comment = job held, too many failed attempts to run', each with slightly different advice text in response. (IMO it's fine as is if you don't want to tweak it further; just pointing it out in case you do.)

@felker felker merged commit 8109462 into argonne-lcf:main Feb 18, 2024
1 check passed
Development

Successfully merging this pull request may close these issues.

Aurora doc suggestion for known issues page
3 participants