Reorganize machine-independent vs. Polaris-specific example PBS submission scripts #224

Open
8 tasks
felker opened this issue May 2, 2023 · 0 comments

Comments

@felker
Member

felker commented May 2, 2023

Copied from Polaris channel on CELS Slack. Related #218

The Polaris “Running Jobs” page is a bit messed up right now: https://docs.alcf.anl.gov/polaris/running-jobs/
@weallcock moved most of the Polaris example PBS scripts out of the Polaris sidebar/subdirectory in #122, which he started in November and I merged in February. The idea, as I understand it, was that many of the PBS examples carry machine-independent lessons that will be useful when we make Sunspot/Aurora public.

But it is in a weird state right now, since the generic https://docs.alcf.anl.gov/running-jobs/example-job-scripts/ basically reads like a Polaris page, with many CPU core-count and GPU model-specific lessons.

@zippylab raised concerns to that effect in #122 (comment) and made a small patch last month in #199 to supplement the Polaris page, but I don't recall whether we ever actually discussed or addressed the original concerns.

So more possible changes to https://docs.alcf.anl.gov/polaris/running-jobs/ for now:

  • add more links to the “generic” running jobs/example job scripts page
  • make it clear that the first example doesn't assume anything about GPU usage; i.e. it could be a pure CPU application
  • Running MPI+OpenMP Applications
  • Use pymdownx.snippets to avoid duplicating source documentation for machine-independent PBS info. See #177 ("Add comparison pymdownx.snippets to codeinclude")

Not sure how to improve https://docs.alcf.anl.gov/running-jobs/example-job-scripts/

  • @zippylab suggests the Example Job Scripts section should just be moved under Polaris Running Jobs, perhaps renamed "Polaris Example Job Scripts". Even CPU thread mappings and CPU-GPU connectivity details are machine-specific.
  • @cjknight: CPU and ensemble stuff might be OK in the generic PBS page, and the GPU bits could be put in a machine-specific page (like Tim is suggesting?)
  • Start adding non-restricted info about the Sunspot job scheduler to the page; it would be good to have another example just to see how we will structure and write a machine-independent, generic guide
  • How many machines do we expect to have that will be running PBS and not Cobalt in the near future?