Enable more flexible lustre scheduling #161

Open
jameshcorbett opened this issue May 10, 2024 · 0 comments

If #157 goes in and changes the layout of the resource graph, it will enable jobspecs that look like this:

```yaml
version: 9999
resources:
  - type: ssd
    count: 50000
    exclusive: true
  - type: node
    count: 1
    exclusive: false
    with:
    - type: slot
      label: task
      count: 1
      with:
      - type: core
        count: 1
# a comment
attributes:
  system:
    duration: 3600
tasks:
  - command: [ "app" ]
    slot: task
    count:
      per_slot: 1
```

This would allow Fluxion to pick rabbit SSDs and nodes completely independently, which would be a perfect match for lustre file systems. However, if a job also asked for xfs or gfs2 in addition to lustre, I think the only option would be to fall back to forcing all storage to be rack-local.
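A rough sketch of that decision, to make the fallback rule concrete. It assumes directives arrive as raw `#DW` strings carrying a `type=<fs>` field; the function name `storage_placement` and its return values are hypothetical and not taken from directivebreakdown.py, which may see the directives in a different form.

```python
import re


def storage_placement(directives):
    """Return 'independent' only when every directive asks for lustre.

    Hypothetical helper: anything else (xfs, gfs2, mixed, or no
    recognizable type at all) falls back to 'rack-local'.
    """
    types = set()
    for line in directives:
        # Assumed directive shape: "#DW jobdw type=lustre capacity=1TiB name=a"
        match = re.search(r"\btype=(\w+)", line)
        if match:
            types.add(match.group(1).lower())
    if types and types <= {"lustre"}:
        return "independent"
    return "rack-local"
```

Returning `rack-local` when no type is found is a deliberately conservative choice in this sketch, since it matches the current behavior.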

To enable this, directivebreakdown.py would need to be updated to recognize lustre-only directives, and coral2_dws would need to inspect the JGF output from Fluxion after scheduling to see which rabbits were selected, rather than simply assuming (as it does now) that rabbits were chosen according to the nodes that were chosen.
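The post-scheduling check could look something like the sketch below. It assumes the JGF has the usual `graph.nodes` / `graph.edges` shape with `type`, `name`, and `size` in each vertex's `metadata`, and that each `ssd` vertex hangs directly off the vertex representing its rabbit. That parent relationship depends on the graph layout #157 ends up with, so treat it as an assumption. The function name `ssd_per_parent` is hypothetical and does not exist in coral2_dws.

```python
from collections import defaultdict


def ssd_per_parent(jgf):
    """Sum the sizes of allocated 'ssd' vertices, grouped by parent vertex name."""
    graph = jgf["graph"]
    vertices = {node["id"]: node["metadata"] for node in graph["nodes"]}
    totals = defaultdict(int)
    for edge in graph["edges"]:
        parent = vertices.get(edge["source"])
        child = vertices.get(edge["target"])
        if parent is None or child is None:
            continue
        # Only edges that point *at* an ssd vertex are counted, so a
        # reverse (child -> parent) edge for the same pair is skipped.
        if child.get("type") == "ssd":
            totals[parent.get("name", edge["source"])] += child.get("size", 1)
    return dict(totals)
```

The result maps each parent name to the amount of storage allocated under it, which coral2_dws could use in place of deriving rabbits from the chosen nodes.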
