scheduler: better and more tests #57
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main      #57      +/-   ##
==========================================
- Coverage   79.97%   79.32%   -0.65%
==========================================
  Files           4        4
  Lines         674      677       +3
==========================================
- Hits          539      537       -2
- Misses        135      140       +5
```
Thanks for the PR. Looks good, just some minor comments
tests/test_calculation.py (Outdated)

```diff
@@ -35,6 +36,10 @@ def test_calculation_basic(firecrest_computer: orm.Computer):
     builder = code.get_builder()
     builder.x = orm.Int(1)
     builder.y = orm.Int(2)
+    builder.metadata.options.account = firecrest_config.builder_metadata_options_account
+    builder.metadata.options.custom_scheduler_commands = (
+        "#SBATCH --constraint=mc\n#SBATCH --mem=10K"
+    )
```
What is this `mc` constraint?
For CSCS it means the calculation runs on multicore nodes, as opposed to GPU nodes: https://user.cscs.ch/access/running/
We can get this from `firecrest_config` as well.
Thanks for the clarification. Maybe you can link this here and add a comment that we need this to test against the real CSCS server. If the Docker image works again, these might change, right?
This line of the code doesn't need to change in that case, only in `.firecrest-demo-config.json`, where one can leave the field empty. So I think it's alright.
tests/conftest.py (Outdated)

```diff
@@ -383,6 +400,7 @@ class ComputerFirecrestConfig:
     temp_directory: str
     workdir: str
     api_version: str
+    builder_metadata_options_account: str
```
Maybe you can add a small docstring for this class; this variable is not so clear to me. I think most variables are explained in the README and could be copy-pasted here.
Alright, docstring added.
`builder_metadata_options_custom_scheduler_commands` holds "custom" settings that users have to define when submitting jobs. These are cluster-specific variables, and historically AiiDA users were expected to set them in their workflows, etc.
An example value of this variable looks like this:
`["#SBATCH --account=mr32", "#SBATCH --constraint=mc", "#SBATCH --mem=10K"]`
We didn't need this for tests against a mocking server, but for tests against a real server these values are needed.
Note, we don't set this for the entire computer.
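To illustrate the comment above, here is a minimal sketch of how such a list of cluster-specific directives (the hypothetical `custom_commands` name below is illustrative, not from the PR) could be turned into the single newline-separated string that the `custom_scheduler_commands` calculation option expects:

```python
# Cluster-specific SBATCH directives, as they might appear in a demo
# config file (illustrative values taken from the review discussion).
custom_commands = [
    "#SBATCH --account=mr32",
    "#SBATCH --constraint=mc",
    "#SBATCH --mem=10K",
]

# The option is a single string of newline-separated directives, so the
# list is joined before being assigned to the builder.
custom_scheduler_commands = "\n".join(custom_commands)
print(custom_scheduler_commands)
```

This keeps the per-job directives editable in one config list while the builder still receives the plain string it expects.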
This is not so related to this PR, but it would be nice for my clarification: usually AiiDA uses the prepend-text option to add these SBATCH commands to the script, but for FirecREST the SBATCH commands have to be made more explicit, hence `custom_scheduler_commands`?
Honestly, I haven't explored that prepend-text variable when creating a new computer in AiiDA. But you are right, some of these could be put there; the only thing is that it would then always be the same for all jobs on that computer. One might want to run some jobs on `gpu` nodes and some others on `mc`, or use different accounts for billing, etc.
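A small sketch of the trade-off being discussed: a computer-level prepend text applies to every job on that computer, while per-job custom commands can vary between submissions. The `build_submit_header` function below is purely illustrative, not the actual plugin API:

```python
def build_submit_header(computer_prepend: str, job_custom_commands: str) -> str:
    """Assemble the directive block of a submit script from a fixed
    computer-level part and a per-job part (either may be empty)."""
    parts = [block for block in (computer_prepend, job_custom_commands) if block]
    return "\n".join(parts)

# Same computer-level billing account, different per-job constraints:
mc_job = build_submit_header("#SBATCH --account=mr32", "#SBATCH --constraint=mc")
gpu_job = build_submit_header("#SBATCH --account=mr32", "#SBATCH --constraint=gpu")
```

With only a computer-level prepend text, the two jobs above could not differ; keeping the per-job part separate is what makes mixed `mc`/`gpu` workloads on one computer possible.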
I think in this case you would define multiple computers, but I am not sure about best practices
Yes, I'm a fan of keeping these things editable; e.g. if one has to change their billing account, they shouldn't need to define a new computer.
But for `mc` & `gpu`, I probably agree; I don't really know.
I left a few comments; can you resolve them and then merge? I'm giving you approval already since it is not so important.
Co-authored-by: Alexander Goscinski <[email protected]>
Cheers @agoscinski, I applied/answered your suggestions.

Changes include:

- `tests/conftest.py`:
  - `MockFirecrest`: `Slurm.all_jobs`, `cancel()` for killing jobs
  - `poll_active`: when `job_id` is not found, raise `FirecrestException` just like FirecREST does (this includes when the job is finished)
  - `conftest.Values`
- `tests/test_calculation.py::test_calculation_basic`
- `tests/test_scheduler.py`: `transport.kill_jobs`
- `test_scheduler.py` tests pass against server