waiting-for-jobs: add new guide #220
base: master
.. code-block:: console

    $ flux submit --wait -n1 bash -c "sleep 30; /bin/false"
I did not know this! TIL
    $ echo $?
    1

The above command submits a job that simply sleeps for 30 seconds on one processor (``-n1``) and then runs ``/bin/false``. The :ref:`jobid <fluid>` is immediately output, but the command won't return until the 30 second job has completed.
- The above command submits a job that simply sleeps for 30 seconds on one processor (``-n1``) and then runs ``/bin/false``. The :ref:`jobid <fluid>` is immediately output, but the command won't return until the 30 second job has completed.
+ The above command submits a job that simply sleeps for 30 seconds on one processor (``-n1``) and then runs ``/bin/false``. The :ref:`jobid <fluxid>` is immediately output, but the command won't return until the 30 second job has completed.
did you typo something here? When I grep I don't see a reference to "fluxid".
I assumed "fluid" should be fluxid, but if "fluid" is correct my mistake!
jobs/waiting-for-jobs.rst
Outdated
The above command submits a job that simply sleeps for 30 seconds on one processor (``-n1``) and then runs ``/bin/false``. The :ref:`jobid <fluid>` is immediately output, but the command won't return until the 30 second job has completed.

After the command has finished we print the exit code from ``flux submit``. You'll notice the exit code is ``1``, which is the final exit code of the job, which in this case was ``1`` because we ran ``/bin/false``.
- After the command has finished we print the exit code from ``flux submit``. You'll notice the exit code is ``1``, which is the final exit code of the job, which in this case was ``1`` because we ran ``/bin/false``.
+ After the command has finished we print the exit code from ``flux submit``, which is ``1``, because we ran ``/bin/false``.
jobs/waiting-for-jobs.rst
Outdated
Flux Job Status
---------------

In most cases, you do not want to sit and wait for the current job submission to complete. You would like to do other things, such as submit more jobs, and then wait for those specific jobs to complete.
Indeed I don't! I have avocados to eat! Mountains to climb!
jobs/waiting-for-jobs.rst
Outdated
In most cases, you do not want to sit and wait for the current job submission to complete. You would like to do other things, such as submit more jobs, and then wait for those specific jobs to complete.

The ``flux job status`` command is the most basic way to wait for a specific job, based on jobid, to complete. Pass it one or more jobids to wait on, and ``flux job status`` will return once all of the jobs have completed. It will exit with largest exit code from any of the jobids specified. If the job(s) have already completed, ``flux job status`` returns immediately. It can be run as many times as the user would like against the same jobid.
Is the context here that I've submitted a bunch, and then (after that) I want to wait for a specific job?
Yes, think I should mention something to that effect?
Yes exactly - you read between the lines.
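For readers following the thread, the "exit with the largest exit code" behavior that the guide describes for ``flux job status`` can be pictured with ordinary shell background processes. The sketch below is only an analogy under that assumption: it uses no Flux commands, it is not how Flux implements anything, and the names `pid_ok`, `pid_fail`, and `max_rc` are made up for illustration.

```shell
#!/bin/sh
# Plain-shell analogy only: background subshells stand in for jobs.
# No Flux commands are used; pid_ok, pid_fail and max_rc are illustrative names.

# Start two stand-in "jobs": one succeeds after a second, one fails with exit code 1.
( sleep 1; exit 0 ) &
pid_ok=$!
( exit 1 ) &
pid_fail=$!

# Wait for every stand-in job and remember the largest exit code seen.
max_rc=0
for pid in "$pid_ok" "$pid_fail"; do
    rc=0
    wait "$pid" || rc=$?
    if [ "$rc" -gt "$max_rc" ]; then
        max_rc=$rc
    fi
done

echo "largest exit code: $max_rc"
```

The loop does not return until every stand-in job has finished, which mirrors the "returns once all of the jobs have completed" wording in the paragraph under review.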
    $ flux job wait
    flux-job: there are no more waitable jobs

In this above example, we submit three jobs, sleeping for 60, 45, and 30 seconds respectively before running ``/bin/true``. We then run ``flux job wait`` without any inputs. You'll notice the jobids for the ``sleep 30`` job, then ``sleep 45`` job, then ``sleep 60`` job are returned in that order. Finally, without any jobs left running with the ``waitable`` flag, ``flux job wait`` indicates there are no more waitable jobs.
So it doesn't wait for all of them to complete (like the multiple one on the same line?) What is the use case for this if I have to run it a gazillion times?
I believe the typical use case is a user wants to know when a job has finished and can do some type of post-processing on its results while the other jobs keep on running. They don't care which one finishes first/next, they just need to know that one has finished (and which one).
(Hopefully this use case might explain other questions you had above/below).
probably good to stick one sentence in there to note this common use case.
So you couldn't use `flux job status` for that?
`flux job status` requires you to input all of the jobids and doesn't exit until all of the jobs finish, thus more inconvenient.
    ƒ4YPufmCjq
    $ flux submit --flags waitable -n1 bash -c "sleep 30; /bin/false"
    ƒ4YSVQWfZq
    $ flux job wait --all --verbose
ohh this one makes sense! But what is the use case for without `--all`?
jobs/waiting-for-jobs.rst
Outdated
This example is similar to the above, except one of the jobs runs ``/bin/false`` instead of ``/bin/true``. When ``flux job wait --all`` is executed, you'll notice a message output indicating that one job has failed (the one that ran ``/bin/false``). And similar to ``flux job status``, the exit code of ``1`` is returned due to the highest exit code of all the jobs.

The biggest disadvantage of ``flux job wait`` compared to ``flux job status`` is that jobs can only waited on once.
- The biggest disadvantage of ``flux job wait`` compared to ``flux job status`` is that jobs can only waited on once.
+ The biggest disadvantage of ``flux job wait`` compared to ``flux job status`` is that jobs can only be waited on once.
Only being able to wait on a job once is not necessarily a disadvantage: without it you would not be able to `flux job wait` in a loop (you'd just keep getting the same jobid continually). So there is a purpose here and each interface satisfies different use cases. Instead of calling this a disadvantage, maybe the guide should discuss the use cases for which each interface is designed?
jobs/waiting-for-jobs.rst
Outdated
    $ flux submit --flags waitable -n1 bash -c "sleep 30; /bin/true"
    ƒBbk3qrdro
    $ flux job wait ƒBbk3qrdro
Why would I put the jobid at all? Wouldn't I just run `flux job wait` without any args like shown in the example above?
ahh you're correct for this specific case, they wouldn't need to. Would it be clearer to not put in the jobid in this case? (Edit: i see your comment below, probably should remove it)
what if there was a previously submitted waitable job that was not yet reaped? `flux job wait` doesn't necessarily only wait for the last submitted job...
Pros:

- ``flux job wait`` more efficient when waiting for a set of jobs
- Jobids do not need to be specified to ``flux job wait``
So maybe just take that part of the tutorial out - don't show giving a job id to flux job wait if that shouldn't be learned.
I see your point. I'll mention it, but definitely stress it less.
pushed a fixup, tweaking a few things, adding a sentence here and there given comments above
Perhaps something should be said in here to the effect of: If you need to wait for thousands of jobs efficiently, or need to wait for single jobs as they complete, then
re-pushed. taking into account several of the comments above, re-worked the flow of the
re-pushed, updating example script given completion of flux-framework/flux-core#5033
Add a new guide on how to wait for jobs to complete.