tutorials: Add a flux job cancel tutorial
Al Chu11 committed Feb 17, 2023
1 parent 7412532, commit caeb736
Showing 4 changed files with 228 additions and 0 deletions.
@@ -0,0 +1,226 @@
.. _flux-job-cancel:
.. _flux-job-cancelall:
.. _flux-pkill:

========================
How to Cancel a Flux Job
========================

Inevitably, submitted jobs will have to be canceled for one reason or another. This tutorial
will show you how.

----------------------------
How to Cancel a Job by Jobid
----------------------------

The basic way to cancel a job is with ``flux job cancel``. All you have to do is specify
the jobid on the command line. Here is a simple example after submitting a job.

.. code-block:: console

    $ flux mini submit sleep 100
    ƒh35Dh5qRyq
    $ flux jobs ƒh35Dh5qRyq
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
     ƒh35Dh5qRyq achu     sleep       R      1      1   13.33s corona174
    $ flux job cancel ƒh35Dh5qRyq
    <snip wait a little bit>
    $ flux jobs ƒh35Dh5qRyq
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
     ƒh35Dh5qRyq achu     sleep      CA      1      1   20.18s corona174

In the above example we submitted a job via ``flux mini submit`` that simply
runs ``sleep``. Passing the resulting jobid to ``flux jobs`` shows that it is
running (state is ``R``).

We cancel the job by passing the jobid to ``flux job cancel``. After waiting
a little bit, ``flux jobs`` shows that the job is now canceled (state is ``CA``).

While we only passed one jobid to ``flux job cancel`` in this example, multiple jobids can be
passed on the command line to cancel many jobs.
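
As a minimal sketch of passing several jobids at once, the helper below forwards every jobid it receives to a single ``flux job cancel`` call and then hands the same jobids to ``flux jobs`` to display their state. ``cancel_and_show`` is a hypothetical name, not a Flux command, and it relies only on the two invocations shown above.

```shell
# cancel_and_show is a hypothetical helper, not part of Flux.
cancel_and_show() {
    if [ "$#" -eq 0 ]; then
        echo "usage: cancel_and_show JOBID [JOBID...]" >&2
        return 1
    fi
    # One invocation cancels every jobid given on the command line.
    flux job cancel "$@" || return 1
    # Cancellation is not instantaneous, so a job may not show CA right away.
    flux jobs "$@"
}
```

After defining the function in your shell, calling ``cancel_and_show`` with two or more jobids cancels them together and lists them afterwards.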

Note that in this particular example we happened to know the jobid of our job. If you do
not know the jobid of your job, you can always use ``flux jobs`` to see a list of all
your currently active jobs.

------------------------
Cancelling All Your Jobs
------------------------

The ``flux job cancelall`` command allows you to cancel jobs without specifying jobids.
By default it cancels all of your active jobs, but several options allow you to target a subset of the jobs.

To start off, let's create 100 jobs that will sleep indefinitely. We will use the special ``--cc`` (carbon copy)
option to ``flux mini submit`` to submit 100 duplicate copies of the ``sleep`` job.

.. code-block:: console

    $ flux mini submit --cc=1-100 sleep inf
    <snip - many job ids printed out>
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
        ƒjTWS5m3 achu     sleep       S      1      -        -
        ƒjTWS5m4 achu     sleep       S      1      -        -
        ƒjTWS5m5 achu     sleep       S      1      -        -
        ƒjTWS5m6 achu     sleep       S      1      -        -
    <snip - there are many jobs waiting to be run>
        ƒjTWS5m2 achu     sleep       R      1      1   8.858s corona212
        ƒjTWS5m1 achu     sleep       R      1      1   8.860s corona212
        ƒjTUx6Um achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Uk achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Uj achu     sleep       R      1      1   8.870s corona212
        ƒjTUx6Ui achu     sleep       R      1      1   8.871s corona212
    <snip - there are many jobs running>

As you can see, we have a lot of jobs waiting to run (state ``S``) and a lot of running jobs (state ``R``).

Let's first run ``flux job cancelall`` without any options.

.. code-block:: console

    $ flux job cancelall
    flux-job: Command matched 100 jobs (-f to confirm)

As you can see, ``flux job cancelall`` found all 100 jobs to cancel, but it hasn't canceled them yet. To go through
with the cancellation you must specify the ``-f`` (or ``--force``) option.

.. code-block:: console

    $ flux job cancelall -f
    flux-job: Canceled 100 jobs (0 errors)
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO

As you can see, all the jobs are now canceled after passing the ``-f`` option to ``flux job cancelall``. ``flux jobs``
confirms that none of our jobs are running or waiting to run.
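
The two-step pattern above (a dry run first, then ``-f``) can be wrapped in a small function. This is only a sketch: ``confirm_cancelall`` is a hypothetical name rather than a Flux command, and it uses nothing beyond ``flux job cancelall`` with and without ``-f``. Any arguments are passed through unchanged to both invocations.

```shell
# confirm_cancelall is a hypothetical wrapper, not part of Flux.
# Any arguments are passed through unchanged to `flux job cancelall`.
confirm_cancelall() {
    # Without -f, cancelall only reports how many jobs matched.
    flux job cancelall "$@"
    printf 'Cancel these jobs? [y/N] '
    read -r answer
    case "$answer" in
        # Only an explicit "y" repeats the command with -f.
        y|Y) flux job cancelall -f "$@" ;;
        *)   echo "Nothing canceled." ;;
    esac
}
```

Running ``confirm_cancelall`` with no arguments prints the match count, waits for an answer, and cancels only if you type ``y``.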

``flux job cancelall`` has several options to filter the jobs to cancel. Perhaps the most commonly used
is the ``-S`` (or ``--states``) option, which specifies the state(s) of the jobs to cancel. The most
common states to target are ``pending`` and ``running``. Let's resubmit our 100 jobs and see the result
of trying to cancel ``pending`` vs ``running`` jobs.

.. code-block:: console

    $ flux mini submit --cc=1-100 sleep inf
    <snip - many job ids printed out>
    $ flux job cancelall --states=pending
    flux-job: Command matched 52 jobs (-f to confirm)
    $ flux job cancelall --states=running
    flux-job: Command matched 48 jobs (-f to confirm)

As you can see, ``flux job cancelall --states=pending`` would target the 52 pending jobs for cancellation and
``flux job cancelall --states=running`` would target the 48 currently running jobs for cancellation.
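
To actually cancel one of these subsets, the ``--states`` filter can be combined with the ``-f`` option described earlier. The sketch below assumes only those two options; ``cancel_in_state`` is a hypothetical helper, and it deliberately accepts just the two state names discussed here.

```shell
# cancel_in_state is a hypothetical helper, not part of Flux.
cancel_in_state() {
    case "$1" in
        pending|running)
            # --states narrows the match; -f confirms the cancellation.
            flux job cancelall --states="$1" -f
            ;;
        *)
            echo "usage: cancel_in_state pending|running" >&2
            return 1
            ;;
    esac
}
```

For example, ``cancel_in_state pending`` would cancel the jobs still waiting to run and leave the running ones alone.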

--------------------------
Cancelling with Flux Pkill
--------------------------

One final way to cancel a job is via ``flux pkill``. ``flux pkill`` has a number of search and filtering options,
which are described in the :core:man1:`flux-pkill` manpage.

However, there are two common ways ``flux pkill`` is used. The first is to cancel a range of jobids. The jobid range is specified
in the format ``jobid1..jobidN``.

It is best shown with an example.

.. code-block:: console

    $ flux mini submit --cc=1-5 sleep inf
    ƒ3vEobuhH
    ƒ3vEobuhJ
    ƒ3vEobuhK
    ƒ3vEq5tyd
    ƒ3vEq5tye
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ3vEq5tye achu     sleep       R      1      1   14.23s corona212
       ƒ3vEq5tyd achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhK achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhJ achu     sleep       R      1      1   14.23s corona212
       ƒ3vEobuhH achu     sleep       R      1      1   14.23s corona212

Similar to before, we've submitted some sleep jobs. We see all five of the sleep jobs are
running (state ``R``) in the ``flux jobs`` output.

We can tell ``flux pkill`` to cancel the set of five jobs by specifying the first and last jobid of this range.

.. code-block:: console

    $ flux pkill ƒ3vEobuhH..ƒ3vEq5tye
    flux-pkill: INFO: Canceled 5 jobs
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO

As you can see, ``flux pkill`` canceled the five jobs in the range.

The other common way ``flux pkill`` is used is to cancel jobs with matching job names. For example, you may
submit several different types of jobs and give them different names to describe their function. ``flux pkill``
can be used to match on the job names and cancel only the ones that match.

Let's submit several jobs and give them specific names using the ``--job-name`` option.

.. code-block:: console

    $ flux mini submit --job-name=foo sleep inf
    ƒ6KjHNcxP
    $ flux mini submit --job-name=foobar sleep inf
    ƒ6Limcmju
    $ flux mini submit --job-name=boo sleep inf
    ƒ6NCaXCmV
    $ flux mini submit --job-name=baz sleep inf
    ƒ6PjZG6jq
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   38.06s corona212
       ƒ6NCaXCmV achu     boo         R      1      1   41.54s corona212
       ƒ6Limcmju achu     foobar      R      1      1    44.9s corona212
       ƒ6KjHNcxP achu     foo         R      1      1   47.15s corona212

We've submitted four jobs, giving them the job names "foo", "foobar", "boo", and "baz".

Let's cancel the job "boo" via ``flux pkill``.

.. code-block:: console

    $ flux pkill boo
    flux-pkill: INFO: Canceled 1 job
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   2.856m corona212
       ƒ6Limcmju achu     foobar      R      1      1    2.97m corona212
       ƒ6KjHNcxP achu     foo         R      1      1   3.008m corona212

As you can see, ``flux pkill`` canceled just one job, the one assigned the name "boo".

``flux pkill`` actually searches for all jobs matching the supplied name. So what happens if we ask ``flux pkill``
to cancel jobs matching the name "foo"?

.. code-block:: console

    $ flux pkill foo
    flux-pkill: INFO: Canceled 2 jobs
    $ flux jobs
           JOBID USER     NAME       ST NTASKS NNODES     TIME INFO
       ƒ6PjZG6jq achu     baz         R      1      1   4.626m corona212

As you can see, it canceled not one job but two: the job "foo" and the job "foobar", since both names match "foo".
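
The double match follows from "foo" being found anywhere in the job name. Assuming the pattern is treated as a regular expression (an assumption to verify against the :core:man1:`flux-pkill` manpage), anchoring it as ``^foo$`` should restrict the match to the job named exactly "foo". The sketch below does not call Flux at all; it only imitates both behaviors locally with ``grep -E`` on the four job names from this example, and ``match_names`` is a hypothetical helper.

```shell
# match_names is a hypothetical helper that imitates name matching locally
# with grep; it does not talk to Flux.
match_names() {
    pattern="$1"
    # The four job names submitted in the example above.
    printf '%s\n' foo foobar boo baz | grep -E -- "$pattern"
}

match_names 'foo'     # unanchored: prints foo and foobar
match_names '^foo$'   # anchored: prints only foo
```

Checking a pattern this way before handing it to ``flux pkill`` is a cheap safeguard against canceling more jobs than intended.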

And that's it! If you have any questions, please
`let us know <https://github.com/flux-framework/flux-docs/issues>`_.