tutorials: Add a flux job cancel tutorial
Al Chu11 committed Feb 17, 2023
1 parent 7412532 commit caeb736
Showing 4 changed files with 228 additions and 0 deletions.
Binary file modified auto_examples/auto_examples_jupyter.zip
Binary file not shown.
Binary file modified auto_examples/auto_examples_python.zip
Binary file not shown.
226 changes: 226 additions & 0 deletions tutorials/commands/flux-job-cancel.rst
@@ -0,0 +1,226 @@
.. _flux-job-cancel:
.. _flux-job-cancelall:
.. _flux-pkill:

========================
How to Cancel a Flux Job
========================

Inevitably, submitted jobs will have to be canceled for one reason or another. This tutorial
will show you how.

----------------------------
How to Cancel a Job by Jobid
----------------------------

The basic way to cancel a job is through ``flux job cancel``. All you have to do is specify
the jobid on the command line. Here is a simple example after submitting a job.

.. code-block:: console

    $ flux mini submit sleep 100
    ƒh35Dh5qRyq
    $ flux jobs ƒh35Dh5qRyq
    JOBID        USER  NAME   ST  NTASKS  NNODES  TIME    INFO
    ƒh35Dh5qRyq  achu  sleep  R   1       1       13.33s  corona174
    $ flux job cancel ƒh35Dh5qRyq
    <snip wait a little bit>
    $ flux jobs ƒh35Dh5qRyq
    JOBID        USER  NAME   ST  NTASKS  NNODES  TIME    INFO
    ƒh35Dh5qRyq  achu  sleep  CA  1       1       20.18s  corona174

In the above example we submitted a simple job via ``flux mini submit`` that simply
runs ``sleep``. Passing the resulting jobid to ``flux jobs`` shows that it is
running (state is ``R``).

We cancel the job simply by passing the jobid to ``flux job cancel``. After waiting
a little bit, we see that the job is now canceled in ``flux jobs`` (state is ``CA``).

While we only passed one jobid to ``flux job cancel`` in this example, multiple jobids can be
passed on the command line to cancel many jobs.

Note that in this particular example we happened to know the jobid of our job. If you do
not know the jobid of your job, you can always use ``flux jobs`` to see a list of all
your currently active jobs.
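
The "wait a little bit" step in the example above can also be scripted. The following is a
minimal sketch, not part of the original tutorial: ``cancel_and_wait`` is a hypothetical helper
name, and it assumes that ``flux job wait-event`` is available in your version of flux-core and
that a canceled job eventually posts a ``clean`` event to its eventlog.

.. code-block:: shell

    # Hypothetical helper (not part of Flux): cancel the given jobids, then
    # block until each job has posted its "clean" event.
    # Assumes `flux job wait-event JOBID clean` is supported by your flux-core.
    cancel_and_wait() {
        if [ "$#" -eq 0 ]; then
            echo "usage: cancel_and_wait JOBID [JOBID...]" >&2
            return 2
        fi
        # flux job cancel accepts several jobids on one command line.
        flux job cancel "$@" || return 1
        for jobid in "$@"; do
            flux job wait-event "$jobid" clean || return 1
        done
    }

With this function defined in your shell, ``cancel_and_wait ƒh35Dh5qRyq`` would return only
after the job has finished cleaning up, so a following ``flux jobs ƒh35Dh5qRyq`` should already
show the final state.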

------------------------
Cancelling All Your Jobs
------------------------

The ``flux job cancelall`` command allows you to cancel jobs without specifying jobids.
By default it cancels all of your active jobs, but several options allow you to target a subset of the jobs.

To start off, let's create 100 jobs that will sleep forever. We will use the special ``--cc`` (carbon copy)
option to ``flux mini submit``, which will submit 100 duplicate copies of the ``sleep`` job.

.. code-block:: console

    $ flux mini submit --cc=1-100 sleep inf
    <snip - many job ids printed out>
    $ flux jobs
    JOBID     USER  NAME   ST  NTASKS  NNODES  TIME    INFO
    ƒjTWS5m3  achu  sleep  S   1       -       -
    ƒjTWS5m4  achu  sleep  S   1       -       -
    ƒjTWS5m5  achu  sleep  S   1       -       -
    ƒjTWS5m6  achu  sleep  S   1       -       -
    <snip - there are many jobs waiting to be run>
    ƒjTWS5m2  achu  sleep  R   1       1       8.858s  corona212
    ƒjTWS5m1  achu  sleep  R   1       1       8.860s  corona212
    ƒjTUx6Um  achu  sleep  R   1       1       8.870s  corona212
    ƒjTUx6Uk  achu  sleep  R   1       1       8.870s  corona212
    ƒjTUx6Uj  achu  sleep  R   1       1       8.870s  corona212
    ƒjTUx6Ui  achu  sleep  R   1       1       8.871s  corona212
    <snip - there are many jobs running>

As you can see, we have a lot of jobs waiting to run (state ``S``) and a lot of running jobs (state ``R``).

Let's first run ``flux job cancelall`` without any options.

.. code-block:: console

    $ flux job cancelall
    flux-job: Command matched 100 jobs (-f to confirm)

As you can see, ``flux job cancelall`` found all 100 jobs to cancel, but it hasn't canceled them yet. In order to go through
with the cancellation you must specify the ``-f`` (or ``--force``) option.

.. code-block:: console

    $ flux job cancelall -f
    flux-job: Canceled 100 jobs (0 errors)
    $ flux jobs
    JOBID     USER  NAME   ST  NTASKS  NNODES  TIME    INFO

As you can see, all the jobs are now canceled after passing the ``-f`` option to ``flux job cancelall``. ``flux jobs``
confirms there are no longer any of our jobs running or waiting to run.

``flux job cancelall`` has several options to filter the jobs to cancel. Perhaps the most commonly used
option is ``-S`` (or ``--states``), which specifies the state(s) of the jobs to cancel. The most
common states to target are ``pending`` and ``running``. Let's resubmit our 100 jobs and see the result
of trying to cancel ``pending`` vs. ``running`` jobs.

.. code-block:: console

    $ flux mini submit --cc=1-100 sleep inf
    <snip - many job ids printed out>
    $ flux job cancelall --states=pending
    flux-job: Command matched 52 jobs (-f to confirm)
    $ flux job cancelall --states=running
    flux-job: Command matched 48 jobs (-f to confirm)

As you can see, ``flux job cancelall --states=pending`` would target the 52 pending jobs for cancellation and
``flux job cancelall --states=running`` would target the current 48 running jobs for cancellation.
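
The preview-then-confirm pattern shown above can be wrapped in a small shell function. This is
only a sketch: ``cancel_by_state`` is a hypothetical name that is not part of Flux, and it uses
just the ``--states`` and ``-f`` options described in this tutorial.

.. code-block:: shell

    # Hypothetical helper (not part of Flux): preview or cancel jobs by state.
    #   cancel_by_state STATE        -> only reports how many jobs match
    #   cancel_by_state STATE yes    -> actually cancels them (adds -f)
    cancel_by_state() {
        cbs_state=$1
        cbs_confirm=${2:-}
        case "$cbs_state" in
            pending|running) ;;
            *)
                echo "usage: cancel_by_state pending|running [yes]" >&2
                return 2
                ;;
        esac
        if [ "$cbs_confirm" = "yes" ]; then
            flux job cancelall --states="$cbs_state" -f
        else
            flux job cancelall --states="$cbs_state"
        fi
    }

Running ``cancel_by_state pending`` first and ``cancel_by_state pending yes`` afterwards keeps
the confirmation step explicit, which mirrors how ``flux job cancelall`` behaves on its own.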

--------------------------
Cancelling with Flux Pkill
--------------------------

One final way to cancel a job is via ``flux pkill``. There are a number of search and filtering options available in
``flux pkill`` which can be seen in the :core:man1:`flux-pkill` manpage.

However, there are two common ways ``flux pkill`` is used. The first is to cancel a range of jobids. The jobid range can be specified
via the format ``jobid1..jobidN``.

It is best shown with an example.

.. code-block:: console

    $ flux mini submit --cc=1-5 sleep inf
    ƒ3vEobuhH
    ƒ3vEobuhJ
    ƒ3vEobuhK
    ƒ3vEq5tyd
    ƒ3vEq5tye
    $ flux jobs
    JOBID      USER  NAME   ST  NTASKS  NNODES  TIME    INFO
    ƒ3vEq5tye  achu  sleep  R   1       1       14.23s  corona212
    ƒ3vEq5tyd  achu  sleep  R   1       1       14.23s  corona212
    ƒ3vEobuhK  achu  sleep  R   1       1       14.23s  corona212
    ƒ3vEobuhJ  achu  sleep  R   1       1       14.23s  corona212
    ƒ3vEobuhH  achu  sleep  R   1       1       14.23s  corona212

Similar to before, we've submitted some sleep jobs. We see all five of the sleep jobs are
running (state ``R``) in the ``flux jobs`` output.

We can tell ``flux pkill`` to cancel the set of 5 jobs by specifying the first and last jobid of this range.

.. code-block:: console

    $ flux pkill ƒ3vEobuhH..ƒ3vEq5tye
    flux-pkill: INFO: Canceled 5 jobs
    $ flux jobs
    JOBID      USER  NAME   ST  NTASKS  NNODES  TIME    INFO

As you can see, ``flux pkill`` canceled the five jobs in the range.

The other common way ``flux pkill`` is used is to cancel jobs with matching job names. For example, you may
submit several different types of jobs and give them different types of names to describe their function. ``flux pkill``
can be used to match on the job names and cancel only the ones that match.

Let's submit several jobs and give them specific names using the ``--job-name`` option.

.. code-block:: console

    $ flux mini submit --job-name=foo sleep inf
    ƒ6KjHNcxP
    $ flux mini submit --job-name=foobar sleep inf
    ƒ6Limcmju
    $ flux mini submit --job-name=boo sleep inf
    ƒ6NCaXCmV
    $ flux mini submit --job-name=baz sleep inf
    ƒ6PjZG6jq
    $ flux jobs
    JOBID      USER  NAME    ST  NTASKS  NNODES  TIME    INFO
    ƒ6PjZG6jq  achu  baz     R   1       1       38.06s  corona212
    ƒ6NCaXCmV  achu  boo     R   1       1       41.54s  corona212
    ƒ6Limcmju  achu  foobar  R   1       1       44.9s   corona212
    ƒ6KjHNcxP  achu  foo     R   1       1       47.15s  corona212

We've submitted four jobs, giving them the job names "foo", "foobar", "boo", and "baz".

Let's cancel the job "boo" via ``flux pkill``.

.. code-block:: console

    $ flux pkill boo
    flux-pkill: INFO: Canceled 1 job
    $ flux jobs
    JOBID      USER  NAME    ST  NTASKS  NNODES  TIME    INFO
    ƒ6PjZG6jq  achu  baz     R   1       1       2.856m  corona212
    ƒ6Limcmju  achu  foobar  R   1       1       2.97m   corona212
    ƒ6KjHNcxP  achu  foo     R   1       1       3.008m  corona212

As you can see, ``flux pkill`` canceled just one job, the one assigned the name "boo".

``flux pkill`` cancels every job whose name matches the supplied name, so what happens if we ask ``flux pkill``
to cancel jobs matching the name "foo"?

.. code-block:: console

    $ flux pkill foo
    flux-pkill: INFO: Canceled 2 jobs
    $ flux jobs
    JOBID      USER  NAME    ST  NTASKS  NNODES  TIME    INFO
    ƒ6PjZG6jq  achu  baz     R   1       1       4.626m  corona212

As you can see, it canceled not one job but two: the job "foo" and the job "foobar".
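
If you only want to act on the job named exactly "foo", one option may be to anchor the pattern.
The sketch below rests on assumptions that this tutorial does not confirm: that ``flux pkill``
interprets the name as a regular expression, and that the companion ``flux pgrep`` command lists
the same matches without canceling them. Please check both in the :core:man1:`flux-pkill` manpage
for your installed version. ``pkill_exact`` is a hypothetical helper name, not part of Flux.

.. code-block:: shell

    # Hypothetical helper (not part of Flux): act only on jobs whose name is
    # exactly NAME. Assumes flux pkill and flux pgrep treat the pattern as a
    # regular expression. NAME is used verbatim, so any regex metacharacters
    # it contains are not escaped.
    pkill_exact() {
        if [ "$#" -ne 2 ]; then
            echo "usage: pkill_exact preview|cancel NAME" >&2
            return 2
        fi
        case "$1" in
            preview) flux pgrep "^$2\$" ;;   # list matching jobs only
            cancel)  flux pkill "^$2\$" ;;   # cancel matching jobs
            *)
                echo "unknown mode: $1" >&2
                return 2
                ;;
        esac
    }

Under those assumptions, ``pkill_exact preview foo`` would show which jobs match before
``pkill_exact cancel foo`` cancels them, leaving a job named "foobar" untouched.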

And that's it! If you have any questions, please
`let us know <https://github.com/flux-framework/flux-docs/issues>`_.
2 changes: 2 additions & 0 deletions tutorials/commands/index.rst
@@ -7,6 +7,7 @@ Welcome to the Command Tutorials! These tutorials should help you to map specifi
with your use case, and then see detailed usage.

- ``flux mini submit/flux mini run`` (:ref:`flux-mini-submit`): "Submit a job in a Flux instance"
- ``flux job cancel/flux job cancelall/flux pkill`` (:ref:`flux-job-cancel`): "Cancel a job you submitted"
- ``flux proxy`` (:ref:`ssh-across-clusters`): "Send commands to a Flux instance across clusters using ssh"

This section is currently 🚧️ under construction 🚧️, so please come back later to see more command tutorials!
@@ -17,4 +18,5 @@ This section is currently 🚧️ under construction 🚧️, so please come bac
:caption: Command Tutorials

flux-mini-submit
flux-job-cancel
ssh-across-clusters
