flux-job(1): update WAIT section
Problem: the man page entry for flux job wait does not adequately
describe the design of waitable jobs.

Rework this section to emphasize the underlying design.

Fixes flux-framework#5038
garlick committed Mar 30, 2023
1 parent 5b8ef5b commit ec3e98a
Showing 1 changed file with 29 additions and 16 deletions.
45 changes: 29 additions & 16 deletions doc/man1/flux-job.rst
@@ -103,24 +103,37 @@ Wait for job(s) to complete and exit with the largest exit code.
 WAIT
 ====

-A waitable job may be waited on with ``flux job wait``. A specific job
-can be waited on by specifying a jobid. If no jobid is specified, the
-command will wait for any waitable user job to complete, outputting that
-jobid before exiting. This command will exit with error if the job is not
-successful.
-
-Compared to ``flux job status``, there are several advantages /
-disadvantages of using ``flux job wait``. For a large number of jobs,
-``flux job wait`` is far more efficient, especially when used with the
-``--all`` option below. In addition, job ids do not have to be specified
-to ``flux job wait``.
-
-The two major limitations are that jobs must be submitted with the
-waitable flag, which can only be done in user instances. In addition,
-``flux job wait`` can only be called once per job.
+``flux job wait`` behaves like the UNIX :linux:man2:`wait` system call,
+for jobs submitted with the ``waitable`` flag. Compared to other methods
+of synchronizing on job completion and obtaining results, it is very
+lightweight.
+
+The result of a waitable job may only be consumed once. This is a design
+feature that makes it possible to call ``flux job wait`` in a loop until all
+results are consumed.
+
+.. note::
+   Only the instance owner is permitted to submit jobs with the ``waitable``
+   flag.
+
+When run with a jobid argument, ``flux job wait`` blocks until the specified
+job completes. If the job was successful, it silently exits with a code of
+zero. If the job has failed, an error is printed on stderr, and it exits with
+a code of one. It is an error if the job was not submitted with the
+``waitable`` flag.
+
+When run without arguments, ``flux job wait`` blocks until the next waitable
+job completes and behaves as above except that the jobid is printed to stdout.
+When there are no more waitable jobs, it exits with a code of one. Note that
+a ``while flux job wait...`` loop terminates on the first unsuccessful job
+or when there are no more jobs.
+
+``flux job wait --all`` loops through waitable jobs as they complete, printing
+their jobids. If all jobs are successful, it exits with a code of zero. If
+any jobs have failed, it exits with a code of one.

 **-a, --all**
-   Wait for all waitable jobs. Will exit with error if any jobs are
+   Wait for all waitable jobs and exit with error if any jobs are
    not successful.

 **-v, --verbose**
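
The loop behavior described in the reworked WAIT text can be sketched in shell as follows. This is a minimal illustration, not part of the commit. The function name ``consume_waitable_results`` is hypothetical, and the sketch relies only on the documented behavior that ``flux job wait`` with no jobid prints the jobid of the next completed waitable job and exits nonzero when that job failed or when no waitable jobs remain.

```shell
#!/bin/sh
# Sketch only: consume waitable job results one at a time.
# consume_waitable_results is a hypothetical helper name, not a flux command.
consume_waitable_results() {
    count=0
    # Each successful call consumes exactly one result and prints its jobid.
    # The loop ends on the first unsuccessful job OR when no waitable jobs
    # remain; the exit code alone does not distinguish the two cases.
    while jobid=$(flux job wait); do
        count=$((count + 1))
        echo "ok: $jobid"
    done
    echo "stopped after $count successful job(s)"
}
```

Because each result can be consumed only once, a job reported by this loop cannot be waited on again. When the goal is an overall pass/fail status rather than per-job handling, ``flux job wait --all`` covers the same ground in a single call. Jobs would first need to be submitted with the ``waitable`` flag, presumably with something like ``flux submit --flags=waitable``, which is an assumption here since the commit does not show a submission command.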