Add status.requiredDaemons to DirectiveBreakdown (#177)
Signed-off-by: Dean Roehrich <[email protected]>
roehrich-hpe authored Jul 5, 2024
1 parent 85ec1c0 commit 703a2e8
Showing 2 changed files with 38 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/guides/data-movement/readme.md
@@ -90,6 +90,14 @@ The `CreateRequest` API call that is used to create Data Movement with the Copy
options to allow a user to specify some options for that particular Data Movement. These settings
are on a per-request basis.

The Copy Offload API requires the `nnf-dm` daemon to be running on the compute node. This daemon may be configured to run full-time, or it may be left in a disabled state if the WLM is expected to run it only when a user requests it. See [Compute Daemons](../compute-daemons/readme.md) for the systemd service configuration of the daemon. See `RequiredDaemons` in [Directive Breakdown](../directive-breakdown/readme.md) for a description of how the user may request the daemon, in the case where the WLM will run it only on demand.

If the WLM runs the `nnf-dm` daemon only on demand, then the user can request that the daemon be running for their job by specifying `requires=copy-offload` in their `DW` directive. The following is an example:

```bash
#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload
```

See the [DataMovementCreateRequest API](copy-offload-api.html#datamovement.DataMovementCreateRequest)
definition for what can be configured.

30 changes: 30 additions & 0 deletions docs/guides/directive-breakdown/readme.md
@@ -149,3 +149,33 @@ A location constraint consists of an `access` list and a `reference`.
* `status.compute.constraints.location.access` is a list that specifies what type of access the compute nodes need to have to the storage allocations in the allocation set. An allocation set may have multiple access types that are required
* `status.compute.constraints.location.access.type` specifies the connection type for the storage. This can be `network` or `physical`
* `status.compute.constraints.location.access.priority` specifies how necessary the connection type is. This can be `mandatory` or `bestEffort`

## RequiredDaemons

The `status.requiredDaemons` section of the `DirectiveBreakdown` tells the WLM about any driver-specific daemons it must enable for the job; it is assumed that the WLM already knows about these driver-specific daemons and knows how to start them when a user requests them. The `status.requiredDaemons` section exists only for `jobdw` and `persistentdw` directives. An example of the `status.requiredDaemons` section is included below.

```yaml
status:
  ...
  requiredDaemons:
  - copy-offload
  ...
```
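
As a quick check, the `DirectiveBreakdown` resources for a job can also be inspected directly with `kubectl`. This is a minimal sketch, assuming the standard `directivebreakdowns` resource name; the resource name and namespace below are placeholders:

```console
kubectl get directivebreakdowns -A
kubectl get directivebreakdowns <breakdown-name> -n <namespace> -o yaml
```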

The allowed set of daemons that may be specified is defined in the [nnf-ruleset.yaml for DWS](https://github.com/NearNodeFlash/nnf-sos/blob/master/config/dws/nnf-ruleset.yaml), found in the `nnf-sos` repository. The `ruleDefs.key[requires]` statement appears in two places in the ruleset, once for `jobdw` and once for `persistentdw`. Each statement accepts a list of patterns, one for each allowed daemon.
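
For orientation, the sketch below shows the rough shape of such an entry; the field names are approximations, and the linked `nnf-ruleset.yaml` is the authoritative reference:

```yaml
# Illustrative sketch only -- field names are approximate; see nnf-ruleset.yaml.
- command: "jobdw"
  ruleDefs:
  - key: "requires"
    patterns:
    - "copy-offload"   # one pattern per allowed daemon
- command: "persistentdw"
  ruleDefs:
  - key: "requires"
    patterns:
    - "copy-offload"
```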

The `DW` directive will include a comma-separated list of daemons after the `requires` keyword. The following is an example:

```bash
#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload
```
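
When more than one daemon is needed, they are listed comma-separated. The example below pairs `copy-offload` with a hypothetical `site-daemon`, which would be valid only if the site's ruleset defines such a daemon:

```bash
#DW jobdw type=xfs capacity=1GB name=stg1 requires=copy-offload,site-daemon
```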

The `DWDirectiveRule` resource currently active on the system can be viewed with:

```console
kubectl get -n dws-system dwdirectiverule nnf -o yaml
```
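
To narrow that output to just the `requires` rules, piping through `grep` is usually enough; adjust the amount of trailing context to match the ruleset's layout:

```console
kubectl get -n dws-system dwdirectiverule nnf -o yaml | grep -A 6 requires
```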

### Valid Daemons

Each site should define the list of daemons that are valid for that site and recognized by that site's WLM. The initial `nnf-ruleset.yaml` defines only one, called `copy-offload`. When a user specifies `copy-offload` in their `DW` directive, they are stating that their compute-node application will use the Copy Offload API Daemon described in the [Data Movement Configuration](../data-movement/readme.md).
