Added global user documentation
Signed-off-by: Blake Devcich <[email protected]>
bdevcich committed Mar 7, 2024
1 parent 6d218f2 commit 1b8203e
Showing 3 changed files with 81 additions and 0 deletions.
79 changes: 79 additions & 0 deletions docs/guides/global-lustre/readme.md
@@ -0,0 +1,79 @@
---
authors: Blake Devcich <[email protected]>
categories: provisioning
---

# Global Lustre

## Background

Adding global lustre to Rabbit systems gives them access to external file systems. This is primarily
used for Data Movement, where a user can issue `copy_in` and `copy_out` directives with global
lustre as the source and destination, respectively.

Global lustre filesystems are represented by the `lustrefilesystems` resource in Kubernetes:

```shell
$ kubectl get lustrefilesystems -A
NAMESPACE   NAME       FSNAME     MGSNIDS          AGE
default     mylustre   mylustre   10.1.1.113@tcp   20d
```

An example resource is as follows:

```yaml
apiVersion: lus.cray.hpe.com/v1alpha1
kind: LustreFileSystem
metadata:
  name: mylustre
  namespace: default
spec:
  mgsNids: 10.1.1.100@tcp
  mountRoot: /p/mylustre
  name: mylustre
  namespaces:
    default:
      modes:
        - ReadWriteMany
```

## Namespaces

Note the `spec.namespaces` field. For each namespace listed, the `lustre-fs-operator` creates a
PV/PVC pair in that namespace. This allows pods in that namespace to access global lustre.

It is recommended to create `lustrefilesystem` resources in the `default` namespace (i.e.
`metadata.namespace`). Adding `lustrefilesystems` to the `nnf-dm-system` namespace can cause issues
with namespace deletion if you are undeploying `nnf-dm`. Using the `default` namespace also makes
the `lustrefilesystem` resource available to containers (e.g. container workflows) running in that
namespace.
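
As a rough sketch of how a namespace could be added to `spec.namespaces` on an existing resource,
the helper below wraps `kubectl patch` with a JSON merge patch. The function name
`add_lustre_namespace` and the namespace `my-namespace` are hypothetical, and the helper assumes
the resource lives in the `default` namespace as recommended above.

```shell
# Hypothetical helper: add a namespace entry to spec.namespaces.
# Usage: add_lustre_namespace <lustrefilesystem-name> <target-namespace>
add_lustre_namespace() {
  # Build a JSON merge patch that adds one key under spec.namespaces.
  lustre_ns_patch=$(printf '{"spec":{"namespaces":{"%s":{"modes":["ReadWriteMany"]}}}}' "$2")
  # Assumption: the resource was created in the default namespace.
  kubectl patch lustrefilesystem "$1" -n default --type merge -p "$lustre_ns_patch"
}
```

For example, `add_lustre_namespace mylustre my-namespace` followed by
`kubectl get pvc -n my-namespace` should show whether the PVC was created in that namespace.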

## NNF Data Movement Manager

The NNF Data Movement Manager monitors these resources and uses them to mount the global lustre
filesystem into each of the NNF DM Worker pods, which run on each of the NNF nodes. As a result,
each addition or removal of a `lustrefilesystems` resource restarts the DM Worker pods so that they
can adjust their mount points.

The NNF Data Movement Manager also places a finalizer on the `lustrefilesystem` resource to indicate
that the resource is in use by Data Movement. This prevents the PV/PVC pair from being deleted while
it is in use by pods.
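
To see which finalizers are currently set, something like the following can be used. This is a
minimal sketch: the function name `show_lustre_finalizers` is hypothetical, and the namespace
defaults to `default` when the second argument is omitted.

```shell
# Hypothetical helper: print the finalizers on a lustrefilesystem as a JSON array.
# Usage: show_lustre_finalizers <lustrefilesystem-name> [namespace]
show_lustre_finalizers() {
  kubectl get lustrefilesystem "$1" -n "${2:-default}" \
    -o jsonpath='{.metadata.finalizers}'
}
```

For example, `show_lustre_finalizers mylustre` prints the list for the example resource above.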

## Adding Global Lustre

As mentioned previously, the NNF Data Movement Manager monitors these resources and automatically
adds the `nnf-dm-system` namespace to all `lustrefilesystem` resources. Once this happens, a PV/PVC
is created for the `nnf-dm-system` namespace to access global lustre. The Manager updates the NNF DM
Worker pods, which are then restarted to mount the global lustre file system.
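
The sequence above can be checked by hand. The sketch below uses a hypothetical helper named
`check_global_lustre_added` and assumes the resource was created in the `default` namespace. It
prints the namespaces on the resource, then lists the PVCs and pods in `nnf-dm-system`.

```shell
# Hypothetical helper: check that a lustrefilesystem has been picked up by Data Movement.
# Usage: check_global_lustre_added <lustrefilesystem-name>
check_global_lustre_added() {
  # nnf-dm-system should appear here once the Manager has updated the resource.
  kubectl get lustrefilesystem "$1" -n default -o jsonpath='{.spec.namespaces}'
  echo
  # The PVC created for the nnf-dm-system namespace.
  kubectl get pvc -n nnf-dm-system
  # The DM Worker pods restart after the update; their AGE column reflects this.
  kubectl get pods -n nnf-dm-system
}
```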

## Removing Global Lustre

When a `lustrefilesystem` is deleted, the NNF DM Manager takes notice and starts to unmount the file
system from the DM Worker pods, which causes another restart of the DM Worker pods. Once this is
finished, the DM finalizer is removed from the `lustrefilesystem` resource to signal that it is no
longer in use by Data Movement.

If a `lustrefilesystem` is not deleted, check its finalizers to see what might still be using it.
It is possible to get into a situation where `nnf-dm` has been undeployed, so there is nothing left
to remove the DM finalizer from the `lustrefilesystem` resource. If that is the case, manually
remove the DM finalizer so that the deletion of the `lustrefilesystem` resource can continue.
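
One way to remove a finalizer manually is a JSON patch against `metadata.finalizers`. The sketch
below is an assumption-laden example, not a prescribed procedure: the function name
`remove_lustre_finalizer` is hypothetical, the resource is assumed to be in the `default`
namespace, and the finalizer is selected by its zero-based position in the list because the DM
finalizer's name is not given here. Inspect `.metadata.finalizers` first to find the right index.

```shell
# Hypothetical helper: remove one finalizer from a lustrefilesystem by list index.
# Usage: remove_lustre_finalizer <lustrefilesystem-name> <zero-based-index>
remove_lustre_finalizer() {
  # Check the index against .metadata.finalizers first; removing the wrong
  # entry can release a resource that is still in use.
  kubectl patch lustrefilesystem "$1" -n default --type json \
    -p "[{\"op\":\"remove\",\"path\":\"/metadata/finalizers/$2\"}]"
}
```

Running `kubectl edit lustrefilesystem mylustre -n default` and deleting the entry by hand is an
alternative that avoids relying on the index.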
1 change: 1 addition & 0 deletions docs/guides/index.md
@@ -14,6 +14,7 @@
* [Data Movement Configuration](data-movement/readme.md)
* [Copy Offload API](data-movement/copy-offload-api.html)
* [Lustre External MGT](external-mgs/readme.md)
* [Global Lustre](global-lustre/readme.md)

## NNF User Containers

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -18,6 +18,7 @@ nav:
- 'Storage Profiles': 'guides/storage-profiles/readme.md'
- 'User Containers': 'guides/user-containers/readme.md'
- 'Lustre External MGT': 'guides/external-mgs/readme.md'
- 'Global Lustre': 'guides/global-lustre/readme.md'
- 'RFCs':
- rfcs/index.md
- 'Rabbit Request For Comment Process': 'rfcs/0001/readme.md'