From 1b8203e84fd8fed03d118c8c964b01bd35a6da1c Mon Sep 17 00:00:00 2001
From: Blake Devcich
Date: Thu, 7 Mar 2024 10:00:18 -0600
Subject: [PATCH] Added global user documentation

Signed-off-by: Blake Devcich
---
 docs/guides/global-lustre/readme.md | 79 +++++++++++++++++++++++++++++
 docs/guides/index.md                |  1 +
 mkdocs.yml                          |  1 +
 3 files changed, 81 insertions(+)
 create mode 100644 docs/guides/global-lustre/readme.md

diff --git a/docs/guides/global-lustre/readme.md b/docs/guides/global-lustre/readme.md
new file mode 100644
index 0000000..6c4a2f7
--- /dev/null
+++ b/docs/guides/global-lustre/readme.md
@@ -0,0 +1,79 @@
+---
+authors: Blake Devcich
+categories: provisioning
+---
+
+# Global Lustre
+
+## Background
+
+Adding global lustre to rabbit systems allows access to external file systems. This is primarily
+used for Data Movement, where a user can perform `copy_in` and `copy_out` directives with global
+lustre being the source and destination, respectively.
+
+Global lustre filesystems are represented by the `lustrefilesystems` resource in Kubernetes:
+
+```shell
+$ kubectl get lustrefilesystems -A
+NAMESPACE   NAME       FSNAME     MGSNIDS          AGE
+default     mylustre   mylustre   10.1.1.113@tcp   20d
+```
+
+An example resource is as follows:
+
+```yaml
+apiVersion: lus.cray.hpe.com/v1alpha1
+kind: LustreFileSystem
+metadata:
+  name: mylustre
+  namespace: default
+spec:
+  mgsNids: 10.1.1.100@tcp
+  mountRoot: /p/mylustre
+  name: mylustre
+  namespaces:
+    default:
+      modes:
+      - ReadWriteMany
+```
+
+## Namespaces
+
+Note the `spec.namespaces` field. For each namespace listed, the `lustre-fs-operator` creates a
+PV/PVC pair in that namespace. This allows pods in that namespace to access global lustre.
+
+It is recommended to create `lustrefilesystem` resources in the `default` namespace (i.e.
+`metadata.namespace`). Adding `lustrefilesystems` to the `nnf-dm-system` namespace can cause issues
+with namespace deletion if you are undeploying `nnf-dm`. Using `default` also makes the `lustrefilesystem`
+resource available to the `default` namespace, which makes it available to containers (e.g.
+container workflows) running in the `default` namespace.
+
+## NNF Data Movement Manager
+
+The NNF Data Movement Manager is responsible for monitoring these resources and uses them to mount the
+global lustre filesystem into each of the NNF DM Worker pods. These pods run on each of the NNF
+nodes. This means with each addition or removal of `lustrefilesystems` resources, the DM worker pods
+restart to adjust their mount points.
+
+The NNF Data Movement Manager also places a finalizer on the `lustrefilesystem` resource to indicate
+that the resource is in use by Data Movement. This prevents the PV/PVC pair from being deleted while
+it is being used by pods.
+
+## Adding Global Lustre
+
+As mentioned previously, the NNF Data Movement Manager monitors these resources and automatically
+adds the `nnf-dm-system` namespace to all `lustrefilesystem` resources. Once this happens, a PV/PVC
+pair is created for the `nnf-dm-system` namespace to access global lustre. The Manager updates the NNF DM
+Worker pods, which are then restarted to mount the global lustre file system.
+
+## Removing Global Lustre
+
+When a `lustrefilesystem` is deleted, the NNF DM Manager takes notice and starts to unmount the file
+system from the DM Worker pods, causing another restart of the DM Worker pods. Once this is
+finished, the DM finalizer is removed from the `lustrefilesystem` resource to signal that it is no
+longer in use by Data Movement.
+
+If a `lustrefilesystem` does not delete, check the finalizers to see what might still be using it.
+It is possible to get into a situation where `nnf-dm` has been undeployed, so there is nothing to
+remove the DM finalizer from the `lustrefilesystem` resource. If that is the case, then manually
+remove the DM finalizer so the deletion of the `lustrefilesystem` resource can continue.
diff --git a/docs/guides/index.md b/docs/guides/index.md
index c13ac11..01c6c36 100644
--- a/docs/guides/index.md
+++ b/docs/guides/index.md
@@ -14,6 +14,7 @@
 * [Data Movement Configuration](data-movement/readme.md)
 * [Copy Offload API](data-movement/copy-offload-api.html)
 * [Lustre External MGT](external-mgs/readme.md)
+* [Global Lustre](global-lustre/readme.md)
 
 ## NNF User Containers
 
diff --git a/mkdocs.yml b/mkdocs.yml
index 218fcb6..74b4b7e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -18,6 +18,7 @@ nav:
     - 'Storage Profiles': 'guides/storage-profiles/readme.md'
     - 'User Containers': 'guides/user-containers/readme.md'
     - 'Lustre External MGT': 'guides/external-mgs/readme.md'
+    - 'Global Lustre': 'guides/global-lustre/readme.md'
   - 'RFCs':
     - rfcs/index.md
     - 'Rabbit Request For Comment Process': 'rfcs/0001/readme.md'