Merge branch 'main' into release-v0.0.2
bdevcich committed Jun 8, 2023
2 parents 8ba5e6f + 9cdcafb commit a1826bc
Showing 8 changed files with 530 additions and 133 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,2 +1,4 @@
venv
site


2 changes: 1 addition & 1 deletion docs/guides/compute-daemons/readme.md
@@ -40,7 +40,7 @@ SERVICE_ACCOUNT=$1
NAMESPACE=$2

kubectl get secret ${SERVICE_ACCOUNT} -n ${NAMESPACE} -o json | jq -Mr '.data.token' | base64 --decode > ./service.token
kubectl get secret ${SERVICE_ACCOUNT} -n ${NAMESPACE} -o json | jq -Mr '.data["ca.crt"]' | base64 -decode > ./service.cert
kubectl get secret ${SERVICE_ACCOUNT} -n ${NAMESPACE} -o json | jq -Mr '.data["ca.crt"]' | base64 --decode > ./service.cert
```

The `service.token` and `service.cert` files must be copied to each compute node, typically into the `/etc/[BINARY-NAME]/` directory.
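
A minimal sketch of distributing the files with `scp` is shown below; the compute node names are hypothetical, and `[BINARY-NAME]` must be substituted with the daemon's name as in the text above:

```console
# Hypothetical node list; substitute the real compute node names.
for NODE in compute-01 compute-02; do
    ssh $NODE "mkdir -p /etc/[BINARY-NAME]"
    scp service.token service.cert $NODE:/etc/[BINARY-NAME]/
done
```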
2 changes: 1 addition & 1 deletion docs/guides/ha-cluster/readme.md
@@ -46,7 +46,7 @@ Configure the NNF agent with the following parameters:
| `nnf-node-name=[NNF-NODE-NAME]` | Name of the NNF node as it appears in the System Configuration |
| `api-version=[VERSION]` | The API Version of the NNF Node resource. Defaults to "v1alpha1" |

The token and certificate can be found in the Kubernetes Secrets resource for the nnf-system/nnf-fence-agent ServiceAccount. This provides RBAC rules to limit the fencing agent to only the Kubernetes resources it needs access to.
The token and certificate can be found in the Kubernetes Secrets resource for the nnf-system/nnf-fencing-agent ServiceAccount. This provides RBAC rules to limit the fencing agent to only the Kubernetes resources it needs access to.

For example, the following sets up the NNF fencing agent on `rabbit-node-1` with the Kubernetes API server running at `192.168.0.1:6443` and the service token and certificate copied to `/etc/nnf/fence/`. This needs to be run on only one node in the cluster.

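A hedged sketch of such a command, assuming the agent is registered with Pacemaker's `pcs` tool as `fence_nnf`; the `kubernetes-service-*` and `service-*-filename` parameter names are assumptions and may differ on a real system:

```console
# Sketch only: parameter names other than nnf-node-name and api-version are assumed.
pcs stonith create rabbit-node-1-fence fence_nnf \
    kubernetes-service-host=192.168.0.1 \
    kubernetes-service-port=6443 \
    service-token-filename=/etc/nnf/fence/service.token \
    service-cert-filename=/etc/nnf/fence/service.cert \
    nnf-node-name=rabbit-node-1 \
    api-version=v1alpha1
```
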
6 changes: 6 additions & 0 deletions docs/guides/index.md
@@ -12,3 +12,9 @@

* [Storage Profiles](storage-profiles/readme.md)
* [Data Movement Configuration](data-movement/readme.md)

## NNF User Containers

* [User Containers](user-containers/readme.md)


91 changes: 72 additions & 19 deletions docs/guides/rbac-for-users/readme.md
@@ -3,13 +3,15 @@ authors: Matt Richerson <[email protected]>
categories: setup
---

# RBAC for Users
# RBAC: Role-Based Access Control

This document shows how to create a kubeconfig file with RBAC set up to restrict access to view only for resources.
RBAC (Role Based Access Control) determines the operations a user or service can perform on a list of Kubernetes resources. RBAC affects everything that interacts with the kube-apiserver (both users and services internal or external to the cluster). More information about RBAC can be found in the Kubernetes [***documentation***](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).

## Overview
## RBAC for Users

RBAC (Role Based Access Control) determines the operations a user or service can perform on a list of Kubernetes resources. RBAC affects everything that interacts with the kube-apiserver (both users and services internal or external to the cluster). More information about RBAC can be found in the Kubernetes [***documentation***](https://kubernetes.io/docs/reference/access-authn-authz/rbac/).
This section shows how to create a kubeconfig file with RBAC configured to restrict the user to view-only access to resources.

### Overview

User access to a Kubernetes cluster is defined through a kubeconfig file. This file contains the address of the kube-apiserver as well as the key and certificate for the user. Typically, this file is located in `~/.kube/config`. When a Kubernetes cluster is created, a config file is generated for the admin that allows unrestricted access to all resources in the cluster. This is the equivalent of `root` on a Linux system.
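
For instance, the cluster address and user entries in the admin file can be inspected with `kubectl` (key and certificate data are redacted in this view):

```console
kubectl config view --kubeconfig=/etc/kubernetes/admin.conf
```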

@@ -19,46 +19,49 @@ The goal of this document is to create a new kubeconfig file that allows view on
- Creating a new kubeconfig file
- Adding RBAC rules for the "hpe" user to allow read access

## Generate a Key and Certificate
### Generate a Key and Certificate

The first step is to create a new key and certificate so that HPE employees can authenticate as the "hpe" user. This will likely be done on one of the master nodes. The `openssl` command needs access to the certificate authority file. This is typically located in `/etc/kubernetes/pki`.

```bash

# make a temporary work space
mkdir /tmp/hpe
cd /tmp/hpe
mkdir /tmp/rabbit
cd /tmp/rabbit

# Create this user
export USERNAME=hpe

# generate a new key
openssl genrsa -out hpe.key 2048
openssl genrsa -out rabbit.key 2048

# create a certificate signing request for the "hpe" user
openssl req -new -key hpe.key -out hpe.csr -subj "/CN=hpe"
# create a certificate signing request for this user
openssl req -new -key rabbit.key -out rabbit.csr -subj "/CN=$USERNAME"

# generate a certificate using the certificate authority on the k8s cluster. This certificate lasts 500 days
openssl x509 -req -in hpe.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out hpe.crt -days 500
openssl x509 -req -in rabbit.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out rabbit.crt -days 500

```
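
The resulting certificate can be checked with `openssl` to confirm the subject and expiration dates before building the kubeconfig:

```console
openssl x509 -in rabbit.crt -noout -subject -dates
```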

## Create a kubeconfig
### Create a kubeconfig

After the keys have been generated, a new kubeconfig file can be created for the "hpe" user. The admin kubeconfig `/etc/kubernetes/admin.conf` can be used to determine the cluster name and kube-apiserver address.
After the keys have been generated, a new kubeconfig file can be created for this user. The admin kubeconfig `/etc/kubernetes/admin.conf` can be used to determine the cluster name and kube-apiserver address.

```bash

# create a new kubeconfig with the server information
kubectl config set-cluster {CLUSTER_NAME} --kubeconfig=/tmp/hpe/hpe.conf --server={SERVER_ADDRESS} --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true
kubectl config set-cluster $CLUSTER_NAME --kubeconfig=/tmp/rabbit/rabbit.conf --server=$SERVER_ADDRESS --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true

# add the key and cert for the "hpe" user to the config
kubectl config set-credentials hpe --kubeconfig=/tmp/hpe/hpe.conf --client-certificate=/tmp/hpe/hpe.crt --client-key=/tmp/hpe/hpe.key --embed-certs=true
# add the key and cert for this user to the config
kubectl config set-credentials $USERNAME --kubeconfig=/tmp/rabbit/rabbit.conf --client-certificate=/tmp/rabbit/rabbit.crt --client-key=/tmp/rabbit/rabbit.key --embed-certs=true

# add a context
kubectl config set-context hpe-context --kubeconfig=/tmp/hpe/hpe.conf --cluster={CLUSTER_NAME} --user=hpe
kubectl config set-context $USERNAME --kubeconfig=/tmp/rabbit/rabbit.conf --cluster=$CLUSTER_NAME --user=$USERNAME
```
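
Optionally, make the new context the default within the file so that users do not need to pass `--context` on every command:

```console
kubectl config use-context $USERNAME --kubeconfig=/tmp/rabbit/rabbit.conf
```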

The kubeconfig file should be placed in a location where HPE employees have read access to it.

## Create ClusterRole and ClusterRoleBinding
### Create ClusterRole and ClusterRoleBinding

The next step is to create ClusterRole and ClusterRoleBinding resources. The ClusterRole provided allows viewing all cluster and namespace scoped resources, but disallows creating, deleting, or modifying any resources.
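
As a minimal sketch (not the exact resource provided with the system), such a read-only ClusterRole could grant only the read verbs across all API groups; the `hpe-viewer` name here is illustrative:

```console
kubectl apply -f - <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hpe-viewer
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "list"]
EOF
```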

@@ -92,10 +97,58 @@ roleRef:
Both of these resources can be created using the `kubectl apply` command.

## Testing
### Testing

Get, List, Create, Delete, and Modify operations can be tested as the "hpe" user by setting the KUBECONFIG environment variable to use the new kubeconfig file. Get and List should be the only allowed operations. Other operations should fail with a "forbidden" error.

```bash
export KUBECONFIG=/tmp/rabbit/rabbit.conf
```
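
For example (the resource names here are illustrative):

```console
# allowed: read-only operations
kubectl get nodes
kubectl get pods -A

# denied: anything that mutates state should fail with a "forbidden" error
kubectl delete pod example-pod -n default
```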

## RBAC for Workload Manager (WLM)

**Note:** This section assumes the reader has read and understood the steps described above for setting up `RBAC for Users`.

A workload manager (WLM) such as [Flux](https://github.com/flux-framework) or [Slurm](https://slurm.schedmd.com) will interact with [DataWorkflowServices](https://dataworkflowservices.github.io) as a privileged user. RBAC is used to limit the operations that a WLM can perform on a Rabbit system.

The following steps are required to create a user and a role for the WLM. In this case, we're creating a user to be used with the Flux WLM:

- Generate a new key/cert pair for a "flux" user
- Create a new kubeconfig file
- Add RBAC rules for the "flux" user to allow appropriate access to the DataWorkflowServices API

### Generate a Key and Certificate

Generate a key and certificate for our "flux" user, similar to the way we created one for the "hpe" user above. Substitute "flux" in place of "hpe".
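
Using the parameterized commands from above, this amounts to re-running the same steps with a different username:

```console
mkdir /tmp/flux && cd /tmp/flux
export USERNAME=flux

openssl genrsa -out flux.key 2048
openssl req -new -key flux.key -out flux.csr -subj "/CN=$USERNAME"
openssl x509 -req -in flux.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out flux.crt -days 500
```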

### Create a kubeconfig

After the keys have been generated, a new kubeconfig file can be created for the "flux" user, similar to the one for the "hpe" user above. Again, substitute "flux" in place of "hpe".
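
Again following the commands above, with `$CLUSTER_NAME` and `$SERVER_ADDRESS` taken from `/etc/kubernetes/admin.conf`:

```console
kubectl config set-cluster $CLUSTER_NAME --kubeconfig=/tmp/flux/flux.conf --server=$SERVER_ADDRESS --certificate-authority=/etc/kubernetes/pki/ca.crt --embed-certs=true
kubectl config set-credentials flux --kubeconfig=/tmp/flux/flux.conf --client-certificate=/tmp/flux/flux.crt --client-key=/tmp/flux/flux.key --embed-certs=true
kubectl config set-context flux --kubeconfig=/tmp/flux/flux.conf --cluster=$CLUSTER_NAME --user=flux
```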

### Apply the provided ClusterRole and create a ClusterRoleBinding

DataWorkflowServices has already defined the role to be used with WLMs. Simply apply the `workload-manager` ClusterRole from DataWorkflowServices to the system:

```console
kubectl apply -f https://github.com/HewlettPackard/dws/raw/master/config/rbac/workload_manager_role.yaml
```

Create and apply a ClusterRoleBinding to associate the "flux" user with the `workload-manager` ClusterRole:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: flux
subjects:
- kind: User
  name: flux
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: workload-manager
  apiGroup: rbac.authorization.k8s.io
```
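
The binding can be spot-checked with `kubectl auth can-i`; the `dws.cray.hpe.com` API group shown here is an assumption and may differ by release:

```console
kubectl auth can-i list workflows.dws.cray.hpe.com --as flux   # expected: yes
kubectl auth can-i delete nodes --as flux                      # expected: no
```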

The WLM should then use the kubeconfig file associated with this "flux" user to access the DataWorkflowServices API and the Rabbit system.
200 changes: 200 additions & 0 deletions docs/guides/user-containers/readme.md
@@ -0,0 +1,200 @@
# NNF User Containers

NNF User Containers are a mechanism to allow user-defined containerized
applications to be run on Rabbit nodes with access to NNF ephemeral and persistent storage.

!!! note

    The following is a limited look at User Containers. More content will be
    provided after the RFC has been finalized.

## Custom NnfContainerProfile

The author of a containerized application will work with the administrator to
define a pod specification template for the container and to create an
appropriate NnfContainerProfile resource for the container. The image and tag
for the user's container will be specified in the profile.

New NnfContainerProfile resources may be created by copying one of the provided
example profiles from the `nnf-system` namespace. The examples may be found by listing them with `kubectl`:

```console
kubectl get nnfcontainerprofiles -n nnf-system
```
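
For example, a profile can be copied under a new name and edited; the `example-success` profile name here is illustrative, so use one returned by the listing:

```console
kubectl get nnfcontainerprofile example-success -n nnf-system -o yaml > my-profile.yaml
# edit the metadata.name, image, and tag, then create the new profile
kubectl apply -n nnf-system -f my-profile.yaml
```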

### Workflow Job Specification

The user's workflow will specify the name of the NnfContainerProfile in a DW
directive. If the custom profile is named `red-rock-slushy` then it will be
specified in the "#DW container" directive with the "profile" parameter.

```bash
#DW container profile=red-rock-slushy [...]
```

## Using a Private Container Repository

The user's containerized application may be placed in a private repository. In
this case, the user must define an access token to be used with that repository,
and that token must be made available to the Rabbit's Kubernetes environment
so that it can pull that container from the private repository.

See [Pull an Image from a Private Registry](https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/) in the Kubernetes documentation
for more information.

### About the Example

Each container registry will have its own way of letting its users create tokens to
be used with their repositories. Docker Hub will be used for the private repository in this example, and the user's account on Docker Hub will be "dean".

### Preparing the Private Repository

The user's application container is named "red-rock-slushy". To store this container
on Docker Hub, the user must log into docker.com with their browser, click the "Create repository" button to create a repository named "red-rock-slushy", and check the box that marks the repository as private. The repository's name will be displayed as "dean/red-rock-slushy" with a lock icon to show that it is private.

### Create and Push a Container

The user will create their container image in the usual ways, naming it for their private repository and tagging it according to its release.
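
For example, building and tagging the v1.0 release from a Dockerfile in the current directory:

```console
docker build -t dean/red-rock-slushy:v1.0 .
```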

Prior to pushing images to the repository, the user must complete a one-time login to the Docker registry using the docker command-line tool.

```console
docker login -u dean
```

After completing the login, the user may then push their images to the repository.

```console
docker push dean/red-rock-slushy:v1.0
```

### Generate a Read-Only Token

A read-only token must be generated to allow Kubernetes to pull that container
image from the private repository, because Kubernetes will not be running as
that user. **This token must be given to the administrator, who will use it to create a Kubernetes secret.**

To log in and generate a read-only token to share with the administrator, the user must follow these steps:

- Visit docker.com and log in using their browser.
- Click on the username in the upper right corner.
- Select "Account Settings" and navigate to "Security".
- Click the "New Access Token" button to create a read-only token.
- Keep a copy of the generated token to share with the administrator.

### Store the Read-Only Token as a Kubernetes Secret

The administrator must store the user's read-only token as a Kubernetes secret. The
secret must be placed in the `default` namespace, which is the same namespace
where the user containers will be run. The secret must include the user's Docker
Hub username and the email address they have associated with that username. In
this case, the secret will be named `readonly-red-rock-slushy`.

```console
$ USER_TOKEN=users-token-text
$ USER_NAME=dean
$ USER_EMAIL=dean@example.com  # the email address associated with the Docker Hub account
$ SECRET_NAME=readonly-red-rock-slushy
$ kubectl create secret docker-registry $SECRET_NAME -n default --docker-server="https://index.docker.io/v1/" --docker-username=$USER_NAME --docker-password=$USER_TOKEN --docker-email=$USER_EMAIL
```
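
The stored credentials can be sanity-checked by decoding the secret's `.dockerconfigjson` payload:

```console
kubectl get secret $SECRET_NAME -n default -o jsonpath='{.data.\.dockerconfigjson}' | base64 --decode
```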

### Add the Secret to the NnfContainerProfile

The administrator must add an `imagePullSecrets` list to the NnfContainerProfile
resource that was created for this user's containerized application.

The following profile shows the placement of the `readonly-red-rock-slushy` secret
which was created in the previous step, and points to the user's
`dean/red-rock-slushy:v1.0` container.

```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfContainerProfile
metadata:
  name: red-rock-slushy
  namespace: nnf-system
data:
  pinned: false
  retryLimit: 6
  spec:
    imagePullSecrets:
    - name: readonly-red-rock-slushy
    containers:
    - command:
      - /users-application
      image: dean/red-rock-slushy:v1.0
      name: red-rock-app
  storages:
  - name: DW_JOB_foo_local_storage
    optional: false
  - name: DW_PERSISTENT_foo_persistent_storage
    optional: true
```
Now any user can select this profile in their Workflow by specifying it in a
`#DW container` directive.

```bash
#DW container profile=red-rock-slushy [...]
```

### Using a Private Container Repository for MPI Application Containers

If the user's containerized application instead contains an MPI application,
perhaps because it is a private copy of [nnf-mfu](https://github.com/NearNodeFlash/nnf-mfu),
then the administrator would insert two `imagePullSecrets` lists into the
`mpiSpec` of the NnfContainerProfile: one for the MPI launcher and one for the MPI worker.

```yaml
apiVersion: nnf.cray.hpe.com/v1alpha1
kind: NnfContainerProfile
metadata:
  name: mpi-red-rock-slushy
  namespace: nnf-system
data:
  mpiSpec:
    mpiImplementation: OpenMPI
    mpiReplicaSpecs:
      Launcher:
        template:
          spec:
            imagePullSecrets:
            - name: readonly-red-rock-slushy
            containers:
            - command:
              - mpirun
              - dcmp
              - $(DW_JOB_foo_local_storage)/0
              - $(DW_JOB_foo_local_storage)/1
              image: dean/red-rock-slushy:v2.0
              name: red-rock-launcher
      Worker:
        template:
          spec:
            imagePullSecrets:
            - name: readonly-red-rock-slushy
            containers:
            - image: dean/red-rock-slushy:v2.0
              name: red-rock-worker
    runPolicy:
      cleanPodPolicy: Running
      suspend: false
    slotsPerWorker: 1
    sshAuthMountPath: /root/.ssh
  pinned: false
  retryLimit: 6
  storages:
  - name: DW_JOB_foo_local_storage
    optional: false
  - name: DW_PERSISTENT_foo_persistent_storage
    optional: true
```

Now any user can select this profile in their Workflow by specifying it in a
`#DW container` directive.

```bash
#DW container profile=mpi-red-rock-slushy [...]
```

