diff --git a/docs/guides/initial-setup/readme.md b/docs/guides/initial-setup/readme.md
index b96332f..1eeaba3 100644
--- a/docs/guides/initial-setup/readme.md
+++ b/docs/guides/initial-setup/readme.md
@@ -12,23 +12,35 @@ Instructions for the initial setup of a Rabbit are included in this document.
 
 ??? "LVM Details"
     Running LVM commands (lvcreate/lvremove) on a Rabbit to create logical volumes is problematic if those commands run within a container. Rabbit Storage Orchestration code contained in the `nnf-node-manager` Kubernetes pod executes LVM commands from within the container. The problem is that the LVM create/remove commands wait for a UDEV confirmation cookie that is set when UDEV rules run within the host OS. These cookies are not synchronized with the containers where the LVM commands execute.
 
-    3 options to solve this problem are:
+    4 options to solve this problem are:
 
-    1. Disable UDEV sync at the host operating system level
-    2. Disable UDEV sync using the `–noudevsync` command option for each LVM command
-    3. Clear the UDEV cookie using the `dmsetup udevcomplete_all` command after the lvcreate/lvremove command.
+    1. Disable UDEV for LVM
+    2. Disable UDEV sync at the host operating system level
+    3. Disable UDEV sync using the `--noudevsync` command option for each LVM command
+    4. Clear the UDEV cookie using the `dmsetup udevcomplete_all` command after the lvcreate/lvremove command.
 
-    Taking these in reverse order using option 3 above which allows UDEV settings within the host OS to remain unchanged from the default, one would need to start the `dmsetup` command on a separate thread because the LVM create/remove command waits for the UDEV cookie. This opens too many error paths, so it was rejected.
+    Taking these in reverse order: option 4 above allows UDEV settings within the host OS to remain unchanged from the default, but one would need to start the `dmsetup` command on a separate thread because the LVM create/remove command waits for the UDEV cookie. This opens too many error paths, so it was rejected.
 
-    Option 2 allows UDEV settings within the host OS to remain unchanged from the default, but the use of UDEV within production Rabbit systems is viewed as unnecessary because the host OS is PXE-booted onto the node vs loaded from an device that is discovered by UDEV.
+    Option 3 allows UDEV settings within the host OS to remain unchanged from the default, but the use of UDEV within production Rabbit systems is viewed as unnecessary because the host OS is PXE-booted onto the node rather than loaded from a device that is discovered by UDEV.
 
-    Option 1 above is what we chose to implement because it is the simplest. The following sections discuss this setting.
+    Option 2 above is our preferred way to disable UDEV syncing if disabling UDEV for LVM is not desired.
+
+    If UDEV sync is disabled as described in options 2 and 3, then LVM must also be run with the option to verify UDEV operations. This adds extra checks to verify that the UDEV devices appear as LVM expects. For some LV types (like RAID configurations), the UDEV device takes longer to appear in `/dev`. Without the UDEV confirmation cookie, LVM won't wait long enough to find the device unless LVM's UDEV verification checks are enabled.
+
+    Option 1 above is the overall preferred method for managing LVM devices on Rabbit nodes. LVM will handle device files without input from UDEV.
 
-In order for LVM commands to run within the container environment on a Rabbit, the following change is required to the `/etc/lvm/lvm.conf` file on Rabbit.
+In order for LVM commands to run within the container environment on a Rabbit, one of the following changes is required to the `/etc/lvm/lvm.conf` file on Rabbit.
+
+Option 1 as described above:
+```bash
+sed -i 's/udev_rules = 1/udev_rules = 0/g' /etc/lvm/lvm.conf
+```
+Option 2 as described above:
 ```bash
 sed -i 's/udev_sync = 1/udev_sync = 0/g' /etc/lvm/lvm.conf
+sed -i 's/verify_udev_operations = 0/verify_udev_operations = 1/g' /etc/lvm/lvm.conf
 ```
 
 ### ZFS
diff --git a/docs/guides/system-storage/readme.md b/docs/guides/system-storage/readme.md
index 1cdf8d2..d7975f4 100644
--- a/docs/guides/system-storage/readme.md
+++ b/docs/guides/system-storage/readme.md
@@ -24,9 +24,9 @@ System storage is created through the `NnfSystemStorage` resource. By default, s
 | `ComputesPattern` | No | Empty | A list of integers [0-15] | If `ComputesTarget` is `pattern`, then the storage is made available on compute nodes with the indexes specified in this list. |
 | `Capacity` | Yes | `1073741824` | Integer | Number of bytes to allocate per Rabbit |
 | `Type` | Yes | `raw` | `raw`, `xfs`, `gfs2` | Type of file system to create on the Rabbit storage |
-| `StorageProfile` | Yes | None | `ObjectReference` to an `NnfStorageProfile`. This storage profile must be marked as `pinned` |
-| `MakeClientMounts` | Yes | `false` | Create `ClientMount` resources to mount the storage on the compute nodes. If this is `false`, then the devices are made available to the compute nodes without mounting the file system |
-| `ClientMountPath` | No | None | Path to mount the file system on the compute nodes |
+| `StorageProfile` | Yes | None | `ObjectReference` to an `NnfStorageProfile` | This storage profile must be marked as `pinned` |
+| `MakeClientMounts` | Yes | `false` | Bool | Create `ClientMount` resources to mount the storage on the compute nodes. If this is `false`, then the devices are made available to the compute nodes without mounting the file system |
+| `ClientMountPath` | No | None | Path | Path to mount the file system on the compute nodes |
 
 `NnfSystemResources` can be created in any namespace.
 
@@ -62,7 +62,7 @@ spec:
   clientMountPath: "/mnt/nnf/gfs2"
   storageProfile:
     name: gfs2-systemstorage
-    namespace: systemstorage
+    namespace: default
     kind: NnfStorageProfile
 ```
 
@@ -80,8 +80,8 @@ The following example resources show how to create two system storages to use fo
 apiVersion: nnf.cray.hpe.com/v1alpha1
 kind: NnfStorageProfile
 metadata:
-  name: lvmlockd_even
-  namespace: systemstorage
+  name: lvmlockd-even
+  namespace: default
 data:
   xfsStorage:
     capacityScalingFactor: "1.0"
@@ -100,14 +100,14 @@ data:
     vgChange:
       lockStart: --lock-start $VG_NAME
       lockStop: --lock-stop $VG_NAME
-    vgCreate: --shared --addtag lvmlockd_even $VG_NAME $DEVICE_LIST
+    vgCreate: --shared --addtag lvmlockd-even $VG_NAME $DEVICE_LIST
     vgRemove: $VG_NAME
 ---
 apiVersion: nnf.cray.hpe.com/v1alpha1
 kind: NnfStorageProfile
 metadata:
-  name: lvmlockd_odd
-  namespace: systemstorage
+  name: lvmlockd-odd
+  namespace: default
 data:
   xfsStorage:
     capacityScalingFactor: "1.0"
@@ -126,7 +126,7 @@ data:
     vgChange:
      lockStart: --lock-start $VG_NAME
      lockStop: --lock-stop $VG_NAME
-    vgCreate: --shared --addtag lvmlockd_odd $VG_NAME $DEVICE_LIST
+    vgCreate: --shared --addtag lvmlockd-odd $VG_NAME $DEVICE_LIST
     vgRemove: $VG_NAME
 ```
 
@@ -136,29 +136,29 @@ Note that the `NnfStorageProfile` resources are marked as `default: false` and `
 apiVersion: nnf.cray.hpe.com/v1alpha1
 kind: NnfSystemStorage
 metadata:
-  name: lvmlockd_even
+  name: lvmlockd-even
   namespace: systemstorage
 spec:
   type: "raw"
   computesTarget: "even"
   makeClientMounts: false
   storageProfile:
-    name: lvmlockd_even
-    namespace: systemstorage
+    name: lvmlockd-even
+    namespace: default
     kind: NnfStorageProfile
 ---
 apiVersion: nnf.cray.hpe.com/v1alpha1
 kind: NnfSystemStorage
 metadata:
-  name: lvmlockd_odd
+  name: lvmlockd-odd
   namespace: systemstorage
 spec:
   type: "raw"
   computesTarget: "odd"
   makeClientMounts: false
   storageProfile:
-    name: lvmlockd_odd
-    namespace: systemstorage
+    name: lvmlockd-odd
+    namespace: default
     kind: NnfStorageProfile
 ```
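A minimal usage sketch for the lvmlockd system storage shown above: the file name is hypothetical (wherever the YAML resources are saved), and the `nnfsystemstorages` resource name assumes the CRD's default plural naming, so adjust both for the actual system.

```bash
# Apply the pinned NnfStorageProfiles and the NnfSystemStorage resources.
# The file name is a placeholder for wherever the YAML above is saved.
kubectl apply -f lvmlockd-system-storage.yaml

# List the system storage resources and inspect one of them; the plural
# "nnfsystemstorages" assumes the CRD's default naming convention.
kubectl get nnfsystemstorages --all-namespaces
kubectl get nnfsystemstorages -n systemstorage lvmlockd-even -o yaml
```

The `-o yaml` output includes the resource's status once the controller has acted on it, which can be used to confirm that the raw storage has been allocated on the Rabbits.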