
Improve NFS home directory restore documentation #5291

Open · wants to merge 10 commits into base: main
Conversation

@sunu (Contributor) commented Dec 17, 2024

Also adds the option to mount an EBS volume to a pod through the deployer exec root-homes command.

Follow-up to #5286. Related to #5061.

@yuvipanda @sgibson91 - This is still a draft and I haven't really tested it yet, but does the overall approach look ok?


@@ -25,18 +25,24 @@ To restore a home directory from a snapshot, we need to create a new EBS volume
Please follow AWS's guidance for [restoring EBS volumes from a snapshot](https://docs.aws.amazon.com/prescriptive-guidance/latest/backup-recovery/restore.html#restore-files) to create a new EBS volume from the snapshot.
```

Once we have created a new EBS volume from the snapshot, we need to mount it to the EC2 instance attached to the existing EBS volume containing the NFS home directories. To do this, we need to find the instance ID of the EC2 instance attached to the existing EBS volume. This involves the following steps:
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec` command to mount the new EBS volume to a pod along with the existing NFS home directories volume.
Member
Suggested change
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec` command to mount the new EBS volume to a pod along with the existing NFS home directories volume.
Once we have created a new EBS volume from the snapshot, we can use the `deployer exec root-homes` command to mount the new EBS volume to a pod along with the existing NFS home directories volume.

@sgibson91 sgibson91 added the deployer:skip-deploy Skips deployment of anything (support, staging, prod) label Dec 18, 2024
@sunu sunu marked this pull request as ready for review December 19, 2024 06:24
@sunu sunu requested review from sgibson91 and yuvipanda December 19, 2024 06:25
@sunu (Contributor Author) commented Dec 19, 2024

@sgibson91 I did another test run to verify the "wait for the pod to be ready" command that we've added. Everything worked as expected. I can merge this once you give your approval.

@sgibson91 (Member): I've pinged Yuvi on this too, because I want to make sure the concerns that caused him to reopen #5061 are addressed.

@yuvipanda (Member) left a comment:

This is great! Some minor suggested changes, but otherwise this looks pretty close.

While not necessarily done as part of this PR, I do think it's important for us to fully end to end test the restore process at least once before calling it done (see pt 7 of https://mastodon.social/@nixCraft/113671487670162948). So let's make sure that at least one person tries all these steps top to bottom once, and verifies the completeness of the whole process.

Exciting to have this be so close to done!

@@ -37,6 +37,15 @@ def root_homes(
extra_nfs_mount_path: str = typer.Option(
None, help="Mount point for the extra NFS share"
),
restore_volume_id: str = typer.Option(
Member: Let's call this additional_ebs_volume_id or something like that, as the code itself doesn't do anything related to restoring - it just mounts additional EBS volumes.

Contributor Author (sunu): Should we use additional_volume_id to keep it platform-agnostic? I can add in the help text that only EBS volumes are supported for now.

Member: That sounds reasonable to me for now. Of course, we can always update things as we begin work on other platforms.

Member: Ya, I leave it to you both for now!

help="ID of volume to mount (e.g. vol-1234567890abcdef0) "
"to restore data from",
),
restore_mount_path: str = typer.Option(
Member: See the comment on restore_volume_id about using 'restore' in the name. I'd say this could be additional_ebs_volume_mount_path?
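For illustration, here is a minimal sketch of how the two renamed options might read under the naming discussed above; the exact names, default mount path, and help text are assumptions for this sketch, not the final implementation:

```python
import typer

app = typer.Typer()


@app.command()
def root_homes(
    # the deployer's existing cluster/hub options are elided in this sketch
    additional_volume_id: str = typer.Option(
        None,
        help="ID of an additional volume to mount (e.g. vol-1234567890abcdef0). "
        "Only EBS volumes are supported for now.",
    ),
    additional_volume_mount_path: str = typer.Option(
        "/additional-volume",
        help="Mount point for the additional volume",
    ),
):
    ...
```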

"kind": "PersistentVolume",
"metadata": {"name": pv_name},
"spec": {
"capacity": {"storage": restore_volume_size},
Member: Is this necessary for us to specify? I worry that if we get it wrong, it will trigger the Kubernetes volume-resizing feature.

@sunu (Contributor Author) commented Dec 20, 2024:
I suspect storage capacity is a required field; I'll try to confirm whether it's required or not.

Reading this warning note, it looks like volume expansion is only triggered when the storage capacity is updated on an existing PVC that is already bound to a PV, creating a mismatch between the capacity specified on the PVC and that of the bound PV.

I will confirm what happens when the size specified on the PV and the PVC is larger than the actual size of the disk and ALLOWVOLUMEEXPANSION is enabled on the StorageClass.

Contributor Author (sunu): I did some testing, and storage capacity is indeed a required field.

I also tried passing a size larger than the actual disk size to see if that triggers a resize. The PV and PVC were labelled with the wrong capacity, but the underlying disk was not resized.

@yuvipanda How would you like to move forward?

Member: @sunu ah, if it doesn't trigger a resize, I would say:

  1. Hard-code an intentionally wrong number here (df and the other commands we use don't rely on it; they use lower-level Linux filesystem calls instead).
  2. Write an inline comment explaining what is going on and that it doesn't trigger a resize.

This would let us avoid requiring the engineer doing the restore to know the actual filesystem size (see the sketch below).
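For illustration, a sketch of what that could look like for the PV (the PVC's resources.requests would carry the same placeholder); the placeholder value, the name, and the comment wording are assumptions, not the actual deployer code:

```python
pv_name = "restore-pv"  # hypothetical name, used only for this sketch

pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": pv_name},
    "spec": {
        # Intentionally hard-coded placeholder: df and the other commands used
        # during a restore read sizes from the filesystem itself, and the
        # testing described above showed that a mismatch here does not resize
        # the underlying EBS volume, so the engineer doing the restore doesn't
        # need to know the volume's real size.
        "capacity": {"storage": "16384Gi"},
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "",
        # the volume source (EBS CSI driver configuration) is elided in this sketch
    },
}
```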

"accessModes": ["ReadWriteOnce"],
"volumeName": pv_name,
"storageClassName": "",
"resources": {"requests": {"storage": restore_volume_size}},
Member: See the comment above about specifying sizes.

restore_mount_path: str = typer.Option(
"/restore-volume", help="Mount point for the volume"
),
restore_volume_size: str = typer.Option("100Gi", help="Size of the volume"),
Member: See the comment later about specifying the size.

@@ -140,6 +200,17 @@ def root_homes(
tmpf.flush()

with cluster.auth():
# If we have a volume to restore from, create the PV and PVC first
if restore_volume_id:
with tempfile.NamedTemporaryFile(mode="w", suffix=".json") as pv_tmpf:
Member: This is being created outside the try whose finally block contains the cleanup code, so errors here could leave the PV not cleaned up. I think this should all be moved inside the same try whose finally has the cleanup code.

Contributor Author (sunu):
Ah, good catch, thank you!
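For illustration, a rough sketch of the restructuring being suggested, with resource creation inside the same try whose finally performs the cleanup; the function, manifest handling, and kubectl calls are assumptions chosen to show the pattern, not the actual deployer code:

```python
import subprocess


def apply_and_cleanup(manifest_paths):
    """Apply manifests (PV, PVC, pod, ...) and always delete whatever was created."""
    created = []  # track successful creations so cleanup only targets those
    try:
        for path in manifest_paths:
            subprocess.check_call(["kubectl", "apply", "-f", path])
            created.append(path)
        # ...wait for the pod, exec into it, run the restore, etc.
    finally:
        # Runs even if creation or the restore fails partway through.
        for path in reversed(created):
            subprocess.check_call(["kubectl", "delete", "-f", path])
```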

@sgibson91 (Member) commented Dec 20, 2024:

> While not necessarily done as part of this PR, I do think it's important for us to fully end to end test the restore process at least once before calling it done (see pt 7 of https://mastodon.social/@nixCraft/113671487670162948). So let's make sure that at least one person tries all these steps top to bottom once, and verifies the completeness of the whole process.

So I actually did do this in a meeting with Tarashish the other day. I created a volume from a snapshot and mounted it to a pod using the deployer exec root-homes command. We verified that both volumes were there and of similar disk usage (they were identical, I think because the snapshot was less than a day old), and that the username subdirs were the same. The only thing I didn't do was the rsync process itself.

We definitely need to spread all of this knowledge out beyond just me and Tarashish though.

@yuvipanda (Member):
That's excellent, @sgibson91!

I think once #5291 (comment) is addressed, this can be merged.
