Snapshot based disk migration #699

Open
s4heid opened this issue Sep 9, 2024 · 1 comment


s4heid commented Sep 9, 2024

Is your feature request related to a problem? Please describe.

Azure currently does not support changing the disk type directly from Premium SSD v2 or Ultra Disk to any other disk type, as detailed here. As a result, the native disk update feature cannot be used to switch storage types away from Premium SSD v2 or Ultra Disks.

Describe the solution you'd like

Although direct support is not available yet, the disk type can still be changed by going through a snapshot. When migrating away from Premium SSD v2 or Ultra Disks, the update_disk method should perform the following steps:

  1. unmount the disk
  2. create a snapshot and wait until its completionPercent reaches 100
  3. create a new disk from this snapshot
  4. mount the new disk
  5. delete the snapshot
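
For illustration, the five steps above could look roughly like this when expressed with the az CLI. This is only a sketch and not the CPI's Ruby implementation: the function name, its arguments, the snapshot naming scheme and the POLL_INTERVAL variable are all hypothetical, and error handling is reduced to aborting on the first failure.

```shell
# Sketch: migrate a Premium SSD v2 / Ultra disk to another SKU via a snapshot.
# Usage (hypothetical): migrate_disk_via_snapshot <rg> <vm> <old-disk> <new-disk> <sku>
migrate_disk_via_snapshot() {
  local rg="$1" vm="$2" old_disk="$3" new_disk="$4" sku="$5"
  local snap="${old_disk}-migration-snap"   # hypothetical naming scheme
  local pct

  # 1. Detach the old disk from the VM.
  az vm disk detach --resource-group "$rg" --vm-name "$vm" --name "$old_disk" || return 1

  # 2. Create an incremental snapshot and wait for the background copy to finish.
  az snapshot create --resource-group "$rg" --name "$snap" \
    --source "$old_disk" --incremental true || return 1
  while :; do
    pct=$(az snapshot show -n "$snap" -g "$rg" --query '[completionPercent]' -o tsv) || return 1
    [ "${pct%%.*}" = "100" ] && break
    sleep "${POLL_INTERVAL:-30}"
  done

  # 3. Create the new disk with the target SKU from the snapshot.
  az disk create --resource-group "$rg" --name "$new_disk" \
    --sku "$sku" --source "$snap" || return 1

  # 4. Attach the new disk to the VM.
  az vm disk attach --resource-group "$rg" --vm-name "$vm" --name "$new_disk" || return 1

  # 5. Delete the snapshot again.
  az snapshot delete --resource-group "$rg" --name "$snap"
}
```

The old disk is deliberately left in place here, so a failed migration can be rolled back by re-attaching it.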

Pending Task: evaluate and compare the efficiency of using regular copy methods versus snapshots.

Additional context
This request is a continuation of issue #697 and has been suggested by @MSSedusch.


s4heid commented Nov 18, 2024

It appears that the situation is somewhat more complex than initially thought. There are several caveats to note about snapshots and the Premium SSD v2 disk type:

  • Ultra and Premium SSD v2 disks only support incremental snapshots.

  • When you create an incremental snapshot of either a Premium SSD v2 or an Ultra Disk, the first snapshot acts as a full copy of the disk. However, after taking this initial snapshot, you cannot use it immediately. There is a background copy process that must complete before you can create a new disk from that snapshot. See reference.

    Attempting to create a new disk from the snapshot before the background process completes results in an error from the Azure API:

    $ az disk create --name myNewPremiumDisk --resource-group rg-disk-test --size-gb 1024 --sku Premium_LRS --source "/subscriptions/<subscription>/resourceGroups/rg-disk-test/providers/Microsoft.Compute/snapshots/mySnapshot"
    (Conflict) Source incremental snapshot mySnapshot copy is still in progress. Please retry after source snapshot's copy has completed.
    Code: Conflict
    Message: Source incremental snapshot mySnapshot copy is still in progress. Please retry after source snapshot's copy has completed.

  • Full snapshots take a long time to complete, especially for large disks, because they copy the entire contents of the disk. Incremental snapshots are much faster to create, because they only capture the changes made since the previous snapshot.
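
One way to cope with the Conflict error quoted above is to retry the disk creation until the background copy has finished. The following is a minimal sketch that assumes the error text contains "copy is still in progress", as in the output above. The function name and the MAX_ATTEMPTS and RETRY_DELAY variables are hypothetical; all arguments are passed through to az disk create unchanged.

```shell
# Sketch: retry `az disk create` while Azure reports that the snapshot's
# background copy is still running. Any other error aborts immediately.
create_disk_with_retry() {
  local attempt=1 output
  while [ "$attempt" -le "${MAX_ATTEMPTS:-20}" ]; do
    if output=$(az disk create "$@" 2>&1); then
      printf '%s\n' "$output"
      return 0
    fi
    case "$output" in
      *"copy is still in progress"*)
        # Retryable: the snapshot cannot be used as a source yet.
        sleep "${RETRY_DELAY:-60}" ;;
      *)
        # Not the expected Conflict: surface the error and stop.
        printf '%s\n' "$output" >&2
        return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  echo "giving up after ${MAX_ATTEMPTS:-20} attempts" >&2
  return 1
}
```

Matching on the message text is brittle. Polling the snapshot's completionPercent before calling az disk create avoids the failed requests altogether.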

To minimize downtime while converting Premium SSD v2 disks via incremental snapshots, a few strategies come to mind:

  1. We could shorten the downtime by creating the initial incremental snapshot (the full copy) before the update_disk process even starts, so that update_disk only has to take and apply a second incremental snapshot (which copies just the delta). The initial snapshot could be taken at disk creation, e.g. when enable_cpi_update_disk is active. The Azure CPI's create_disk method could be instructed to create the snapshot through a parameter that bosh passes via cloud_properties.

    This solution could significantly reduce the downtime, but it would require modifications to bosh. It could also turn into a larger change, since we would have to manage the snapshot lifecycle as well.

  2. Accept a longer downtime during disk updates, as the default copy mechanism used by bosh may not be any quicker (this needs an evaluation). Monitor the snapshot's status with a simple polling algorithm and proceed with disk creation once completionPercent reaches 100.0:

    $ az snapshot show -n sebastian-snap -g rg-disk-test --query '[completionPercent]' -o tsv
    100.0

    If creating and waiting for the snapshot is not faster than bosh's default copying mechanism, the question is whether this extra implementation is worth the effort.
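
The polling described in the second strategy could be sketched as follows. It is built around the same az snapshot show query as above. The function name, the timeout argument and the POLL_INTERVAL variable are hypothetical. The elapsed time it prints could also serve as a data point when comparing this approach against bosh's default copy mechanism.

```shell
# Sketch: poll a snapshot's completionPercent until it reaches 100
# or a timeout expires.
# Usage (hypothetical): wait_for_snapshot <snapshot> <resource-group> [timeout-seconds]
# Returns 0 when ready, 1 on timeout, 2 if the az call itself fails.
wait_for_snapshot() {
  local snap="$1" rg="$2" timeout="${3:-3600}"
  local start now pct
  start=$(date +%s)
  while :; do
    pct=$(az snapshot show -n "$snap" -g "$rg" --query '[completionPercent]' -o tsv) || return 2
    now=$(date +%s)
    if [ "${pct%%.*}" = "100" ]; then
      echo "snapshot ready after $((now - start))s"
      return 0
    fi
    if [ $((now - start)) -ge "$timeout" ]; then
      echo "timed out at ${pct:-unknown}%" >&2
      return 1
    fi
    sleep "${POLL_INTERVAL:-30}"
  done
}
```

For example, `wait_for_snapshot sebastian-snap rg-disk-test 7200 && az disk create ...` would only attempt the disk creation once the snapshot is usable.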
