Handle VM state changes during storage live migration #1394

Merged

Conversation

@awels (Contributor) commented on Nov 12, 2024

It is possible that during the storage live migration of a VM the state of that VM changes; this can happen when the VM is started or stopped. When a running VM is stopped, the live migration is cancelled and an offline migration is started instead.

The reverse is also true: if a VM is started during an offline migration, the offline migration is cancelled and a live migration is started instead.

Offline migrations use a single rsync server with potentially multiple clients. If we stop a VM while another offline migration is already running, that migration is allowed to complete before a new offline migration is started. Likewise, if a VM is started while it is part of an offline migration, the offline migration is allowed to complete before a live migration is started.

The following combinations of VM state and state changes should behave as described in this table:

| VM state when cutting over | VM state change | Expected behavior |
| --- | --- | --- |
| One VM, off | VM is started after both the rsync server and rsync client are created (pending or running) | The rsync server and client are stopped, and the live migration completes |
| One VM, on | VM is stopped after cutover has started and the second virt-launcher is running | Both virt-launcher pods are stopped, and only after they are gone are the rsync server and client created |
| One VM, on | VM is stopped after cutover has started and the second virt-launcher is not running | The virt-launcher pods are stopped, and only after they are gone are the rsync server and client created |
| Two VMs, both off | One VM is started after the rsync server and both clients are created (pending or running) | The client associated with the started VM is stopped immediately. The other client is allowed to complete, and only then does the live migration of the started VM begin. This prevents anything in the running rsync server from interfering with the live migration |
| Two VMs, both off | Both VMs are started after the rsync server and both clients are created (pending or running) | Both clients and the rsync server are stopped; once all the rsync pods have stopped, the live migrations start |
| Two VMs, both running | One VM is stopped after the live migrations have started | The running live migration completes, and an rsync server and client are created for the stopped VM and run to completion |
| Two VMs, both running | Both VMs are stopped after the live migrations have started | All virt-launcher pods are stopped, an rsync server and two clients are created, and the offline migration runs to completion (this scenario can create an rsync server and client and then, after they complete, create them again for the second VM's disk; this happens if the rsync server starts before the second VM is stopped) |
| Two VMs, one running, one stopped | The stopped VM is started after both the live migration and the offline migration have started | The rsync server and client are stopped, a new live migration is created, and both live migrations run to completion |
| Two VMs, one running, one stopped | The running VM is stopped after both the live migration and the offline migration have started | The virt-launcher pods are stopped, and the rsync server and client run to completion; after they complete, a new rsync server and client are created for the newly stopped VM and run to completion |
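To make the cutover behavior above easier to follow, here is a minimal, self-contained sketch of the per-VM decision. The names (`VMState`, `decideNextStep`, the boolean flags) are hypothetical illustrations for this PR description, not the actual mig-controller code.

```go
package main

import "fmt"

// VMState is a hypothetical illustration of the VM states the table covers.
type VMState int

const (
	VMStopped VMState = iota
	VMRunning
)

// decideNextStep sketches how a single VM's disk transfer continues when the
// VM's state changes mid-migration, mirroring the rows in the table above.
func decideNextStep(state VMState, rsyncPodsExist, liveMigrationRunning bool) string {
	switch {
	case state == VMRunning && rsyncPodsExist:
		// VM started during an offline migration: the rsync transfer for this
		// VM is stopped (or allowed to finish), then a live migration starts.
		return "stop or finish rsync for this VM, then start live migration"
	case state == VMRunning:
		return "start live migration"
	case state == VMStopped && liveMigrationRunning:
		// VM stopped during a live migration: cancel it, wait for the
		// virt-launcher pods to go away, then run an offline migration.
		return "cancel live migration, wait for virt-launcher pods, then offline migration"
	default:
		return "start offline migration (rsync server + client)"
	}
}

func main() {
	fmt.Println(decideNextStep(VMStopped, false, true))
}
```

In effect, the table rows are combinations of these branches evaluated per VM, with the added rule from the description that the shared rsync server is only torn down once all of its clients are gone.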

Signed-off-by: Alexander Wels <[email protected]>
@@ -376,7 +377,7 @@ func (r *ReconcileMigPlan) getClaims(client compat.Client, plan *migapi.MigPlan)
 	}

 	alreadyMigrated := func(pvc core.PersistentVolumeClaim) bool {
-		if planuid, exists := pvc.Labels[migapi.MigMigrationLabel]; exists {
+		if planuid, exists := pvc.Labels[migapi.MigPlanLabel]; exists {

Contributor:
Why the switch to the migplan label?

Contributor Author (@awels):
Because the MigMigrationLabel didn't make sense. Once you migrate, you cannot migrate again unless you roll back, at which point the migration label is overwritten with the new MigMigration. To me it made much more sense to mark the PVCs with the MigPlan, since that will be constant regardless of how many times you migrate and roll back.
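For context on this exchange, here is a small, self-contained sketch of what the changed check amounts to: a PVC counts as already migrated when it carries the MigPlan label. Only the `pvc.Labels[...MigPlanLabel]` lookup comes from the diff above; the label key string and the comparison against the plan UID are assumptions for illustration.

```go
package main

import (
	"fmt"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MigPlanLabel stands in for migapi.MigPlanLabel from the diff above; the
// actual key string in mig-controller may differ (assumption).
const MigPlanLabel = "migration.openshift.io/migrated-by-migplan"

// alreadyMigrated reports whether a PVC was already handled under the given
// MigPlan UID. Comparing the label value to the plan UID is an assumption;
// the hunk above only shows the label lookup itself.
func alreadyMigrated(pvc core.PersistentVolumeClaim, planUID string) bool {
	if planuid, exists := pvc.Labels[MigPlanLabel]; exists {
		return planuid == planUID
	}
	return false
}

func main() {
	pvc := core.PersistentVolumeClaim{
		ObjectMeta: meta.ObjectMeta{
			Labels: map[string]string{MigPlanLabel: "plan-uid-1234"},
		},
	}
	fmt.Println(alreadyMigrated(pvc, "plan-uid-1234")) // true
}
```

Because a MigPlan's UID stays the same across repeated migrations and rollbacks, the label remains a stable marker on the PVC, which is the rationale given above for moving away from MigMigrationLabel.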

@rayfordj merged commit c47fede into migtools:master on Nov 18, 2024
1 check passed