Handle VM state changes during storage live migration #1394

Merged

Conversation

@awels (Contributor) commented on Nov 12, 2024

It is possible that during the storage live migration of a VM the state of that VM changes; this can happen when the VM is started or stopped. When a running VM is stopped, the live migration is cancelled and an offline migration is started instead.

The reverse is also true: if a VM is started during an offline migration, the offline migration is cancelled and a live migration is started instead.

Offline migrations use a single rsync server with potentially multiple clients. If we stop a VM while another offline migration is already running, that migration is allowed to complete before a new offline migration is started. Likewise, if a VM is started while it is part of an offline migration, the offline migration is allowed to complete before a live migration is started.

The following combinations of VM state and state changes should behave as described in this table:

| VM state when cutting over | VM state change | Expected behavior |
| --- | --- | --- |
| One VM, off | VM is started after both the rsync server and rsync client are created (pending or running) | The rsync server and client are stopped, and the live migration completes |
| One VM, on | VM is stopped after cutover has started and the second virt-launcher is running | Both virt-launcher pods are stopped, and only after they are gone are the rsync server and client created |
| One VM, on | VM is stopped after cutover has started and the second virt-launcher is not running | The virt-launcher pods are stopped, and only after they are gone are the rsync server and client created |
| Two VMs, both off | One VM is started after the rsync server and both clients are created (pending or running) | The client associated with the started VM is stopped immediately. The other client is allowed to complete, and only then does the live migration of the started VM begin. This prevents anything in the running rsync server from interfering with the live migration |
| Two VMs, both off | Both VMs are started after the rsync server and both clients are created (pending or running) | Both clients and the rsync server are stopped; once all the rsync pods have stopped, the live migrations start |
| Two VMs, both running | One VM is stopped after the live migrations have started | The running live migration completes, and an rsync server and client are created for the stopped VM and run to completion |
| Two VMs, both running | Both VMs are stopped after the live migrations have started | All virt-launcher pods are stopped, an rsync server and two clients are created, and the offline migration runs to completion (this scenario can create an rsync server and client and then, after they complete, create them again for the second VM's disk; this happens if the rsync server starts before the second VM is stopped) |
| Two VMs, one running, one stopped | The stopped VM is started after both the live migration and the offline migration have started | The rsync server and client are stopped, a new live migration is created, and both live migrations run to completion |
| Two VMs, one running, one stopped | The running VM is stopped after both the live migration and the offline migration have started | The virt-launcher pods are stopped, and the rsync server and client run to completion; after they complete, a new rsync server and client are created for the newly stopped VM and run to completion |
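To make the cutover behavior above easier to follow, here is a minimal, self-contained sketch of the per-VM decision. The names (`VMState`, `decideNextStep`, the boolean flags) are hypothetical illustrations for this PR description, not the actual mig-controller code.

```go
package main

import "fmt"

// VMState is a hypothetical illustration of the VM states the table covers.
type VMState int

const (
	VMStopped VMState = iota
	VMRunning
)

// decideNextStep sketches how a single VM's disk transfer continues when the
// VM's state changes mid-migration, mirroring the rows in the table above.
func decideNextStep(state VMState, rsyncPodsExist, liveMigrationRunning bool) string {
	switch {
	case state == VMRunning && rsyncPodsExist:
		// VM started during an offline migration: the rsync transfer for this
		// VM is stopped (or allowed to finish), then a live migration starts.
		return "stop or finish rsync for this VM, then start live migration"
	case state == VMRunning:
		return "start live migration"
	case state == VMStopped && liveMigrationRunning:
		// VM stopped during a live migration: cancel it, wait for the
		// virt-launcher pods to go away, then run an offline migration.
		return "cancel live migration, wait for virt-launcher pods, then offline migration"
	default:
		return "start offline migration (rsync server + client)"
	}
}

func main() {
	fmt.Println(decideNextStep(VMStopped, false, true))
}
```

In effect, the table rows are combinations of these branches evaluated per VM, with the added rule from the description that the shared rsync server is only torn down once all of its clients are gone.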

Signed-off-by: Alexander Wels <[email protected]>
@@ -376,7 +377,7 @@ func (r *ReconcileMigPlan) getClaims(client compat.Client, plan *migapi.MigPlan)
 	}

 	alreadyMigrated := func(pvc core.PersistentVolumeClaim) bool {
-		if planuid, exists := pvc.Labels[migapi.MigMigrationLabel]; exists {
+		if planuid, exists := pvc.Labels[migapi.MigPlanLabel]; exists {

Contributor:
Why the switch to the migplan label?

Contributor Author (@awels):
Because the MigMigrationLabel didn't make sense. Once you migrate, you cannot migrate again unless you roll back, at which point the migration label is overwritten with the new MigMigration. To me it made much more sense to mark the PVCs with the MigPlan, since that will be constant regardless of how many times you migrate and roll back.
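For context on this exchange, here is a small, self-contained sketch of what the changed check amounts to: a PVC counts as already migrated when it carries the MigPlan label. Only the `pvc.Labels[...MigPlanLabel]` lookup comes from the diff above; the label key string and the comparison against the plan UID are assumptions for illustration.

```go
package main

import (
	"fmt"

	core "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// MigPlanLabel stands in for migapi.MigPlanLabel from the diff above; the
// actual key string in mig-controller may differ (assumption).
const MigPlanLabel = "migration.openshift.io/migrated-by-migplan"

// alreadyMigrated reports whether a PVC was already handled under the given
// MigPlan UID. Comparing the label value to the plan UID is an assumption;
// the hunk above only shows the label lookup itself.
func alreadyMigrated(pvc core.PersistentVolumeClaim, planUID string) bool {
	if planuid, exists := pvc.Labels[MigPlanLabel]; exists {
		return planuid == planUID
	}
	return false
}

func main() {
	pvc := core.PersistentVolumeClaim{
		ObjectMeta: meta.ObjectMeta{
			Labels: map[string]string{MigPlanLabel: "plan-uid-1234"},
		},
	}
	fmt.Println(alreadyMigrated(pvc, "plan-uid-1234")) // true
}
```

Because a MigPlan's UID stays the same across repeated migrations and rollbacks, the label remains a stable marker on the PVC, which is the rationale given above for moving away from MigMigrationLabel.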

@rayfordj merged commit c47fede into migtools:master on Nov 18, 2024
1 check passed