neonvm: add readiness probe for sysfs scaling #1190
base: main
Conversation
No changes to the coverage.
Mostly LGTM, left a couple questions
Left some questions — high level, looks fine, just fussing over the details.
Call the runner's /ready endpoint which, in sysfs scaling mode, proxies to the daemon's /cpu to check whether the runner pod and the VM are ready. The runner's /ready endpoint does nothing in the case of the qmp scaling model. Move the neonvm-daemon line in the inittab to start it right before vmstart. Modify the logic in the migration controller to not wait for pod readiness, since neonvm-daemon doesn't start until the migration is finished. De facto, that doesn't change the behavior of migration at all, since before this PR we had no readiness probe. Signed-off-by: Mikhail Sakhnov <[email protected]>
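For illustration only, here is a minimal sketch of what such a proxying /ready handler could look like. The daemon address, port, and the scalingModeSysfs flag are assumptions for the sketch, not the PR's actual code:

```go
package main

import "net/http"

// Assumed configuration for this sketch; the real runner wires these differently.
var (
	scalingModeSysfs = true
	neonvmDaemonAddr = "127.0.0.1:25183" // hypothetical daemon address
)

// handleReady reports readiness. In sysfs scaling mode it proxies the check to
// the daemon's /cpu endpoint; in qmp scaling mode it is effectively a no-op.
func handleReady(w http.ResponseWriter, r *http.Request) {
	if !scalingModeSysfs {
		// qmp scaling mode: readiness is a no-op, always report ready.
		w.WriteHeader(http.StatusOK)
		return
	}
	// sysfs scaling mode: only ready once neonvm-daemon answers /cpu.
	resp, err := http.Get("http://" + neonvmDaemonAddr + "/cpu")
	if err != nil {
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		http.Error(w, "neonvm-daemon not ready", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/ready", handleReady)
	_ = http.ListenAndServe(":8080", nil) // hypothetical runner API port
}
```

The point of the design is that in qmp mode the probe always passes, while in sysfs mode the pod only turns Ready once the daemon inside the VM is reachable.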
Looks good. Some blocking questions, then it will be good to merge.
```go
	default:
		panic(fmt.Errorf("unknown pod phase: %q", pod.Status.Phase))
	}
}

// isRunnerPodReady returns whether the runner pod is ready respecting the readiness probe of its containers.
func isRunnerPodReady(pod *corev1.Pod) runnerStatusKind {
```
nit: functions titled isFoo should return bool
renamed to runnerContainerStatus(..)
```go
	if containerSpec.Name == "neonvm-runner" {
		for _, env := range containerSpec.Env {
			if env.Name == "RECEIVE_MIGRATION" && env.Value == "true" {
				return runnerRunning
```
question(blocking): Does this mean that we'll return runnerRunning even when the neonvm-runner container is not yet running? (or alternately: can this return ready even when the pod is still pending?)
Fair point. I introduced a new status and changed the vmmigration controller to use it instead of running.
After the discussion with you and @Omrigan, I changed the code here to check the pod references to determine whether it is a target pod for migration, which looks like the most reliable approach we can currently afford.
```go
for _, containerSpec := range pod.Spec.Containers {
	if containerSpec.Name == "neonvm-runner" {
		for _, env := range containerSpec.Env {
			if env.Name == "RECEIVE_MIGRATION" && env.Value == "true" {
				return runnerRunning
```
suggestion(non-blocking): Maybe extract the "is this a target pod" check into a separate function?
done
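For illustration, a hypothetical version of such an extracted check, keyed off owner references (as described above) rather than the RECEIVE_MIGRATION env var. The helper name and the Kind string are assumptions for this sketch, not the PR's actual code:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isMigrationTargetPod is a hypothetical extracted helper: it reports whether
// the runner pod is the target of a live migration by looking at its owner
// references instead of the RECEIVE_MIGRATION env var on the runner container.
// The Kind string below is an assumption for this sketch.
func isMigrationTargetPod(pod *corev1.Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "VirtualMachineMigration" {
			return true
		}
	}
	return false
}

func main() {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			OwnerReferences: []metav1.OwnerReference{
				{Kind: "VirtualMachineMigration", Name: "example-migration"},
			},
		},
	}
	fmt.Println(isMigrationTargetPod(pod)) // prints: true
}
```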
```
@@ -713,6 +713,7 @@ func (r *VirtualMachineMigrationReconciler) targetPodForVirtualMachine(
	pod.Name = migration.Status.TargetPodName

	// add env variable to turn on migration receiver
	// TODO: make it false or empty after the migration is done to enable correct readiness probe
```
question(non-blocking): what's the plan here? I don't think we can modify container environments at runtime... And, is this TODO tracked somewhere?
I think we discussed this verbally with @Omrigan and I promised to create an issue for it, since I am a bit afraid of a scenario like:
- we start a VM
- we migrate the VM to a new pod
- the new pod has RECEIVE_MIGRATION set to true, which means that in some hypothetical pod-restart scenario (which, to be honest, I am not even sure is possible in our lifecycle) qemu will not boot, but will wait for a migration that is not actually happening.
Since we already have an issue to audit the vmmigration controller, I will add this as context there.
…probe using MigrationOwnerForPod
Call the runner's /ready endpoint which, in sysfs scaling mode, proxies to the daemon's /cpu to check whether the runner pod and the VM are ready. The runner's /ready endpoint does nothing in the case of the qmp scaling model.
Move the neonvm-daemon line in the inittab so that it starts later, since we use it for the health checks.
This is a blocker for #1141 because working without a readiness probe introduces a race in some e2e tests: kuttl tries to execute the shell command kill while the VM is not fully booted and the acpid rules are not in effect yet. It closes #1147, which is in the backlog, but since it is a blocker I extracted it from PR #1141.
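For context, a rough sketch of how a readiness probe pointing at the runner's /ready endpoint could be attached to the runner pod spec by the controller. The port and timings here are illustrative assumptions, not the PR's values:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// runnerReadinessProbe builds a readiness probe that hits the runner's /ready
// endpoint. In sysfs scaling mode that endpoint proxies to neonvm-daemon's
// /cpu, so the pod only turns Ready once the VM has booted far enough for the
// daemon to respond. Port and timings are assumptions for this sketch.
func runnerReadinessProbe() *corev1.Probe {
	return &corev1.Probe{
		ProbeHandler: corev1.ProbeHandler{
			HTTPGet: &corev1.HTTPGetAction{
				Path: "/ready",
				Port: intstr.FromInt(8080), // hypothetical runner API port
			},
		},
		InitialDelaySeconds: 5,
		PeriodSeconds:       5,
	}
}

func main() {
	fmt.Printf("%+v\n", runnerReadinessProbe())
}
```

Such a probe is what lets the e2e tests wait for a genuinely booted VM instead of racing against the acpid rules.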