tests: Check that volume is functional after live migration. #256
a9e5ef9
to
6ef9d52
Compare
Thanks!
Hmmm, well, it works locally but not in CI... what a surprise! 🤦 I'll have to investigate more tomorrow with tmate.
tests/vm-migration
Outdated
sleep 60

# Wait for a long time for it to boot (doubly nested VM takes a while).
while [ "$(lxc exec member1 -- sh -c "lxc info v1 | grep -F 'Processes:' | cut -d':' -f2 | tr -d '[:blank:]'")" -le 1 ]; do
I think the wrapper shell is not needed.
Suggested change:
- while [ "$(lxc exec member1 -- sh -c "lxc info v1 | grep -F 'Processes:' | cut -d':' -f2 | tr -d '[:blank:]'")" -le 1 ]; do
+ while [ "$(lxc exec member1 -- lxc info v1 | awk '{if ($1 == "Processes:") print $2}')" -le 1 ]; do
Also changed to use awk like we do elsewhere to get the process count.
With awk I think I need the subshell:
$ [ lxc exec member1 -- lxc info v1 | awk '{if ($1 == "Processes:") print $2}' -le 1 ]
awk: fatal: cannot open file `-le' for reading: No such file or directory
Sorry for not being clear. By "subshell", I meant the sh -c bit used while doing the lxc exec.
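To illustrate the distinction discussed above, here is a minimal sketch: the `$(...)` command substitution is still needed so that `[` receives the process count as a single word, while the `sh -c` wrapper is a separate matter. The `lxc_info_stub` function and its sample output are assumptions standing in for `lxc exec member1 -- lxc info v1`, so the snippet runs without LXD.

```shell
#!/bin/sh
# Hypothetical stand-in for `lxc exec member1 -- lxc info v1`; the sample
# output is an assumption, trimmed to the fields relevant here.
lxc_info_stub() {
    printf 'Name: v1\nStatus: RUNNING\nResources:\n  Processes: 17\n'
}

# Command substitution captures the pipeline's output as one word for `[`.
# Without it, `-le 1 ]` would be passed to awk as file arguments.
procs="$(lxc_info_stub | awk '{if ($1 == "Processes:") print $2}')"

if [ "$procs" -le 1 ]; then
    echo "still booting"
else
    echo "booted with $procs processes"
fi
```

With the stub above, the count is read from the `Processes:` line and compared numerically.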
tests/vm-migration
Outdated
lxc exec member1 -- lxc move v1 --target member2

# The VM is slow. So the agent isn't immediately available after the live migration.
while [ "$(lxc exec member2 -- sh -c "lxc info v1 | grep -F 'Processes:' | cut -d':' -f2 | tr -d '[:blank:]'")" -le 1 ]; do
Same note about subshell.
tests/vm-migration
Outdated
done

# The volume should be functional, still mounted, and the file we created should still be there with the same contents.
[ "$(lxc exec member2 -- sh -c "lxc exec v1 -- cat /mnt/vol1/bar")" = "foo" ]
This should work, I think:
Suggested change:
- [ "$(lxc exec member2 -- sh -c "lxc exec v1 -- cat /mnt/vol1/bar")" = "foo" ]
+ [ "$(lxc exec member2 -- lxc exec v1 -- cat /mnt/vol1/bar)" = "v1" ]
tests/vm-migration
Outdated
lxc exec member1 -- sh -c "lxc exec v1 -- mkfs -t ext4 /dev/sdb"
lxc exec member1 -- sh -c "lxc exec v1 -- mkdir /mnt/vol1"
lxc exec member1 -- sh -c "lxc exec v1 -- mount -t auto /dev/sdb /mnt/vol1"
lxc exec member1 -- sh -c "lxc exec v1 -- sh -c 'echo foo > /mnt/vol1/bar'"
Could we do without the subshells and also make it explicit that we expect an ext4-formatted disk rather than accept anything?
Suggested change:
- lxc exec member1 -- sh -c "lxc exec v1 -- mkfs -t ext4 /dev/sdb"
- lxc exec member1 -- sh -c "lxc exec v1 -- mkdir /mnt/vol1"
- lxc exec member1 -- sh -c "lxc exec v1 -- mount -t auto /dev/sdb /mnt/vol1"
- lxc exec member1 -- sh -c "lxc exec v1 -- sh -c 'echo foo > /mnt/vol1/bar'"
+ lxc exec member1 -- lxc exec v1 -- mkfs -t ext4 /dev/sdb
+ lxc exec member1 -- lxc exec v1 -- mkdir /mnt/vol1
+ lxc exec member1 -- lxc exec v1 -- mount -t ext4 /dev/sdb /mnt/vol1
+ lxc exec member1 -- lxc exec v1 -- cp /etc/hostname /mnt/vol1/bar
6ef9d52
to
b040de8
Compare
ready for rebase
11eaed0
to
462a0c7
Compare
tests/vm-migration
Outdated
lxc launch "${TEST_IMG:-ubuntu-minimal-daily:24.04}" member1 --vm -c security.devlxd.images=true
lxc launch "${TEST_IMG:-ubuntu-minimal-daily:24.04}" member2 --vm -c security.devlxd.images=true
else
lxc launch "${TEST_IMG:-ubuntu-minimal-daily:24.04}" member1 --vm
lxc launch "${TEST_IMG:-ubuntu-minimal-daily:24.04}" member2 --vm
The GHA runner VMs have 16G of RAM (https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories), so you can probably allocate more RAM to those memberX VMs.
Have you already tried with -c limits.cpu=2 -c limits.memory=4GiB?
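As a rough sketch of what that suggestion could look like when applied to both members: `launch_member` is a hypothetical helper (not part of the test script) that only prints the command, so the snippet runs without LXD installed. The image default and the two limits are taken from the lines above.

```shell
#!/bin/sh
# Hypothetical helper: prints the launch command instead of running it,
# so the sketch works without LXD. Drop the `echo` to actually launch.
launch_member() {
    echo lxc launch "${TEST_IMG:-ubuntu-minimal-daily:24.04}" "$1" --vm \
        -c limits.cpu=2 -c limits.memory=4GiB
}

for m in member1 member2; do
    launch_member "$m"
done
```

Whether these values are enough for a doubly nested VM on the runner is not established by this sketch; they are only the values proposed in the comment.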
I've bumped all the resources to what is used in microcloud and it's still not booting after over an hour. I'll go back to trying with containers and see how I get on.
The while loop checking for v1's state seems to have significant overhead: it is not being executed every 30s as it should, and over time it drifted by many seconds. One possible way to reduce that overhead would be to reduce the polling frequency and run the loop inside member1 rather than from the GHA runner itself.
In other words, replace the while :; do lxc exec member1 -- lxc info v1 ...; sleep 30; done by lxc exec member1 -- sh -c 'while :; do lxc info v1 ...; sleep 60; done'.
Another thing that could possibly help would be to delay starting member2 until after v1 is confirmed booted on member1.
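The shape of that change can be sketched as follows, under loud assumptions: `run_in_member` is a hypothetical stand-in for `lxc exec member1 --` that simply runs the command locally, and a counter file replaces the real `lxc info v1` check. The point is only that a single invocation carries the whole polling loop, rather than one invocation per poll.

```shell
#!/bin/sh
# Hypothetical stand-in for `lxc exec member1 --`; here it runs locally.
run_in_member() { "$@"; }

# Counter file standing in for the instance state (assumption for the demo).
state_file="$(mktemp)"
echo 0 > "$state_file"

# One exec session runs the entire loop; the file path is passed as $1.
run_in_member sh -c '
    f="$1"
    while :; do
        n=$(( $(cat "$f") + 1 ))
        echo "$n" > "$f"
        # Pretend the instance is booted on the third poll.
        [ "$n" -ge 3 ] && break
        sleep 0   # the real loop would sleep 60 here
    done
' sh "$state_file"

polls="$(cat "$state_file")"
rm -f "$state_file"
echo "polled $polls times in a single session"
```

Passing the path as a positional argument keeps the loop body in plain single quotes, which avoids nested-quote escaping.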
yes this would avoid needing to keep setting up the websockets over vsock for the exec sessions
30564cc
to
e7f3e43
Compare
tests/vm-migration
Outdated
sleep 60

# Wait for a long time for it to boot (doubly nested VM takes a while).
lxc exec member1 -- sh -c 'while [ "$(lxc info v1 | awk '"'"'{if ($1 == "Processes:") print $2}'"'"')" -le 1 ]; do echo "Instance v1 still not booted, waiting 60s..." && sleep 60; done'
Check warning
Code scanning / shellcheck
SC2016 Warning test
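For readers unfamiliar with the `'"'"'` sequence in the flagged line: it closes the single-quoted string, adds one single quote inside double quotes, and reopens the single-quoted string. SC2016 is presumably raised because `$(...)` and `$1` sit inside single quotes, which is intentional when the expansion should happen in the inner shell. A minimal sketch, with `printf` standing in for `lxc info v1` (an assumption so it runs without LXD):

```shell
#!/bin/sh
# Build a script string that itself contains single quotes around the awk
# program. Each '"'"' contributes one literal single quote to the string.
script='printf "%s\n" "Processes: 5" | awk '"'"'{if ($1 == "Processes:") print $2}'"'"

# Show the string the inner shell will receive.
echo "$script"

# The inner shell, not the outer one, expands $1 and $2 inside awk.
out="$(sh -c "$script")"
echo "$out"
```

If the single quotes are deliberate, a `# shellcheck disable=SC2016` comment on the preceding line is one way to silence the warning.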
tests/vm-migration
Outdated
lxc exec member1 -- lxc move v1 --target member2

# The VM is slow. So the agent isn't immediately available after the live migration.
lxc exec member1 -- sh -c 'while [ "$(lxc info v1 | awk '"'"'{if ($1 == "Processes:") print $2}'"'"')" -le 1 ]; do echo "Instance v1 still not booted, waiting 60s..." && sleep 60; done'
Check warning
Code scanning / shellcheck
SC2016 Warning test
3139e54
to
28ad7f1
Compare
13191d2
to
5e7488f
Compare
@markylaing it seems the [...] Also, on microcloud, we use the extra/ephemeral disk available on GHA VMs, see https://github.com/canonical/microcloud/blob/main/.github/workflows/tests.yml#L202-L209
60e2d21
to
a4b95cf
Compare
02f31e0
to
98a9a13
Compare
054e18d
to
e416b1d
Compare
e416b1d
to
93e3402
Compare
@markylaing @simondeziel any progress on using container cluster members for Ceph-backed VMs?
@tomponline it's in my queue for next pulse, but no progress as of yet.
Changes are as follows:
lxc info
.