PVH VMs do not start after resume #6859
guest-sys-firewall.log
I filtered the complete journal a bit: sed -i -E '/(kernel:|dom0) (audit|CROND|sudo|runuser|\[drm:drm_(mode_object_(put|get)|ioctl))/d' journal-suspend-fail
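The filter can be tried on a few lines before rewriting the journal in place with -i. This is a minimal sketch that assumes GNU sed. The function names and sample lines are hypothetical, made up for illustration, and do not come from the attached journal.

```shell
#!/usr/bin/env bash
# Sketch: wrap the journal filter in a function and try it on a few made-up
# lines before running it in place with sed -i. Assumes GNU sed (-E).
set -euo pipefail

filter_journal() {
  # Same extended regex as the command above: delete lines where "kernel:" or
  # "dom0" is followed by audit, CROND, sudo, runuser, or a
  # [drm:drm_mode_object_put/get or [drm:drm_ioctl message.
  sed -E '/(kernel:|dom0) (audit|CROND|sudo|runuser|\[drm:drm_(mode_object_(put|get)|ioctl))/d'
}

# Hypothetical sample lines (illustration only, not from the real journal).
sample_journal() {
  printf '%s\n' \
    'dom0 kernel: audit: type=1101' \
    'dom0 sudo[1234]: user : TTY=pts/0' \
    'dom0 kernel: [drm:drm_ioctl [drm]] pid=1' \
    'dom0 kernel: PM: suspend entry (deep)'
}

sample_journal | filter_journal
```

Only the suspend-related line should survive. Once the output looks right, the same expression can be run with -i on the real file.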
I did a fresh run without the Xen kernel and booted Linux 5.15. It was flawless, even with s2idle sleep. No artifacts. But of course I can't test this case: booting 5.15 with Xen runs into some weird deadlock issue. I'm starting to bisect to see whether there's a sweet spot with no deadlock and working s2idle sleep/resume.
Linux 5.14 works with S3 sleep when running without Xen. With Xen there are massive artifacts and VMs don't start. Maybe this is due to a locking issue inside amdgpu, which somehow happens because of unstable TSC clocks in Xen.
I found that sweet spot :-) Upgrading the laptop BIOS to v0.1.34 and removing tsc=unstable from the Xen command line.
This commit seems to completely deadlock the amdgpu driver when running under Xen. It works like clockwork when booted without Xen.
git bisect log
If I remember right, the TSC problem is solved via BIOS updates on most systems. So that commit is the core problem? If I can contribute somehow, let me know. BTW... what artifacts do you mean? I have none with my P14s AMD G2 (Ryzen 5850U) on kernel 5.10.61, as 5.13.x won't boot.
Yeah, it should be fixed.
Yup, I reverted it and compiled, and 5.15 now boots. No s2idle though, so it seems we still have issues inside Xen, since it works when booting without Xen.
Do you get artifacts with kernel-latest? (after suspend/resume)
Graphical artifacts: if I type text, the text disintegrates into colored pixels. This works flawlessly without Xen.
Interesting. Maybe I should bisect between 5.10 and 5.11. If I recall correctly, I had these problems in 5.11, or whenever my first release with working S3 was. Does suspend/resume with S3 work for you?
s2idle is not yet possible with Xen, so there is no suspend on Intel 11th gen. This was explained in #6411.
After suspend (with 5.10.61) I have the issues described in #6066 (comment). With VMs running at suspend, I get a frozen unlock screen but a movable mouse pointer, then the screen turns black 2 or 3 times. Sometimes the login screen appears (same story). Finally the system hangs completely (frozen mouse or black screen). But no "text disintegrating into colored pixels". Maybe because all the graphics except the mouse are frozen? Btw, I always had artifacts (suspend not possible, as mentioned above) with an Intel 11th Gen i7 on every kernel >5.8.
I've forgotten everything about kernel compilation (kernels just worked for me until now ^^), so no, I couldn't try this solution yet, and without it I keep getting the issues mentioned above. Is there documentation somewhere (for dummies ^^) on how to get the Qubes kernel modified and compiled? Did you make any further changes besides reverting that part? Then I can tell whether there are artifacts or not ;-)
Thanks! I'll stop looking then. For everyone looking at this anyhow, s2idle now works on kernel 5.15 on AMD Ryzen 4750U at least, WITHOUT Xen :-) It did not work in 5.14.
Then sadly there is no change between 5.8 and 5.15. I still have artifacts after S3 suspend, and the laptop doesn't resume after a while. This specific issue is solved, though: VMs can be started afterwards.
No need, just do
I can show you, no problem :-) But the Qubes OS guide is quite OK. I will update my repos at http://github.com/isodude with the latest patches.
For Intel 11th, right.
... after a while? After how many turns, and with what behavior?
Great!
Sure, but without that patch it won't boot? ;-)
Many thanks! I'll give it a try and get back to you!
I don't know about pre-5.10, TBH. Yes, this is only after resume.
I will check more specifically. I just know that suspend/resume worked once, but not when it slept through the night.
The boot problem arises after 5.13; the patch referenced above is between 5.14 and 5.15.
No problem, it took me a while too :)
Current status:
4.14 (~just before the commit mentioned above): boots fine. I fixed an issue with xen-libs being the wrong version in chroot-dom0-fc32, which seems to have fixed my issues with suspending while VMs are running. Still artifacts when writing text.
I just realized that it's important to change the kernel version of the Qubes VMs. It probably works with a 5.12 kernel on the VMs and 5.14 on dom0, but not with 5.14 on both. I'm trying to bisect the xen-balloon issue now.
Weird, I was sure 5.12 didn't boot (same as 5.13), but I tried again now and it boots. Still no resume, but no artifacts either, at least in the short period before it hangs or the display turns black ;-) Sounds promising!
I think the artifacts are due to some steal-time issue. Whenever there's load on the system (e.g. starting a VM) and I keep hitting one letter repeatedly, artifacts appear. I'm pretty satisfied right now: I got a 5.14 kernel going with a good patch that actually suspends with VMs running. I will still bisect up to 5.15 and see which commit breaks xen-balloon. It seems more stable the higher I go, so... promising :-)
kernel-latest (5.12.14-1) with @marmarek's patches for Xen and the latest BIOS updates makes it possible to suspend/resume.
To build vmm-xen, add builder.conf to the qubes-builder directory and run
Edit qubes-builder/qubes-src/vmm-xen/xen.spec.in, add the patches below as Patch1101 and Patch1102, then run the build. When done, enter dom0 and copy the files from the VM (e.g. qvm-run --pass-io 'tar -c - -C qubes-builder/qubes-src/vmm-xen/pkgs . ' | tar -x -C ~/compiled).
The packages to install should be python3-xen, xen, xen-libs, xen-hypervisor, xen-runtime and xen-licenses. This way you don't need to update all the packages around Xen. I guess marmarek will update the Xen packages with the patches soon.
Attachments: builder.conf, patch 2
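Picking out only the six packages listed above from ~/compiled can be scripted. This is a minimal sketch. It assumes the copied RPMs follow the standard NAME-VERSION-RELEASE.ARCH.rpm naming and makes no assumption about the directory layout below ~/compiled. The function names are hypothetical, and the script only prints the matching files. It does not install anything.

```shell
#!/usr/bin/env bash
# Sketch: list only the Xen packages named above from the copied build output,
# leaving out other vmm-xen packages (devel, debuginfo, source RPMs).
# Assumption: files follow the standard NAME-VERSION-RELEASE.ARCH.rpm naming.
set -euo pipefail

wanted="python3-xen xen xen-libs xen-hypervisor xen-runtime xen-licenses"

# Recover NAME from NAME-VERSION-RELEASE.ARCH.rpm. RPM forbids dashes in
# VERSION and RELEASE, so stripping two dash-separated fields is enough.
rpm_name() {
  local f=${1##*/}          # basename
  f=${f%.*.rpm}             # drop ".ARCH.rpm", e.g. ".x86_64.rpm"
  f=${f%-*}                 # drop RELEASE
  printf '%s\n' "${f%-*}"   # drop VERSION
}

# Print every binary RPM under directory $1 whose NAME is in $wanted.
select_xen_rpms() {
  local dir=$1 rpm name
  while IFS= read -r -d '' rpm; do
    name=$(rpm_name "$rpm")
    case " $wanted " in
      *" $name "*) printf '%s\n' "$rpm" ;;
    esac
  done < <(find "$dir" -type f -name '*.rpm' ! -name '*.src.rpm' -print0 | sort -z)
}
```

In dom0, select_xen_rpms ~/compiled prints the candidate files. You could then pass that list to dnf yourself. Matching on the exact package NAME, rather than a glob like xen-*, is what keeps xen-devel and similar packages out.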
Hi,
So 5.12.14 survives at least one hour in suspend, but not several hours. At resume it is unresponsive with a black screen, although the power LED indicates that it should be up and running.
@johnnyboy-3 you are referencing this patch, right? Make sure that:
Interesting thread: https://gitlab.freedesktop.org/drm/amd/-/issues/892
@johnnyboy-3 also make sure that you are running the correct kernel version on your VMs.
Doesn't work for me ;-( I built and reinstalled vmm-xen in dom0 (no issues as far as I could tell) and tried kernels 5.12 and 5.13 for the VMs, with no difference. The dom0 kernel is the stock 5.12.14 with no patch/change; that's how I understood you. But 5.10 doesn't change anything, and I haven't tried patching 5.13 yet. The BIOS is up to date (and I assume TSC is fixed). The Xen command line (the one in GRUB, right?) has smt=off (smt=on doesn't work either) and no clocksource or tsc set. Any suggestions for what else I could try? Weird, I thought nothing CAN change/happen while suspended, but obviously something changes after one hour?!
I have also been running a pretty new linux-firmware; maybe that makes a difference as well? Basically I
What version of linux-firmware are you running? @bigdx What is the actual error you get now?
The "default" one, based on the beta-kernel-latest ISO from 9/11/21 plus all updates until then. I can't check the version or try anything before Monday.
Still the same: frozen unlock screen with a movable mouse pointer, etc.
Great! Then it's not related to that somehow.
And nothing in the logs? And to be exact, do you see the dialog to enter the password? Exactly which packages from vmm-xen did you update in dom0? What BIOS version are you running? There should be a note in the changelog that TSC is fixed, if 5000-series CPUs even have these problems to begin with. I tried booting without Xen to make sure the kernel itself handles suspend/resume correctly. That's possible if you delete the multiboot line and replace multiboot2 with linux and module --nounzip with initrd.
To be honest, I haven't checked. I didn't have much time and hoped it would work out of the box ;-) Which logs would be of interest to you?
All of them. I followed your example and installed everything that was built.
The changelog doesn't mention that, but the initial BIOS was released after TSC had been fixed, so I assume that fix is included. Otherwise I would have the corresponding issues, as I haven't set any of the Xen options that solved them.
I'll try that on Monday :-)
Hi again
Yes. I checked all three except the BIOS TSC fix. I'm running a desktop PC with a Ryzen 2400G, if that matters.
I also turned off auto-boot of all AppVMs and didn't run any while testing.
I tried a first run with
I tried that too, and it works without problems. The following is a journalctl snippet from the second run (I think; the time got a bit mixed up due to multiple systems):
(Sorry, it's still quite long. Can I use spoilers or something?)
This is not related to this issue anymore. This issue was fixed with the updated Xen 4.12 patches, which are now included in Xen 4.13. I will open a new issue concerning resume crashing on Ryzen.
Let's see how things fare when marmarek pushes the new Xen 4.13 branch. I will check whether that resolves things.
@johnnyboy-3 @bigdx I started a much more specific issue here instead: #6923
Qubes OS release
Qubes 4.1 with kernel-latest 5.12.14-1
Running on Lenovo Thinkpad P14s 4750U.
Brief summary
Not able to resume properly. I tried a lot of kernel/Xen/linux-firmware versions, but nothing seems to help.
Resume works but the system has problems afterwards.
Steps to reproduce
Expected behavior
PVH VM should start
Actual behavior
After suspend/resume, PVH VMs stop at "installing Xen timer for CPU 0". The next line should be "smpboot: CPU0: AMD Ryzen 7 4750U with Radeon Graphics (family 0x17, model: 0x60, stepping: 0x1)".
There are also artifacts on the screen when scrolling text, and the system is slower than usual.
Related
https://gitlab.freedesktop.org/drm/amd/-/issues/1715
This seems to be resolved in 5.14. I'm investigating a bit more to be sure about the problem.