PVH VMs does not start after resume #6859

Closed
isodude opened this issue Aug 26, 2021 · 31 comments
Labels
affects-4.1, C: core, C: kernel, C: Xen, hardware support, P: default

Comments

@isodude

isodude commented Aug 26, 2021

Qubes OS release

Qubes 4.1 with kernel-latest 5.12.14-1
Running on Lenovo Thinkpad P14s 4750U.

Brief summary

The system does not resume properly. I tried many kernel, Xen, and linux-firmware versions, but nothing seems to help.
Resume itself works, but the system has problems afterwards.

Steps to reproduce

  • Booting with no VMs started
  • enter suspend
  • resume
  • start PVH VM

Expected behavior

PVH VM should start

Actual behavior

After a suspend/resume cycle, PVH VMs stop at "installing Xen timer for CPU 0"; the next line should be "smpboot: CPU0: AMD Ryzen 7 4750U with Radeon Graphics (family 0x17, model: 0x60, stepping: 0x1)".

There are also artifacts on the screen when scrolling text, and the system is slower than usual.
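
A quick way to confirm this symptom is to look at the last line of the guest console log. The sketch below assumes the log is a plain text file passed in as an argument (on Qubes the guest console logs usually live under /var/log/xen/console/, but that path is an assumption here). The function name pvh_boot_stalled is made up for illustration.

```shell
# Hypothetical helper: succeeds (exit 0) when the last non-empty line of a
# guest console log is the Xen timer message, i.e. the guest stopped there.
pvh_boot_stalled() {
    log_file=$1
    # Keep only non-empty lines, then take the last one.
    last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
    case $last_line in
        *"installing Xen timer for CPU 0"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Possible usage: pvh_boot_stalled /var/log/xen/console/guest-sys-firewall.log && echo "stalled at Xen timer setup"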

Related

https://gitlab.freedesktop.org/drm/amd/-/issues/1715

This seems to be resolved in 5.14. I'm investigating a bit more to be sure about the problem.

@isodude isodude added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: bug labels Aug 26, 2021
@andrewdavidwong andrewdavidwong added C: core hardware support needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. labels Aug 26, 2021
@andrewdavidwong andrewdavidwong added this to the Release 4.1 updates milestone Aug 26, 2021
@isodude
Author

isodude commented Aug 31, 2021

guest-sys-firewall.log
Interesting parts from journalctl -b

Aug 20 08:07:45 dom0 kernel: ------------[ cut here ]------------
Aug 20 08:07:45 dom0 kernel: WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:456 switch_mm_irqs_off+0x375/0x3a0
Aug 20 08:07:45 dom0 kernel: Modules linked in: nf_tables nfnetlink vfat fat intel_rapl_msr wmi_bmof intel_rapl_common pcspkr k10temp uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common iwlwifi videodev sp5100_tco joydev mc i2c_piix4 ucsi_acpi thinkpad_acpi ipmi_devintf typec_ucsi platform_profile cfg80211 ledtrig_audio r8169 ipmi_msghandler roles snd typec soundcore rfkill wmi video i2c_scmi fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted hid_multitouch amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel gpu_sched i2c_algo_bit drm_ttm_helper ttm drm_kms_helper cec sdhci_pci xhci_pci ghash_clmulni_intel cqhci xhci_pci_renesas drm sdhci nvme xhci_hcd serio_raw ccp mmc_core ehci_pci ehci_hcd nvme_core pinctrl_amd xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
Aug 20 08:07:45 dom0 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-4.fc32.qubes.x86_64 #1
Aug 20 08:07:45 dom0 kernel: Hardware name: LENOVO 20Y1S02400/20Y1S02400, BIOS R1BET61W(1.30 ) 12/21/2020
Aug 20 08:07:45 dom0 kernel: RIP: e030:switch_mm_irqs_off+0x375/0x3a0
Aug 20 08:07:45 dom0 kernel: Code: 00 00 65 48 89 05 63 b7 fa 7e e9 7e fd ff ff b9 49 00 00 00 b8 01 00 00 00 31 d2 0f 30 e9 5e fd ff ff 41 89 f7 e9 a1 fe ff ff <0f> 0b e8 54 fa ff ff e9 fe fc ff ff 0f 0b e9 49 fe ff ff 0f 0b e9
Aug 20 08:07:45 dom0 kernel: RSP: e02b:ffffc900400efeb8 EFLAGS: 00010006
Aug 20 08:07:45 dom0 kernel: RAX: 00000001089d2000 RBX: ffff8881002f4f00 RCX: 0000000000000040
Aug 20 08:07:45 dom0 kernel: RDX: ffff8881002f4f00 RSI: 0000000000000000 RDI: ffff8881889d2000
Aug 20 08:07:45 dom0 kernel: RBP: ffffffff829d6fc0 R08: 0000000000000000 R09: 0000000000000010
Aug 20 08:07:45 dom0 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff888100267b40
Aug 20 08:07:45 dom0 kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
Aug 20 08:07:45 dom0 kernel: FS:  0000000000000000(0000) GS:ffff888140640000(0000) knlGS:0000000000000000
Aug 20 08:07:45 dom0 kernel: CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
Aug 20 08:07:45 dom0 kernel: CR2: 00005789f89738e0 CR3: 0000000002810000 CR4: 0000000000050660
Aug 20 08:07:45 dom0 kernel: Call Trace:
Aug 20 08:07:45 dom0 kernel:  switch_mm+0x1c/0x30
Aug 20 08:07:45 dom0 kernel:  play_dead_common+0xa/0x20
Aug 20 08:07:45 dom0 kernel:  xen_pv_play_dead+0xa/0x60
Aug 20 08:07:45 dom0 kernel:  do_idle+0xc7/0xe0
Aug 20 08:07:45 dom0 kernel:  cpu_startup_entry+0x19/0x20
Aug 20 08:07:45 dom0 kernel:  asm_cpu_bringup_and_idle+0x5/0x1000
Aug 20 08:07:45 dom0 kernel: ---[ end trace 55d0d8364636c8ab ]---

I censored the complete journal a bit:

sed -i -E '/(kernel:|dom0) (audit|CROND|sudo|runuser|\[drm:drm_(mode_object_(put|get)|ioctl))/d' journal-suspend-fail

journal-suspend-fail.log
hypervisor.log
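
For anyone who wants to apply the same filter without modifying the file in place, here is a minimal sketch. It reuses the exact expression from the sed command above but drops -i, so it reads from stdin and writes to stdout. The function name censor_journal is made up for illustration.

```shell
# Same expression as the sed command above, but without -i: reads a journal
# dump on stdin and writes the filtered copy to stdout.
censor_journal() {
    sed -E '/(kernel:|dom0) (audit|CROND|sudo|runuser|\[drm:drm_(mode_object_(put|get)|ioctl))/d'
}
```

Possible usage: journalctl -b | censor_journal > journal-suspend-fail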

@isodude
Author

isodude commented Sep 18, 2021

I did a fresh run without the Xen kernel and booted Linux 5.15; it was flawless even with s2idle sleep. No artifacts. But of course I can't test this case that way. Booting 5.15 with Xen runs into some weird deadlock issue.

I'm starting to bisect now to see if there is a sweet spot with no deadlock issue and working s2idle sleep/resume.

@isodude
Author

isodude commented Sep 18, 2021

Linux 5.14 works with S3 sleep when running without Xen. With Xen there are massive artifacts and VMs don't start.

Maybe this is due to a locking issue inside amdgpu, which somehow happens due to unstable TSC clocks in Xen.

@isodude
Author

isodude commented Sep 19, 2021

I found that sweet-spot :-)
There are still artifacts, but VMs are starting even though I have suspended the machine once.

Upgrading the laptop BIOS to v0.1.34, removing tsc=unstable in xen cmdline.
Selecting Linux mode for suspend (S3).
Compiling linux 5.14 ( adding patch from here #6881 ).
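
To double-check which sleep state the kernel will actually use after switching the BIOS option, /sys/power/mem_sleep can be inspected; the active entry is the one in square brackets. This is only a sketch: the helper name active_sleep_mode is made up, and whether dom0 under Xen reports the same thing as a bare Linux boot is an assumption not verified in this thread.

```shell
# Hypothetical helper: prints the sleep state selected for "mem" suspend.
# Input is the content of /sys/power/mem_sleep, where the active entry is
# in brackets, e.g. "s2idle [deep]" means S3 (deep) is selected.
active_sleep_mode() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

Possible usage: active_sleep_mode < /sys/power/mem_sleep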

@isodude
Author

isodude commented Sep 20, 2021

This commit seems to totally deadlock the amdgpu driver when running under Xen; it works like clockwork if booted without Xen.

commit 446a98b19fd6da97a1fb148abb1766ad89c9b767 (HEAD)
Author: Thomas Gleixner <[email protected]>
Date:   Thu Jul 29 23:51:58 2021 +0200

    PCI/MSI: Use new mask/unmask functions
    
    Switch the PCI/MSI core to use the new mask/unmask functions. No functional
    change.
    
    Signed-off-by: Thomas Gleixner <[email protected]>
    Tested-by: Marc Zyngier <[email protected]>
    Reviewed-by: Marc Zyngier <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 26dd91f33374..ce841f327ff6 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -135,74 +135,6 @@ void __weak arch_restore_msi_irqs(struct pci_dev *dev)
  * reliably as devices without an INTx disable bit will then generate a
  * level IRQ which will never be cleared.
  */
-static void __pci_msi_desc_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
-{
-       raw_spinlock_t *lock = &desc->dev->msi_lock;
-       unsigned long flags;
-
-       if (pci_msi_ignore_mask || !desc->msi_attrib.maskbit)
-               return;
-
-       raw_spin_lock_irqsave(lock, flags);
-       desc->msi_mask &= ~mask;
-       desc->msi_mask |= flag;
-       pci_write_config_dword(msi_desc_to_pci_dev(desc), desc->mask_pos,
-                              desc->msi_mask);
-       raw_spin_unlock_irqrestore(lock, flags);
-}
-
-static void msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
-{
-       __pci_msi_desc_mask_irq(desc, mask, flag);
-}
-
-static void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
-{
-       return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
-}
-
-/*
- * This internal function does not flush PCI writes to the device.
- * All users must ensure that they read from the device before either
- * assuming that the device state is up to date, or returning out of this
- * file.  This saves a few milliseconds when initialising devices with lots
- * of MSI-X interrupts.
- */
-static u32 __pci_msix_desc_mask_irq(struct msi_desc *desc, u32 flag)
-{
-       void __iomem *desc_addr = pci_msix_desc_addr(desc);
-       u32 ctrl = desc->msix_ctrl;
-
-       if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-               return 0;
-
-       ctrl &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
-       if (ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT)
-               ctrl |= PCI_MSIX_ENTRY_CTRL_MASKBIT;
-
-       writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
-
-       return ctrl;
-}
-
-static void msix_mask_irq(struct msi_desc *desc, u32 flag)
-{
-       desc->msix_ctrl = __pci_msix_desc_mask_irq(desc, flag);
-}
-
-static void msi_set_mask_bit(struct irq_data *data, u32 flag)
-{
-       struct msi_desc *desc = irq_data_get_msi_desc(data);
-
-       if (desc->msi_attrib.is_msix) {
-               msix_mask_irq(desc, flag);
-               readl(desc->mask_base);         /* Flush write to device */
-       } else {
-               unsigned offset = data->irq - desc->irq;
-               msi_mask_irq(desc, 1 << offset, flag << offset);
-       }
-}
-
 static inline __attribute_const__ u32 msi_multi_mask(struct msi_desc *desc)
 {
        /* Don't shift by >= width of type */
@@ -234,6 +166,11 @@ static inline void pci_msi_unmask(struct msi_desc *desc, u32 mask)
        pci_msi_update_mask(desc, mask, 0);
 }
 
+static inline void __iomem *pci_msix_desc_addr(struct msi_desc *desc)
+{
+       return desc->mask_base + desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+}
+
 /*
  * This internal function does not flush PCI writes to the device.  All
  * users must ensure that they read from the device before either assuming
@@ -289,7 +226,9 @@ static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
  */
 void pci_msi_mask_irq(struct irq_data *data)
 {
-       msi_set_mask_bit(data, 1);
+       struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+       __pci_msi_mask_desc(desc, BIT(data->irq - desc->irq));
 }
 EXPORT_SYMBOL_GPL(pci_msi_mask_irq);
 
@@ -299,7 +238,9 @@ EXPORT_SYMBOL_GPL(pci_msi_mask_irq);
  */
 void pci_msi_unmask_irq(struct irq_data *data)
 {
-       msi_set_mask_bit(data, 0);
+       struct msi_desc *desc = irq_data_get_msi_desc(data);
+
+       __pci_msi_unmask_desc(desc, BIT(data->irq - desc->irq));
 }
 EXPORT_SYMBOL_GPL(pci_msi_unmask_irq);
 
@@ -352,7 +293,8 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
                /* Don't touch the hardware now */
        } else if (entry->msi_attrib.is_msix) {
                void __iomem *base = pci_msix_desc_addr(entry);
-               bool unmasked = !(entry->msix_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
+               u32 ctrl = entry->msix_ctrl;
+               bool unmasked = !(ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT);
 
                if (entry->msi_attrib.is_virtual)
                        goto skip;
@@ -366,14 +308,14 @@ void __pci_write_msi_msg(struct msi_desc *entry, struct msi_msg *msg)
                 * undefined."
                 */
                if (unmasked)
-                       __pci_msix_desc_mask_irq(entry, PCI_MSIX_ENTRY_CTRL_MASKBIT);
+                       pci_msix_write_vector_ctrl(entry, ctrl | PCI_MSIX_ENTRY_CTRL_MASKBIT);
 
                writel(msg->address_lo, base + PCI_MSIX_ENTRY_LOWER_ADDR);
                writel(msg->address_hi, base + PCI_MSIX_ENTRY_UPPER_ADDR);
                writel(msg->data, base + PCI_MSIX_ENTRY_DATA);
 
                if (unmasked)
-                       __pci_msix_desc_mask_irq(entry, 0);
+                       pci_msix_write_vector_ctrl(entry, ctrl);
 
                /* Ensure that the writes are visible in the device */
                readl(base + PCI_MSIX_ENTRY_DATA);
@@ -491,7 +433,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
        arch_restore_msi_irqs(dev);
 
        pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
-       msi_mask_irq(entry, msi_multi_mask(entry), entry->msi_mask);
+       pci_msi_update_mask(entry, 0, 0);
        control &= ~PCI_MSI_FLAGS_QSIZE;
        control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE;
        pci_write_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, control);
@@ -522,7 +464,7 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
 
        arch_restore_msi_irqs(dev);
        for_each_pci_msi_entry(entry, dev)
-               msix_mask_irq(entry, entry->msix_ctrl);
+               pci_msix_write_vector_ctrl(entry, entry->msix_ctrl);
 
        pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);
 }
@@ -704,7 +646,6 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
 {
        struct msi_desc *entry;
        int ret;
-       unsigned mask;
 
        pci_msi_set_enable(dev, 0);     /* Disable MSI during set up */
 
@@ -713,8 +654,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
                return -ENOMEM;
 
        /* All MSIs are unmasked by default; mask them all */
-       mask = msi_multi_mask(entry);
-       msi_mask_irq(entry, mask, mask);
+       pci_msi_mask(entry, msi_multi_mask(entry));
 
        list_add_tail(&entry->list, dev_to_msi_list(&dev->dev));
 
@@ -741,7 +681,7 @@ static int msi_capability_init(struct pci_dev *dev, int nvec,
        return 0;
 
 err:
-       msi_mask_irq(entry, mask, 0);
+       pci_msi_unmask(entry, msi_multi_mask(entry));
        free_msi_irqs(dev);
        return ret;
 }
@@ -1021,7 +961,7 @@ static void pci_msi_shutdown(struct pci_dev *dev)
        dev->msi_enabled = 0;
 
        /* Return the device with MSI unmasked as initial states */
-       msi_mask_irq(desc, msi_multi_mask(desc), 0);
+       pci_msi_unmask(desc, msi_multi_mask(desc));
 
        /* Restore dev->irq to its default pin-assertion IRQ */
        dev->irq = desc->msi_attrib.default_irq;
@@ -1107,7 +1047,7 @@ static void pci_msix_shutdown(struct pci_dev *dev)
 
        /* Return the device with MSI-X masked as initial states */
        for_each_pci_msi_entry(entry, dev)
-               __pci_msix_desc_mask_irq(entry, 1);
+               pci_msix_mask(entry);
 
        pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_ENABLE, 0);
        pci_intx_for_msi(dev, 1);

git bisect log

git bisect start
# new: [4357f03d6611753936e4d52fc251b54a6afb1b54] Merge tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect new 4357f03d6611753936e4d52fc251b54a6afb1b54
# old: [7d2a07b769330c34b4deabeed939325c77a7ec2f] Linux 5.14
git bisect old 7d2a07b769330c34b4deabeed939325c77a7ec2f
# new: [1b4f3dfb4792f03b139edf10124fcbeb44e608e6] Merge tag 'usb-serial-5.15-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next
git bisect new 1b4f3dfb4792f03b139edf10124fcbeb44e608e6
# old: [29ce8f9701072fc221d9c38ad952de1a9578f95c] Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect old 29ce8f9701072fc221d9c38ad952de1a9578f95c
# new: [e7c1bbcf0c315c56cd970642214aa1df3d8cf61d] Merge tag 'hwmon-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect new e7c1bbcf0c315c56cd970642214aa1df3d8cf61d
# new: [679369114e55f422dc593d0628cfde1d04ae59b3] Merge tag 'for-5.15/block-2021-08-30' of git://git.kernel.dk/linux-block
git bisect new 679369114e55f422dc593d0628cfde1d04ae59b3
# old: [c7a5238ef68b98130fe36716bb3fa44502f56001] Merge tag 's390-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect old c7a5238ef68b98130fe36716bb3fa44502f56001
# old: [e5e726f7bb9f711102edea7e5bd511835640e3b4] Merge tag 'locking-core-2021-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect old e5e726f7bb9f711102edea7e5bd511835640e3b4
# old: [158ee7b65653d9f841823c249014c2d0dfdeeb8f] block: mark blkdev_fsync static
git bisect old 158ee7b65653d9f841823c249014c2d0dfdeeb8f
# new: [47fb0cfdb7a71a8a0ff8fe1d117363dc81f6ca77] Merge tag 'irqchip-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
git bisect new 47fb0cfdb7a71a8a0ff8fe1d117363dc81f6ca77
# old: [4513fb87e1402ad815912ec7f027eb17149f44ee] Merge branch irq/misc-5.15 into irq/irqchip-next
git bisect old 4513fb87e1402ad815912ec7f027eb17149f44ee
# new: [88ffe2d0a55a165e55cedad1693f239d47e3e17e] genirq/cpuhotplug: Demote debug printk to KERN_DEBUG
git bisect new 88ffe2d0a55a165e55cedad1693f239d47e3e17e
# old: [fcacdfbef5a1633211ebfac1b669a7739f5b553e] PCI/MSI: Provide a new set of mask and unmask functions
git bisect old fcacdfbef5a1633211ebfac1b669a7739f5b553e
# new: [91cc470e797828d779cd4c1efbe8519bcb358bae] genirq: Change force_irqthreads to a static key
git bisect new 91cc470e797828d779cd4c1efbe8519bcb358bae
# new: [428e211641ed808b55cdc7d880a0ee349eff354b] genirq/affinity: Replace deprecated CPU-hotplug functions.
git bisect new 428e211641ed808b55cdc7d880a0ee349eff354b
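
To see at a glance where a bisect log like the one above ended up, the last "old" and "new" marks can be pulled out of it. This is a rough sketch: the helper name bisect_last_marks is made up, and on non-linear history the two commits only loosely bound the remaining range.

```shell
# Hypothetical helper: reads a `git bisect log` on stdin and prints the most
# recently marked "old" and "new" commits. Comment lines ("# ...") are
# skipped because their first field is "#", not "git".
bisect_last_marks() {
    awk '
        $1 == "git" && $2 == "bisect" && $3 == "old" { last_old = $4 }
        $1 == "git" && $2 == "bisect" && $3 == "new" { last_new = $4 }
        END { printf "old=%s\nnew=%s\n", last_old, last_new }
    '
}
```

Possible usage: git bisect log | bisect_last_marks. A saved log can also be re-applied in a kernel checkout with git bisect replay <file>.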

@bigdx

bigdx commented Sep 20, 2021

If I remember correctly, the TSC problem is solved via BIOS updates on most systems.

So that commit is the core problem?

If I can contribute somehow let me know.

BTW... what artifacts do you mean? I have none with my P14s AMD G2 (Ryzen 5850U) with kernel 5.10.61, as 5.13.x won't boot.

@isodude
Author

isodude commented Sep 20, 2021

If I remember correctly, the TSC problem is solved via BIOS updates on most systems.

Yeah, it should be fixed.

So that commit is the core problem?

yup, I reverted and compiled, 5.15 now boots. No s2idle though, so it seems we still have issues inside Xen as it works booting without Xen.

If I can contribute somehow let me know

Do you get artifacts with kernel-latest? (after suspend/resume)

BTW... what artifacts do you mean?

Graphical artifacts: if I type text, the text disintegrates into color pixels. This works flawlessly without Xen.

I have none with my P14s AMD G2 (Ryzen 5850U) with kernel 5.10.61, as 5.13.x won't boot.

Interesting. Maybe I should bisect between 5.10 and 5.11, because I had these problems in 5.11 if I recall correctly, or whenever my first release with working S3 was. Does suspend/resume in S3 work for you?

@bigdx

bigdx commented Sep 20, 2021

yup, I reverted and compiled, 5.15 now boots. No s2idle though, so it seems we still have issues inside Xen as it works booting without Xen.

s2idle is not yet possible with Xen, and as such there is no suspend for Intel 11th gen. This was explained in #6411.
With Xen only S3 is possible, and correspondingly only with Ryzen and older Intel CPUs (<=10th gen).

Do you get artifacts with kernel-latest? (after suspend/resume)

Graphical artifacts: if I type text, the text disintegrates into color pixels. This works flawlessly without Xen.

After suspend (with 5.10.61) I have the described issues (see #6066 (comment)): with VMs running at suspend I get a frozen unlock screen but a movable mouse pointer, then the screen turns black 2 or 3 times. Sometimes the login screen appears (same story). Finally, a complete hang (frozen mouse or black screen). But no "text disintegrates into color pixels". Maybe because the whole graphics output except the mouse is frozen?
I have to test again with no VMs suspended; I'm not sure if there were artifacts, but it was unusably slow/laggy anyway.

Btw, I always had artifacts (suspend not possible, as mentioned above) with an Intel 11th Gen i7 with every kernel >5.8.

Interesting. Maybe I should bisect between 5.10 and 5.11, because I had these problems in 5.11 if I recall correctly, or whenever my first release with working S3 was. Does suspend/resume in S3 work for you?

I forgot everything about kernel compilation (kernels just worked for me till now ^^), so no, I could not try this solution yet, and without it I keep getting the above-mentioned issues.

Is there documentation somewhere (for dummies ^^) on how to get the Qubes kernel modified and compiled? Did you make any further changes except reverting that part? Then I can tell if there are artifacts or not ;-)

@isodude
Author

isodude commented Sep 20, 2021

s2idle is not yet possible with Xen, and as such there is no suspend for Intel 11th gen. This was explained in #6411.
With Xen only S3 is possible, and correspondingly only with Ryzen and older Intel CPUs (<=10th gen).

Thanks! I'll stop looking then. For everyone looking at this anyhow, s2idle now works on kernel 5.15 on AMD Ryzen 4750U at least, WITHOUT Xen :-) It did not work in 5.14.

After suspend (with 5.10.61) I have the described issues (see #6066 (comment)): with VMs running at suspend I get a frozen unlock screen but a movable mouse pointer, then the screen turns black 2 or 3 times. Sometimes the login screen appears (same story). Finally, a complete hang (frozen mouse or black screen). But no "text disintegrates into color pixels". Maybe because the whole graphics output except the mouse is frozen?
I have to test again with no VMs suspended; I'm not sure if there were artifacts, but it was unusably slow/laggy anyway.
Btw, I always had artifacts (suspend not possible, as mentioned above) with an Intel 11th Gen i7 with every kernel >5.8.

Then there is no change between 5.8 and 5.15 sadly. I still have artifacts after S3 suspend, and the laptop does not resume after a while.

This specific issue is solved though. VMs can be started afterwards.

I forgot everything about kernel compilation (kernels just worked for me till now ^^), so no, I could not try this solution yet, and without it I keep getting the above-mentioned issues.

No need, just run sudo qubes-dom0-update kernel-latest; it will install kernel 5.12.14.

Is there documentation somewhere (for dummies ^^) on how to get the Qubes kernel modified and compiled? Did you make any further changes except reverting that part? Then I can tell if there are artifacts or not ;-)

I can show you, no problem :-) But the guide from Qubes OS is quite OK. I will update my repos at http://github.com/isodude with the latest patches.

@bigdx

bigdx commented Sep 20, 2021

Then there is no change between 5.8 and 5.15 sadly.

For Intel 11th, right.
But as you said, with Ryzen it starts with >5.10 and only AFTER resume (vs. Intel right from the start). So I guess the "graphics driver" doesn't resume correctly on Ryzen?

I still have artifacts after S3 suspend, and the laptop does not resume after a while.

... after a while? After how many turns and with what behavior?

This specific issue is solved though. VMs can be started afterwards.

Great!

No need, just run sudo qubes-dom0-update kernel-latest; it will install kernel 5.12.14.

Sure, but without that patch it won't boot? ;-)

I can show you, no problem :-) But the guide from Qubes OS is quite OK. I will update my repos at http://github.com/isodude with the latest patches.

Many thanks! I'll give it a try and eventually get back to you!

@isodude
Author

isodude commented Sep 20, 2021

For Intel 11th, right.
But as you said, with Ryzen it starts with >5.10 and only AFTER resume (vs. Intel right from the start). So I guess the "graphics driver" doesn't resume correctly on Ryzen?

I don't know pre 5.10 TBH. Yes, this is only after resume.

... after a while? After how many turns and with what behavior?

I will check more specifically, I just know that suspend/resume worked once, but not when it slept through the night.

Sure, but without that patch it won't boot? ;-)

The problem with boot arises after 5.13; the patch referenced above is between 5.14 and 5.15.

Many thanks! I'll give it a try and eventually get back to you!

No problem, took a while for me :)

@isodude
Author

isodude commented Sep 21, 2021

Current status:
5.15-rc2 boots without revert of the mentioned commit, no S3 suspend due to

Sep 21 07:15:37 dom0 kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):
Sep 21 07:15:37 dom0 kernel: task:xen-balloon     state:S stack:    0 pid:  136 ppid:     2 flags:0x00004000
Sep 21 07:15:37 dom0 kernel: Call Trace:
Sep 21 07:15:37 dom0 kernel:  __schedule+0x25e/0x6a0
Sep 21 07:15:37 dom0 kernel:  schedule+0x44/0xa0
Sep 21 07:15:37 dom0 kernel:  schedule_timeout+0x95/0x140
Sep 21 07:15:37 dom0 kernel:  ? trigger_dyntick_cpu+0x40/0x40
Sep 21 07:15:37 dom0 kernel:  balloon_thread+0x2af/0x310
Sep 21 07:15:37 dom0 kernel:  ? finish_wait+0x80/0x80
Sep 21 07:15:37 dom0 kernel:  ? alloc_xenballooned_pages+0xf0/0xf0
Sep 21 07:15:37 dom0 kernel:  kthread+0x10f/0x130
Sep 21 07:15:37 dom0 kernel:  ? set_kthread_struct+0x40/0x40
Sep 21 07:15:37 dom0 kernel:  ret_from_fork+0x22/0x30
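
To pull the names of the tasks that refused to freeze out of a journal dump like the one above, something along these lines may help. It is a sketch under assumptions: the helper name refusing_tasks is made up, it expects the "task:NAME state:" layout shown above, and it does not handle task names containing spaces. The same "task:" layout shows up in other kernel task dumps too, so the input should be limited to the relevant part of the journal.

```shell
# Hypothetical helper: reads journal lines on stdin and prints the name of
# every task listed in a "kernel: task:<name> state:..." line.
refusing_tasks() {
    sed -n 's/.*kernel: task:\([^ ]*\) *state:.*/\1/p'
}
```

Possible usage: journalctl -b | refusing_tasks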

4.14 ~just before the mentioned commit above: boots fine. I fixed an issue with xen-libs being the wrong version in chroot-dom0-fc32, which seems to have fixed my issues with suspending while VMs are running. There are still artifacts when writing text.

@isodude
Author

isodude commented Sep 21, 2021

I just realized that it's important to change the kernel version of the Qubes VMs. It's probably working with a 5.12 kernel on VMs and 5.14 on dom0, but not 5.14 on both.

I'm trying to bisect the xen-balloon issue now.

@bigdx

bigdx commented Sep 21, 2021

I don't know pre 5.10 TBH. Yes, this is only after resume.

I will check more specifically, I just know that suspend/resume worked once, but not when it slept through the night.

The problem with boot arises after 5.13; the patch referenced above is between 5.14 and 5.15.

Weird, I was sure 5.12 didn't boot (same way as 5.13), but I tried again now and it boots. Still no resume, but no artifacts, at least in the short period before it hangs or the display turns black ;-)

Sounds promising!

@isodude
Author

isodude commented Sep 21, 2021

I think the artifacts are due to some steal issue, because whenever there's load on the system (e.g. start of a VM), if I keep hitting one key repeatedly, there will be artifacts.

I'm pretty satisfied right now: I got a 5.14 kernel going with a good patch that actually suspends with VMs running. I will bisect up to 5.15, though, to see which commit breaks xen-balloon. It seems more stable the higher I go, so... promising :-)

@isodude
Author

isodude commented Sep 21, 2021

kernel-latest (5.12.14-1) with @marmarek's patches for Xen and latest BIOS updates makes it possible to suspend/resume.

@isodude
Author

isodude commented Sep 21, 2021

To build vmm-xen you add builder.conf to qubes-builder directory and run

make install-deps remount get-sources

Edit qubes-builder/qubes-src/vmm-xen/xen.spec.in and add the patches below as Patch1101 and Patch1102.

Run make vmm-xen in qubes-builder.

When done, enter dom0 and copy the files from the VM (e.g. qvm-run --pass-io <build-vm> 'tar -c -f - -C qubes-builder/qubes-src/vmm-xen/pkgs .' | tar -x -f - -C ~/compiled, where <build-vm> stands for the name of the VM used for building).
Reinstall the Xen packages:

sudo dnf reinstall ~/compiled/....

The packages should be python3-xen xen xen-libs xen-hypervisor xen-runtime xen-licenses. This way you don't need to update all packages around xen. I guess that marmarek will update the Xen packages with the patches soon.
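
Before running dnf reinstall, it may be worth checking that a build actually produced all six packages. The sketch below takes RPM file names on stdin and prints the expected packages that are missing. The helper name missing_xen_pkgs is made up, and it assumes the usual "<name>-<version>-<release>.<arch>.rpm" file naming.

```shell
# Package names taken from the comment above.
XEN_PKGS="python3-xen xen xen-libs xen-hypervisor xen-runtime xen-licenses"

# Hypothetical helper: reads RPM file names on stdin and prints every
# expected package that has no matching "<name>-<version>..." file.
missing_xen_pkgs() {
    files=$(cat)
    for pkg in $XEN_PKGS; do
        # "<pkg>-" followed by a digit, so "xen" does not match "xen-libs-...".
        if ! printf '%s\n' "$files" | grep -q "^${pkg}-[0-9]"; then
            printf '%s\n' "$pkg"
        fi
    done
}
```

Possible usage: find ~/compiled -name '*.rpm' -exec basename {} \; | missing_xen_pkgs (no output means nothing is missing).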

builder.conf

VERBOSE ?= 2

BACKEND_VMM ?= xen

GIT_BASEURL ?= https://github.com
GIT_PREFIX ?= QubesOS/qubes-

RELEASE ?= 4.1

DIST_DOM0 ?= fc32
DISTS_VM += fc33

COMPONENTS = \
builder \
builder-rpm \
vmm-xen \

BRANCH_vmm_xen = xen-4.14
BUILDER_PLUGINS += builder-deb
BUILDER_PLUGINS += builder-rpm

USE_QUBES_REPO_VERSION = $(RELEASE)

INSTALLER_KICKSTART=/home/user/qubes-src/installer-qubes-os/conf/iso-full-online.ks

patch 1

From: Marek Marczykowski-Górecki @ 2021-08-18 11:30 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, Andrew Cooper,
	Roger Pau Monné,
	Wei Liu

set_xcr0() and set_msr_xss() use cached value to avoid setting the
register to the same value over and over. But suspend/resume implicitly
reset the registers and since percpu areas are not deallocated on
suspend anymore, the cache gets stale.
Reset the cache on resume, to ensure the next write will really hit the
hardware. Choose value 0, as it will never be a legitimate write to
those registers - and so, will force write (and cache update).

Note the cache is used in get_xcr0() and get_msr_xss() too, but:
- set_xcr0() is called few lines below in xstate_init(), so it will
  update the cache with appropriate value
- get_msr_xss() is not used anywhere - and thus not before any
  set_msr_xss() that will fill the cache

Fixes: aca2a985a55a "xen: don't free percpu areas during suspend"
Signed-off-by: Marek Marczykowski-Górecki <[email protected]>
---
 xen/arch/x86/xstate.c | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/xen/arch/x86/xstate.c.orig	2021-04-27 15:00:16.000000000 +0200
+++ b/xen/arch/x86/xstate.c	2021-09-19 14:30:14.739000000 +0200
@@ -608,6 +608,13 @@ void xstate_init(struct cpuinfo_x86 *c)
         return;
     }
 
+    /*
+     * Clear the cached value to make set_xcr0() and set_msr_xss() really
+     * write it.
+     */
+    this_cpu(xcr0) = 0;
+    this_cpu(xss) = ~0;
+
     cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
 
     BUG_ON((eax & XSTATE_FP_SSE) != XSTATE_FP_SSE);
-- 
2.31.1

patch 2

From: Juergen Gross @ 2021-08-18 10:21 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, George Dunlap, Dario Faggioli,
	Marek Marczykowski-Górecki

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <[email protected]>
Signed-off-by: Juergen Gross <[email protected]>
Tested-by: Marek Marczykowski-Górecki <[email protected]>
---
An alternative way to fix the issue would be to keep the sched_resource
of offline cpus allocated like we already do with idle vcpus and units.
This fix would be more intrusive, but it would avoid similar other bugs
like this one.
---
 xen/common/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 6d34764d38..9ac1b01ca8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
     struct vcpu_runstate_info state = { 0 };
     const struct vcpu *v = idle_vcpu[cpu];
 
-    if ( cpu_online(cpu) && v )
+    if ( cpu_online(cpu) && v && get_sched_res(cpu) )
         vcpu_runstate_get(v, &state);
 
     return state.time[RUNSTATE_running];
-- 
2.26.2

@johnnyboy-3

Hi,
I tried your fix #6859 (comment) and the one from #6881 to test suspend/resume on R4.1-beta1 with kernel versions 5.13.13 and 5.10, and both failed.
The computer resumes and I can see the xscreensaver, but the system is too unresponsive: I'm not able to unlock the screensaver, and the display shuts off after ~20 seconds.

@isodude
Author

isodude commented Sep 22, 2021

So 5.12.14 survives at least one hour in suspend, but not several hours. It is unresponsive with a black screen at resume, while the power LED indicates that it should be up and running.
5.12.14.log

@isodude
Author

isodude commented Sep 22, 2021

@johnnyboy-3 you are referencing this patch, right?
patch-for-5.13.txt

Make sure that:

  • smt=off in xen commandline
  • bios is patched with TSC fix
  • you remove clocksource=hpet tsc=unstable from xen commandline
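
The command-line part of this checklist can be checked mechanically. The sketch below takes the Xen command line as arguments and reports deviations. The helper name check_xen_cmdline is made up, and it treats smt=0 (the spelling used in the scheduler patch description) the same as smt=off. The BIOS item is not covered.

```shell
# Hypothetical helper: takes a Xen command line as its arguments, prints one
# line per deviation from the checklist above, and fails if any was found.
check_xen_cmdline() {
    status=0
    has_smt_off=no
    for opt in "$@"; do
        case $opt in
            smt=off|smt=0) has_smt_off=yes ;;
            clocksource=hpet|tsc=unstable)
                echo "remove: $opt"
                status=1 ;;
        esac
    done
    if [ "$has_smt_off" = no ]; then
        echo "missing: smt=off"
        status=1
    fi
    return $status
}
```

Possible usage in dom0, assuming xl info prints a "xen_commandline : ..." line: check_xen_cmdline $(xl info | sed -n 's/^xen_commandline *: *//p')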

@isodude
Author

isodude commented Sep 22, 2021

Interesting thread: https://gitlab.freedesktop.org/drm/amd/-/issues/892

@isodude
Author

isodude commented Sep 22, 2021

@johnnyboy-3 also make sure that you are running the correct kernel version on your VMs.

@bigdx

bigdx commented Sep 23, 2021

Doesn't work for me ;-(

I built and reinstalled vmm-xen in dom0 (no issues as far as I could tell) and tried kernels 5.12 and 5.13 for the VMs; no difference. The dom0 kernel is 5.12.14 as is, with no patch or change, which is how I understood you. 5.10 doesn't change anything either, and I haven't tried to patch 5.13 yet.

The BIOS is up to date (and I assume TSC is fixed), and the Xen command line (the one in GRUB, right?) has smt=off (=on doesn't work either) and no clocksource or tsc set.

Any suggestions what else I could try?

Weird, I thought nothing CAN change or happen while suspended, but obviously something changes after one hour?!

@isodude
Author

isodude commented Sep 23, 2021

I have also been running a pretty new linux-firmware, maybe that makes a difference as well? Basically I

What version of linux-firmware are you running?

@bigdx What is the actual error you get now?

@bigdx

bigdx commented Sep 24, 2021

What version of linux-firmware are you running?

The "default" based on the beta-kernel-latest ISO from 9/11/21 and all updates till then. Can't check the version or try anything before Monday.

@bigdx What is the actual error you get now?

Still the same, frozen unlock screen with movable mouse pointer etc...

@isodude
Author

isodude commented Sep 24, 2021

The "default" based on the beta-kernel-latest ISO from 9/11/21 and all updates till then. Can't check the version or try anything before Monday.

Great! Then it's not related to that somehow.

Still the same, frozen unlock screen with movable mouse pointer etc...

And nothing in the logs? And to be exact, you see the dialog to enter the password?

Exactly what packages did you update in dom0 from vmm-xen?

What BIOS version are you running? There should be a note in the changelog that TSC is fixed, if 5000-series CPUs even have these problems to begin with.

I tried booting without Xen to make sure that the kernel itself handles suspend/resume correctly. This is possible if you delete the multiboot line, replace multiboot2 with linux, and replace module --nounzip with initrd.

@bigdx

bigdx commented Sep 24, 2021

And nothing in the logs? And to be exact, you see the dialog to enter the password?

To be honest, I haven't checked; I didn't have much time and hoped it would work out of the box ;-) Which log would be of interest to you?
Exactly, the xscreensaver password dialog.

Exactly what packages did you update in dom0 from vmm-xen?

All of them. I followed your example and installed everything that was built.

What BIOS version are you running? There should be a note in the changelog that TSC is fixed, if 5000-series CPUs even have these problems to begin with.

The changelog doesn't mention that, but the initial BIOS was released after TSC had been fixed, so I assume the fix is included. Otherwise I would have the corresponding issues, since I haven't set any of the Xen options that solved them.

I tried booting without Xen to make sure that the kernel itself handles suspend/resume correctly. This is possible if you delete the multiboot line, replace multiboot2 with linux, and replace module --nounzip with initrd.

Will try that on Monday :-)

@johnnyboy-3

johnnyboy-3 commented Sep 25, 2021

Hi again

@johnnyboy-3 you are referencing this patch right?
patch-for-5.13.txt

Make sure that:
* smt=off in xen commandline
* bios is patched with TSC fix
* you remove clocksource=hpet tsc=unstable from xen commandline

Yes. I checked all three except the BIOS TSC fix. I run a desktop PC with a Ryzen 2400G, if that matters.

@johnnyboy-3 also make sure that you are running the correct kernel version on your VMs.

I also turned off auto-boot of all AppVMs and didn't run any while testing.

Exactly what packages did you update in dom0 from vmm-xen?

I tried a first run with only python3-xen xen xen-libs xen-hypervisor xen-runtime xen-licenses, and a second one with all the RPMs that were built, preferably with debuginfo in the name.

I tried booting without Xen to make sure that the kernel itself handles suspend/resume correctly. This is possible if you delete the multiboot line, replace multiboot2 with linux, and replace module --nounzip with initrd.

Tried that too, works without problems.

The following is a journalctl snippet from the second run (I think; the times got a bit mixed up due to multiple systems):

Sep 24 22:19:25 dom0 systemd[1]: Starting Qubes suspend hooks...
Sep 24 22:19:25 dom0 pulseaudio[4090]: Disabling timer-based scheduling because running inside a VM.
Sep 24 22:19:25 dom0 rtkit-daemon[4097]: Supervising 2 threads of 1 processes of 1 users.
Sep 24 22:19:25 dom0 rtkit-daemon[4097]: Successfully made thread 4368 of process 4090 (/usr/bin/pulseaudio) owned by '1000' RT at priority 5.
Sep 24 22:19:25 dom0 rtkit-daemon[4097]: Supervising 3 threads of 1 processes of 1 users.
Sep 24 22:19:25 dom0 qmemman.daemon.algo[2481]: balance_when_enough_memory(xen_free_memory=12360429316, total_mem_pref=1819228569.6000001, total_available_memory=14836168042.4)
Sep 24 22:19:25 dom0 qmemman.systemstate[2481]: stat: dom '0' act=4294967296 pref=1819228569.6000001 last_target=4294967296
Sep 24 22:19:25 dom0 qmemman.systemstate[2481]: stat: xenfree=12412858116 memset_reqs=[('0', 4294967296)]
Sep 24 22:19:25 dom0 qmemman.systemstate[2481]: mem-set domain 0 to 4294967296
Sep 24 22:19:25 dom0 52qubes-pause-vms[4366]: 0
Sep 24 22:19:25 dom0 systemd[1]: Finished Qubes suspend hooks.
Sep 24 22:19:25 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:25 dom0 systemd[1]: Reached target Sleep.
Sep 24 22:19:25 dom0 kernel: kauditd_printk_skb: 2 callbacks suppressed
Sep 24 22:19:25 dom0 kernel: audit: type=1130 audit(1632514765.898:280): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:25 dom0 systemd[1]: Starting Suspend...
Sep 24 22:19:25 dom0 systemd-sleep[4370]: Suspending system...
Sep 24 22:19:25 dom0 kernel: PM: suspend entry (deep)
Sep 24 22:19:39 dom0 kernel: Filesystems sync: 0.520 seconds
Sep 24 22:19:39 dom0 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done.
Sep 24 22:19:39 dom0 kernel: OOM killer disabled.
Sep 24 22:19:39 dom0 kernel: Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done.
Sep 24 22:19:39 dom0 kernel: printk: Suspending console(s) (use no_console_suspend to debug)
Sep 24 22:19:39 dom0 kernel: sd 4:0:0:0: [sdc] Synchronizing SCSI cache
Sep 24 22:19:39 dom0 kernel: sd 5:0:0:0: [sdd] Synchronizing SCSI cache
Sep 24 22:19:39 dom0 kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Sep 24 22:19:39 dom0 kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Sep 24 22:19:39 dom0 kernel: sd 4:0:0:0: [sdc] Stopping disk
Sep 24 22:19:39 dom0 kernel: sd 0:0:0:0: [sda] Stopping disk
Sep 24 22:19:39 dom0 kernel: sd 1:0:0:0: [sdb] Stopping disk
Sep 24 22:19:39 dom0 kernel: [drm] free PSP TMR buffer
Sep 24 22:19:39 dom0 kernel: sd 5:0:0:0: [sdd] Stopping disk
Sep 24 22:19:39 dom0 kernel: PM: suspend devices took 0.943 seconds
Sep 24 22:19:39 dom0 kernel: ACPI: Preparing to enter system sleep state S3
Sep 24 22:19:39 dom0 kernel: PM: Saving platform NVS memory
Sep 24 22:19:39 dom0 kernel: Disabling non-boot CPUs ...
Sep 24 22:19:39 dom0 kernel: ------------[ cut here ]------------
Sep 24 22:19:39 dom0 kernel: WARNING: CPU: 1 PID: 0 at arch/x86/mm/tlb.c:462 switch_mm_irqs_off+0x381/0x3a0
Sep 24 22:19:39 dom0 kernel: Modules linked in: nf_tables nfnetlink rt2800usb rt2x00usb rt2800lib rt2x00lib mac80211 cfg80211 rfkill snd_hda_codec_realtek snd_hda_codec_generic libarc4 ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec joydev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd intel_rapl_msr intel_rapl_common soundcore sp5100_tco k10temp wmi_bmof pcspkr r8169 i2c_piix4 gpio_amdpt gpio_generic wmi video fuse xenfs ip_tables dm_thin_pool dm_persistent_data dm_bio_prison dm_crypt trusted asn1_encoder amdgpu crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel drm_ttm_helper ttm iommu_v2 gpu_sched i2c_algo_bit drm_kms_helper cec ccp drm xhci_pci xhci_pci_renesas xhci_hcd xen_acpi_processor xen_privcmd xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
Sep 24 22:19:39 dom0 kernel: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.13.13-1.fc32.qubes.x86_64 #1
Sep 24 22:19:39 dom0 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P3.60 07/31/2019
Sep 24 22:19:39 dom0 kernel: RIP: e030:switch_mm_irqs_off+0x381/0x3a0
Sep 24 22:19:39 dom0 kernel: Code: 00 00 65 48 89 05 e7 8f fa 7e e9 77 fd ff ff b9 49 00 00 00 b8 01 00 00 00 31 d2 0f 30 e9 57 fd ff ff 41 89 f6 e9 9d fe ff ff <0f> 0b e8 78 fa ff ff e9 f2 fc ff ff 0f 0b e9 47 fe ff ff 0f 0b e9
Sep 24 22:19:39 dom0 kernel: RSP: e02b:ffffc900400afeb8 EFLAGS: 00010006
Sep 24 22:19:39 dom0 kernel: RAX: 0000000101a3c000 RBX: ffff8881002c0000 RCX: 0000000000000040
Sep 24 22:19:39 dom0 kernel: RDX: ffff8881002c0000 RSI: 0000000000000000 RDI: ffff888181a3c000
Sep 24 22:19:39 dom0 kernel: RBP: ffffffff829d84e0 R08: 0000000000000000 R09: 0000000000000000
Sep 24 22:19:39 dom0 kernel: R10: 0000000000000004 R11: 0000000000000000 R12: ffff88810a04ea40
Sep 24 22:19:39 dom0 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
Sep 24 22:19:39 dom0 kernel: FS: 0000000000000000(0000) GS:ffff888127240000(0000) knlGS:0000000000000000
Sep 24 22:19:39 dom0 kernel: CS: 10000e030 DS: 002b ES: 002b CR0: 0000000080050033
Sep 24 22:19:39 dom0 kernel: CR2: 000079f1f8a86000 CR3: 0000000002810000 CR4: 0000000000050660
Sep 24 22:19:39 dom0 kernel: Call Trace:
Sep 24 22:19:39 dom0 kernel: switch_mm+0x1c/0x30
Sep 24 22:19:39 dom0 kernel: play_dead_common+0xa/0x20
Sep 24 22:19:39 dom0 kernel: xen_pv_play_dead+0xa/0x60
Sep 24 22:19:39 dom0 kernel: do_idle+0xd1/0xe0
Sep 24 22:19:39 dom0 kernel: cpu_startup_entry+0x19/0x20
Sep 24 22:19:39 dom0 kernel: asm_cpu_bringup_and_idle+0x5/0x1000
Sep 24 22:19:39 dom0 kernel: ---[ end trace 4c31b68b7e7d17f3 ]---
Sep 24 22:19:39 dom0 kernel: smpboot: CPU 1 is now offline
Sep 24 22:19:39 dom0 kernel: smpboot: CPU 2 is now offline
Sep 24 22:19:39 dom0 kernel: smpboot: CPU 3 is now offline
Sep 24 22:19:39 dom0 kernel: ACPI: Low-level resume complete
Sep 24 22:19:39 dom0 kernel: PM: Restoring platform NVS memory
Sep 24 22:19:39 dom0 kernel: Enabling non-boot CPUs ...
Sep 24 22:19:39 dom0 kernel: installing Xen timer for CPU 1
Sep 24 22:19:39 dom0 kernel: xen_acpi_processor: Uploading Xen processor PM info
Sep 24 22:19:39 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU1
Sep 24 22:19:39 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU3
Sep 24 22:19:39 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU5
Sep 24 22:19:39 dom0 kernel: xen_acpi_processor: (_PXX): Hypervisor error (-19) for ACPI CPU7
Sep 24 22:19:39 dom0 kernel: cpu 1 spinlock event irq 67
Sep 24 22:19:39 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 24 22:19:39 dom0 kernel: ACPI: \_PR_.C002: Found 2 idle states
Sep 24 22:19:39 dom0 kernel: CPU1 is up
Sep 24 22:19:39 dom0 kernel: installing Xen timer for CPU 2
Sep 24 22:19:39 dom0 kernel: cpu 2 spinlock event irq 73
Sep 24 22:19:39 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 24 22:19:39 dom0 kernel: ACPI: \_PR_.C004: Found 2 idle states
Sep 24 22:19:39 dom0 kernel: CPU2 is up
Sep 24 22:19:39 dom0 kernel: installing Xen timer for CPU 3
Sep 24 22:19:39 dom0 kernel: cpu 3 spinlock event irq 79
Sep 24 22:19:39 dom0 kernel: [Firmware Bug]: ACPI MWAIT C-state 0x0 not supported by HW (0x0)
Sep 24 22:19:39 dom0 kernel: ACPI: \_PR_.C006: Found 2 idle states
Sep 24 22:19:39 dom0 kernel: CPU3 is up
Sep 24 22:19:39 dom0 kernel: ACPI: Waking up from system sleep state S3
Sep 24 22:19:39 dom0 kernel: usb usb1: root hub lost power or was reset
Sep 24 22:19:39 dom0 kernel: usb usb2: root hub lost power or was reset
Sep 24 22:19:39 dom0 kernel: [drm] PCIE GART of 1024M enabled.
Sep 24 22:19:39 dom0 kernel: [drm] PTB located at 0x000000F400900000
Sep 24 22:19:39 dom0 kernel: [drm] PSP is resuming...
Sep 24 22:19:39 dom0 kernel: [drm] reserve 0x400000 from 0xf403c00000 for PSP TMR
Sep 24 22:19:39 dom0 kernel: sd 0:0:0:0: [sda] Starting disk
Sep 24 22:19:39 dom0 kernel: sd 1:0:0:0: [sdb] Starting disk
Sep 24 22:19:39 dom0 kernel: sd 4:0:0:0: [sdc] Starting disk
Sep 24 22:19:39 dom0 kernel: sd 5:0:0:0: [sdd] Starting disk
Sep 24 22:19:39 dom0 kernel: ata9: SATA link down (SStatus 0 SControl 300)
Sep 24 22:19:39 dom0 kernel: usb 1-6: reset low-speed USB device number 3 using xhci_hcd
Sep 24 22:19:39 dom0 kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 24 22:19:39 dom0 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Sep 24 22:19:39 dom0 kernel: ata2.00: configured for UDMA/133
Sep 24 22:19:39 dom0 kernel: ata1.00: configured for UDMA/133
Sep 24 22:19:39 dom0 kernel: usb 1-5: reset low-speed USB device number 2 using xhci_hcd
Sep 24 22:19:39 dom0 kernel: [drm] psp command (0x5) failed and response status is (0x0)
Sep 24 22:19:39 dom0 kernel: [drm:psp_hw_start [amdgpu]] ERROR PSP load tmr failed!
Sep 24 22:19:39 dom0 kernel: [drm:psp_resume [amdgpu]] ERROR PSP resume failed
Sep 24 22:19:39 dom0 kernel: [drm:amdgpu_device_fw_loading [amdgpu]] ERROR resume of IP block failed -22
Sep 24 22:19:39 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: amdgpu_device_ip_resume failed (-22).
Sep 24 22:19:39 dom0 kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -22
Sep 24 22:19:39 dom0 kernel: amdgpu 0000:06:00.0: PM: failed to resume async: error -22
Sep 24 22:19:39 dom0 kernel: PM: resume devices took 2.297 seconds
Sep 24 22:19:39 dom0 kernel: OOM killer enabled.
Sep 24 22:19:39 dom0 kernel: Restarting tasks ... done.
Sep 24 22:19:39 dom0 kernel: video LNXVIDEO:01: Restoring backlight state
Sep 24 22:19:39 dom0 kernel: PM: suspend exit
Sep 24 22:19:39 dom0 kernel: audit: type=1130 audit(1632514779.968:281): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:39 dom0 kernel: audit: type=1131 audit(1632514779.968:282): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:39 dom0 audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:39 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: The canary thread is apparently starving. Taking action.
Sep 24 22:19:39 dom0 systemd-sleep[4370]: System resumed.
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: Demoting known real-time threads.
Sep 24 22:19:39 dom0 systemd[1]: systemd-suspend.service: Succeeded.
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: Successfully demoted thread 4368 of process 4090 (/usr/bin/pulseaudio).
Sep 24 22:19:39 dom0 systemd[1]: Finished Suspend.
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: Successfully demoted thread 4116 of process 4090 (/usr/bin/pulseaudio).
Sep 24 22:19:39 dom0 systemd[1]: Stopped target Sleep.
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: Successfully demoted thread 4090 of process 4090 (/usr/bin/pulseaudio).
Sep 24 22:19:39 dom0 systemd[1]: Reached target Suspend.
Sep 24 22:19:39 dom0 rtkit-daemon[4097]: Demoted 3 threads.
Sep 24 22:19:39 dom0 systemd[1]: Stopped target Suspend.
Sep 24 22:19:39 dom0 systemd[1]: Stopping Qubes suspend hooks...
Sep 24 22:19:39 dom0 systemd-logind[2484]: Operation 'sleep' finished.
Sep 24 22:19:40 dom0 upowerd[2485]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/usb1/1-5
Sep 24 22:19:40 dom0 upowerd[2485]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/usb1/1-6
Sep 24 22:19:40 dom0 upowerd[2485]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/usb1/1-5
Sep 24 22:19:40 dom0 upowerd[2485]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/usb1/1-6
Sep 24 22:19:40 dom0 52qubes-pause-vms[4464]: 0
Sep 24 22:19:40 dom0 systemd[1]: qubes-suspend.service: Succeeded.
Sep 24 22:19:40 dom0 systemd[1]: Stopped Qubes suspend hooks.
Sep 24 22:19:40 dom0 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:40 dom0 kernel: audit: type=1131 audit(1632514780.256:289): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=qubes-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Sep 24 22:19:48 dom0 kernel: ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 24 22:19:48 dom0 kernel: ata5.00: configured for UDMA/133
Sep 24 22:19:48 dom0 kernel: ata6: softreset failed (device not ready)
Sep 24 22:19:48 dom0 kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 24 22:19:48 dom0 kernel: ata6.00: configured for UDMA/133
Sep 24 22:19:50 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=12509, emitted seq=12513
Sep 24 22:19:50 dom0 kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process xfwm4 pid 3952 thread xfwm4:cs0 pid 3955
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset begin!
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] ERROR ring kiq_2.1.0 test failed (-110)
Sep 24 22:19:50 dom0 kernel: [drm] free PSP TMR buffer
Sep 24 22:19:50 dom0 kernel: [drm] psp command (0x7) failed and response status is (0xFFFF0007)
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset succeeded, trying to resume
Sep 24 22:19:50 dom0 kernel: [drm] PCIE GART of 1024M enabled.
Sep 24 22:19:50 dom0 kernel: [drm] PTB located at 0x000000F400900000
Sep 24 22:19:50 dom0 kernel: [drm] VRAM is lost due to GPU reset!
Sep 24 22:19:50 dom0 kernel: [drm] PSP is resuming...
Sep 24 22:19:50 dom0 kernel: [drm] reserve 0x400000 from 0xf403c00000 for PSP TMR
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RAS: optional ras ta ucode is not available
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RAP: optional rap ta ucode is not available
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Sep 24 22:19:50 dom0 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 24 22:19:50 dom0 kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Sep 24 22:19:50 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Sep 24 22:19:50 dom0 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 24 22:19:51 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.0.0
Sep 24 22:19:51 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.1.0
Sep 24 22:19:52 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.2.0
Sep 24 22:19:52 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.3.0
Sep 24 22:19:53 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.0.1
Sep 24 22:19:53 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.1.1
Sep 24 22:19:54 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.2.1
Sep 24 22:19:54 dom0 kernel: [drm] Fence fallback timer expired on ring comp_1.3.1
Sep 24 22:19:55 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:55 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow start
Sep 24 22:19:55 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: recover vram bo from shadow done
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: GPU reset(2) succeeded!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: [drm] Skip scheduling IBs!
Sep 24 22:19:55 dom0 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Sep 24 22:19:55 dom0 kernel: kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
Sep 24 22:19:55 dom0 kernel: kfd kfd: amdgpu: Error initializing iommuv2
Sep 24 22:19:55 dom0 kernel: kfd kfd: amdgpu: device 1002:15dd NOT added due to errors
Sep 24 22:19:55 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:55 dom0 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 24 22:19:56 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 2942 thread X:cs0 pid 3796)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000800107381000 from IH client 0x1b (UTCL2)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 2942 thread X:cs0 pid 3796)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000800107382000 from IH client 0x1b (UTCL2)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 2942 thread X:cs0 pid 3796)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000800107380000 from IH client 0x1b (UTCL2)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32770, for process Xorg pid 2942 thread X:cs0 pid 3796)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: in page starting at address 0x0000800107387000 from IH client 0x1b (UTCL2)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MORE_FAULTS: 0x1
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: WALKER_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: MAPPING_ERROR: 0x0
Sep 24 22:19:56 dom0 kernel: amdgpu 0000:06:00.0: amdgpu: RW: 0x1
Sep 24 22:19:56 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:56 dom0 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 24 22:19:57 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:57 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:58 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:58 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:59 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:19:59 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:00 dom0 kernel: kauditd_printk_skb: 27 callbacks suppressed
Sep 24 22:20:00 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:00 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:01 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:01 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:02 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:02 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:03 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:04 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:04 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:05 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:05 dom0 qmemman.daemon.algo[2481]: balance_when_enough_memory(xen_free_memory=12360319581, total_mem_pref=1863797145.6000001, total_available_memory=14791489731.4)
Sep 24 22:20:05 dom0 qmemman.systemstate[2481]: stat: dom '0' act=4294967296 pref=1863797145.6000001 last_target=4294967296
Sep 24 22:20:05 dom0 qmemman.systemstate[2481]: stat: xenfree=12412748381 memset_reqs=[('0', 4294967296)]
Sep 24 22:20:05 dom0 qmemman.systemstate[2481]: mem-set domain 0 to 4294967296
Sep 24 22:20:05 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 24 22:20:06 dom0 kernel: [drm] Fence fallback timer expired on ring sdma0

(Sorry, still quite long. Can I use spoilers or something?)

system-log-resume-suspend-error.log
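
Since the snippet is long mostly because of repeated kernel messages, one way to shorten a log before pasting it is to collapse consecutive identical messages. The following is a minimal sketch, assuming the default journalctl short format ("Mon DD HH:MM:SS host unit: message"); the helper names are hypothetical.

```python
from itertools import groupby


def message_of(line):
    """Strip the leading 'Mon DD HH:MM:SS' timestamp so repeats compare equal."""
    parts = line.split(None, 3)
    return parts[3] if len(parts) == 4 else line


def collapse_repeats(lines):
    """Keep the first line of each run of identical messages and note the rest."""
    out = []
    for _, run in groupby(lines, key=message_of):
        run = list(run)
        out.append(run[0])
        if len(run) > 1:
            out.append("    (previous message repeated %d more time(s))" % (len(run) - 1))
    return out


if __name__ == "__main__":
    import sys

    # Usage (assumed): journalctl -b | python3 collapse.py
    for line in collapse_repeats(l.rstrip("\n") for l in sys.stdin):
        print(line)
```

Only runs of adjacent identical messages are merged, so the order of events in the log is preserved.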

@isodude
Author

isodude commented Sep 27, 2021

This is not related to this issue anymore. This issue was fixed with the updated Xen 4.12 patches, which are now included in Xen 4.13.

I will open a new issue concerning resume crashing on Ryzen.

@isodude isodude closed this as completed Sep 27, 2021
@isodude
Author

isodude commented Sep 27, 2021

Let's see how things fare once marmarek has pushed the new Xen 4.13 branch; I will check whether that resolves things.

@andrewdavidwong andrewdavidwong removed the needs diagnosis Requires technical diagnosis from developer. Replace with "diagnosed" or remove if otherwise closed. label Sep 27, 2021
@isodude
Author

isodude commented Sep 30, 2021

@johnnyboy-3 @bigdx I started a much more specific case here instead: #6923

@andrewdavidwong andrewdavidwong added the affects-4.1 This issue affects Qubes OS 4.1. label Aug 8, 2023