No graphical session after update to 41.20241117.3 on Aurora. #8

Open
mejiasjrg opened this issue Nov 23, 2024 · 37 comments
Labels
bug Something isn't working

Comments

@mejiasjrg

Describe the bug

The system updated from 41.20241112.1 to 41.20241117.3, and on the next boot it showed the arrow cursor for a little while and then a blank screen. It didn't even show the login screen. I switched to a terminal with Ctrl-Alt-F2 and could log in with no issues. I tried sudo systemctl restart sddm and got the blank screen again.
I rebooted the machine and selected the second entry in the grub menu, and the system booted normally to the graphical session. Then I issued rpm-ostree rollback, rebooted, and issued systemctl start rpm-ostreed-automatic.service to make the system update again, hoping there might be a new (different) update, but instead it updated to 41.20241117.3 again, and again I got the blank screen.
I finally did the rollback again and am posting from the system running 41.20241112.1

What did you expect to happen?

I expected the system to boot normally running 41.20241117.3

Output of bootc status

Current staged image: ghcr.io/ublue-os/aurora:stable
    Image version: 41.20241117.3 (2024-11-17 15:47:30 UTC)
    Image digest: sha256:fb3a023ce3c7609591c7524cb0c5a679453e5e0ff62cef6f334b4d7ca165965b
Current booted image: ghcr.io/ublue-os/aurora:stable
    Image version: 41.20241112.1 (2024-11-12 21:06:27 UTC)
    Image digest: sha256:f99e7cf789398600ffe7d1ade44ca0bceb31d76cbe8b8eb140a8b284804d6f6c
Current rollback image: ghcr.io/ublue-os/aurora:stable
    Image version: 41.20241117.3 (2024-11-17 15:47:30 UTC)
    Image digest: sha256:fb3a023ce3c7609591c7524cb0c5a679453e5e0ff62cef6f334b4d7ca165965b
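To compare the three deployment slots at a glance, status text in the format pasted above can be reduced to slot/version pairs. This is only a minimal sketch: the function name summarize_bootc_status is hypothetical, and it assumes the exact text layout shown here, which other bootc versions may not produce.

```shell
# Reduce `bootc status` text (layout as pasted above) to "slot version" pairs.
# summarize_bootc_status is a hypothetical helper name, not part of bootc.
summarize_bootc_status() {
  awk '
    /^Current (staged|booted|rollback) image:/ { slot = $2 }  # remember the slot
    /Image version:/ { print slot, $3 }                       # print its version
  '
}

# Possible usage on the affected system (may need sudo):
#   bootc status | summarize_bootc_status
```

With the output above, the staged and rollback slots would both show 41.20241117.3, while the booted slot shows 41.20241112.1.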

Output of groups

jm wheel

Extra information or context

For the average user Aurora is intended for, this would have been a show-stopper. Not good.

@mejiasjrg mejiasjrg changed the title from "No graphical session after update to 41.20241117.3" to "No graphical session after update to 41.20241117.3 on Aurora." on Nov 23, 2024
@dosubot dosubot bot added the bug Something isn't working label Nov 23, 2024
@tulilirockz
Contributor

These are all the commits that affected that image from Nov 12 to Nov 17. Apparently it is the F41 update that broke your system, as you are on the first build of Nov 12 (before the f41-in-stable commit).

Image

@castrojo
Member

Some hardware information could be useful, please run a ujust device-info and paste the URL it returns, thanks!

@mejiasjrg
Author

Some hardware information could be useful, please run a ujust device-info and paste the URL it returns, thanks!

Here you are:
https://paste.centos.org/view/9776d16c

It might be worth mentioning that I'm also running Bluefin-gts and Vauxite:latest on this same machine flawlessly. As they are both based on ostree images, this issue seems to be limited to bootc images.
By the way, how do I prevent the system from staging the 41.20241117.3 image? It did last night, and I just checked and the image is staged again.
Thank you!

@castrojo
Member

Use ujust rebase-helper to pin to the image you want, I don't think you want to turn off updates though.

@mejiasjrg
Author

Use ujust rebase-helper to pin to the image you want, I don't think you want to turn off updates though.

Correct. I just want it to boot to the working image while a solution is found.
Thank you!

@renner0e

What GPU do you have? By any chance an old Intel Haswell iGPU (4th gen, from 2014)?
The pastebin has expired.

@renner0e

renner0e commented Nov 25, 2024

Use ujust rebase-helper to pin to the image you want, I don't think you want to turn off updates though.

Correct. I just want it to boot to the working image while a solution is found. Thank you!

I rebased from aurora:latest to aurora:40 but that did not fix anything.
Rollbacks did not work at all.

Only a reinstall of Aurora (from an August ISO I had lying around) would "fix" it.
Updating to F41 made the issue appear again.
But this time the rollback actually reverted me back to a working system.

I "fixed" it by rebasing to aurora:40 after a reinstall.

@mejiasjrg
Author

What GPU do you have. By any chance an old intel haswell iGPU (4th Gen, from 2014?) The pastebin is expired

I don't have access to the machine at the moment, and I don't remember what iGPU it has. But it is a 12-year-old machine.
In about 4 hours I will be able to paste the machine info again.

@mejiasjrg
Author

Use ujust rebase-helper to pin to the image you want, I don't think you want to turn off updates though.

Correct. I just want it to boot to the working image while a solution is found. Thank you!

I rebased to from aurora:latest to aurora:40 but that did not fix anything. Rollbacks did not work at all.

Only a reinstall to aurora (ISO from august I had laying around) would "fix" it. Updating to F41 made the issue appear again. But this time the rollback actually reverted me back to a working system.

I "fixed" it by rebasing to aurora:40 after a reinstall.

As per Jorge Castro's suggestion, I rebased to 41.20241112.1 using the ujust rebase-helper, and it is working fine.

@mejiasjrg
Author

mejiasjrg commented Nov 25, 2024

What GPU do you have. By any chance an old intel haswell iGPU (4th Gen, from 2014?) The pastebin is expired

Here is the device-info, as TXT this time, so it doesn't go anywhere.
It says the PCI Video Card is: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller

Aurora@Lenovo_B470e-2024-11-25.txt

Edit: what neofetch shows...
Image

@renner0e

renner0e commented Nov 26, 2024

Intel 2nd Gen Sandy Bridge 2012

I wonder if the behavior would change if you chuck in a different GPU and use that and see if that works.

See if some new QT component/Plasma is acting weirdly with the GPU/driver as I don't have this issue on my other modern machines.

Would be a lot easier if the atomic stuff had proper live ISOs for testing.

Have you looked at the memory usage? There might be a memory leak somewhere.

Go to another tty and do startplasma-wayland, plasmashell should crash/no wallpaper, panel etc. if we're having the same issue. Look at the memory usage with htop at the same time.

@mejiasjrg
Author

I wonder if the behavior would change if you chuck in a different GPU and use that and see if that works

As this is a laptop, I don't think that is possible.

Have you looked at the memory usage? There might be a memory leak somewhere.

Not really, but I think the memory usage is about the same. I'm afraid I don't know how to detect a memory leak.

Go to another tty and do startplasma-wayland, plasmashell should crash/no wallpaper, panel etc. if we're having the same issue. Look at the memory usage with htop at the same time.

I did that, but it just reloaded plasma. I mean, it showed the splash screen and then the desktop with all the apps open (Firefox, Terminal & Dolphin). Memory usage kept about the same: 1.39 GB before, 1.41 GB after. (Remember I'm running the 41.20241112.1 image, and as it is pinned, the 41.20241117.3 image is not available).

@renner0e

renner0e commented Nov 26, 2024

I was describing the behavior of the apparently broken images newer than 41-20241117.

In my case the memory (and swap) usage grew rapidly to a level that was nowhere near reasonable. Everything slowed to a halt and/or my kernel probably panicked, and I had to do a hard shutdown afterwards.
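For anyone unsure how to watch for this kind of runaway growth without htop, here is a rough sketch that samples used memory from /proc/meminfo. The helper name mem_used_mib is hypothetical, and "used" is taken here as MemTotal minus MemAvailable, which is an assumption and will not match htop's figure exactly.

```shell
# Print used memory in MiB, computed as MemTotal - MemAvailable, from
# /proc/meminfo-style input on stdin. mem_used_mib is a hypothetical name.
mem_used_mib() {
  awk '
    $1 == "MemTotal:"     { total = $2 }   # values are in kB
    $1 == "MemAvailable:" { avail = $2 }
    END { printf "%d\n", (total - avail) / 1024 }
  '
}

# Possible usage from a second tty while startplasma-wayland is starting:
#   while sleep 1; do mem_used_mib < /proc/meminfo; done
```

A figure that keeps climbing second after second, rather than settling, would match the behavior described above.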

@castrojo
Member

Check the kernel and mesa versions for each image, maybe it was a change in there, seems like a good place to start.

@renner0e

Check the kernel and mesa versions for each image, maybe it was a change in there, seems like a good place to start.

Will do a little testing in the coming weeks

kinoite/aurora41: 11th Nov

mesa-libGLU-9.0.3-5.fc41.x86_64
kernel-tools-libs-6.11.6-300.fc41.x86_64
kernel-modules-core-6.11.6-300.fc41.x86_64
kernel-core-6.11.6-300.fc41.x86_64
kernel-modules-6.11.6-300.fc41.x86_64
kernel-6.11.6-300.fc41.x86_64
kernel-tools-6.11.6-300.fc41.x86_64
kernel-modules-extra-6.11.6-300.fc41.x86_64
mesa-filesystem-24.2.4-1.fc41.x86_64
mesa-libgbm-24.2.4-1.fc41.x86_64
mesa-libglapi-24.2.4-1.fc41.x86_64
mesa-dri-drivers-24.2.4-1.fc41.x86_64
mesa-libEGL-24.2.4-1.fc41.x86_64
mesa-libGL-24.2.4-1.fc41.x86_64
mesa-va-drivers-24.2.4-1.fc41.x86_64
mesa-vulkan-drivers-24.2.4-1.fc41.x86_64
mesa-libxatracker-24.2.4-1.fc41.x86_64

kinoite/aurora41: 12th Nov 6.11.6 and 6.11.7???

mesa-libGLU-9.0.3-5.fc41.x86_64
kernel-tools-libs-6.11.7-300.fc41.x86_64
kernel-tools-6.11.7-300.fc41.x86_64
kernel-modules-core-6.11.6-300.fc41.x86_64
kernel-core-6.11.6-300.fc41.x86_64
kernel-modules-6.11.6-300.fc41.x86_64
kernel-6.11.6-300.fc41.x86_64
kernel-modules-extra-6.11.6-300.fc41.x86_64
mesa-filesystem-24.2.4-1.fc41.x86_64
mesa-libgbm-24.2.4-1.fc41.x86_64
mesa-libglapi-24.2.4-1.fc41.x86_64
mesa-dri-drivers-24.2.4-1.fc41.x86_64
mesa-libEGL-24.2.4-1.fc41.x86_64
mesa-libGL-24.2.4-1.fc41.x86_64
mesa-va-drivers-24.2.4-1.fc41.x86_64
mesa-vulkan-drivers-24.2.4-1.fc41.x86_64
mesa-libxatracker-24.2.4-1.fc41.x86_64

aurora40: from Oct 28 ✅
ghcr.io/ublue-os/aurora:40-20241028

mesa-libGLU-9.0.3-4.fc40.x86_64
mesa-filesystem-24.1.7-1.fc40.x86_64
mesa-va-drivers-24.1.7-1.fc40.x86_64
mesa-libglapi-24.1.7-1.fc40.x86_64
mesa-dri-drivers-24.1.7-1.fc40.x86_64
mesa-libgbm-24.1.7-1.fc40.x86_64
mesa-libEGL-24.1.7-1.fc40.x86_64
mesa-libGL-24.1.7-1.fc40.x86_64
mesa-vulkan-drivers-24.1.7-1.fc40.x86_64
mesa-libxatracker-24.1.7-1.fc40.x86_64
kernel-tools-libs-6.11.4-201.fc40.x86_64
kernel-tools-6.11.4-201.fc40.x86_64
kernel-modules-core-6.11.4-201.fc40.x86_64
kernel-core-6.11.4-201.fc40.x86_64
kernel-modules-6.11.4-201.fc40.x86_64
kernel-6.11.4-201.fc40.x86_64
kernel-modules-extra-6.11.4-201.fc40.x86_64
kernel-headers-6.11.3-200.fc40.x86_64

kinoite40: 25th Nov old mesa, new kernel
I have to test this image in the coming weeks.
ghcr.io/ublue-os/kinoite-main:40-20241125

mesa-libGLU-9.0.3-4.fc40.x86_64
kernel-modules-core-6.11.8-200.fc40.x86_64
kernel-core-6.11.8-200.fc40.x86_64
kernel-modules-6.11.8-200.fc40.x86_64
kernel-6.11.8-200.fc40.x86_64
kernel-modules-extra-6.11.8-200.fc40.x86_64
mesa-filesystem-24.1.7-1.fc40.x86_64
mesa-va-drivers-24.1.7-1.fc40.x86_64
mesa-libglapi-24.1.7-1.fc40.x86_64
mesa-dri-drivers-24.1.7-1.fc40.x86_64
mesa-libgbm-24.1.7-1.fc40.x86_64
mesa-libEGL-24.1.7-1.fc40.x86_64
mesa-libGL-24.1.7-1.fc40.x86_64
mesa-vulkan-drivers-24.1.7-1.fc40.x86_64
mesa-libxatracker-24.1.7-1.fc40.x86_64
kernel-tools-libs-6.11.8-200.fc40.x86_64
kernel-tools-6.11.8-200.fc40.x86_64

@mejiasjrg
Author

Check the kernel and mesa versions for each image, maybe it was a change in there, seems like a good place to start.

I thought of rebasing to specific daily images until I found the last working image and the first non-working one, and then find the differences and, hopefully, pin down the culprit. But I just ran ujust rebase-helper and found the oldest image it shows is 41.20241120. Is there a way to force ujust rebase-helper to show older images? If not, how do I manually rebase to a specific image? (I want to try 41.20241115.x, whatever the last of that day was).
Thank you!

@renner0e

Check the kernel and mesa versions for each image, maybe it was a change in there, seems like a good place to start.

I thought of rebasing to specific daily images until I found the last working image and the first non-working one, and then find the differences and, hopefully, pin down the culprit. But I just ran ujust rebase-helper and found the oldest image it shows is 41.20241120. Is there a way to force ujust rebase-helper to show older images? If not, how do I manually rebase to a specific image? (I want to try 41.20241115.x, whatever the last of that day was). Thank you!

sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/aurora:41-20241115
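For stepping through several daily images, the command above can be generated per day. This is a sketch only: print_rebase_cmds is a hypothetical helper, it assumes the major-date tag pattern seen in the command above, and it does not check whether a given day's tag still exists in the registry. It prints the commands rather than running them.

```shell
# Print (do not run) one rebase command per daily tag.
# $1 is the Fedora major version; remaining arguments are dates as YYYYMMDD.
# print_rebase_cmds is a hypothetical helper name.
print_rebase_cmds() {
  major=$1
  shift
  for day in "$@"; do
    printf 'sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/aurora:%s-%s\n' \
      "$major" "$day"
  done
}

# Example: candidates between the last known-good and first known-bad builds
#   print_rebase_cmds 41 20241113 20241114 20241115 20241116
```

Each printed command would be run by hand, followed by a reboot, noting whether the graphical session comes up.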

@DeeBeeDouble

This issue is happening to me as well on both the aurora-nvidia and kinoite-nvidia images. My GPU is a Gtx 750 ti.

@m2Giles
Member

m2Giles commented Nov 29, 2024

On a GTX 750 Ti, the Nvidia images use a driver that does not support your card.

@DeeBeeDouble

On a GTX 750ti, the Nvidia images uses a driver that does not support your card

Huh? As of the latest beta driver (565) the GTX 750ti is still listed as supported on Nvidia's website. Also, the driver was working just fine before the update a week ago or so.

@m2Giles
Member

m2Giles commented Nov 29, 2024 via email

@mejiasjrg
Author

Check the kernel and mesa versions for each image, maybe it was a change in there, seems like a good place to start.

I thought of rebasing to specific daily images until I found the last working image and the first non-working one, and then find the differences and, hopefully, pin down the culprit. But I just ran ujust rebase-helper and found the oldest image it shows is 41.20241120. Is there a way to force ujust rebase-helper to show older images? If not, how do I manually rebase to a specific image? (I want to try 41.20241115.x, whatever the last of that day was). Thank you!

Update!
Here are the results of my "rebasing experiment":

Image

Here is the summary of changes from 41.20241112.1 to 41.20241113: 41.20241113.txt
And the result of rpm-ostree db diff --changelogs: rpm-ostree db diff --changelogs.txt

I was expecting to see something related to plasma, sddm or wayland, but, to my inexpert eye, there is nothing.
Hope this helps those with the proper knowledge to find out what's going on.

By the way @renner0e, I managed to reproduce the behavior you described when running startplasma-wayland, except that memory usage tended to peak at 2.7 GB and then go back to less than 1.0 GB, and CPU usage peaked at 50-60% and went back to 5-10%, while the system stayed responsive. However, on one occasion the system went over 3 GB of memory and 100% CPU usage and became unresponsive. At that moment I had to do something else and left the machine unattended for about 10 minutes; afterwards memory usage was about 1.3 GB, CPU usage about 15%, and the system was responsive again.

@m2Giles
Member

m2Giles commented Dec 1, 2024

That's right around the point that we started shipping F41 for stable.

Those different images around that point were fixing signing issues and ISOs

@renner0e

renner0e commented Dec 1, 2024

6.11.3-300.fc41.x86_64 -> 6.11.7-300.fc41.x86_64

I guess it is probably the kernel?

* GL Support (glxinfo -B | grep -E "OpenGL version|OpenGL renderer"):
     OpenGL renderer string: Mesa Intel(R) HD Graphics (HSW GT1)
     OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.2.4

* DRM Information (journalctl -k -b --no-hostname | grep -o 'kernel:.*drm.*$' | cut -d ' ' -f 2- ):
     ACPI: bus type drm_connector registered
     [drm] Initialized simpledrm 1.0.0 for simple-framebuffer.0 on minor 0
     simple-framebuffer simple-framebuffer.0: [drm] fb0: simpledrmdrmfb frame buffer device
     i915 0000:00:02.0: [drm] Found HASWELL (device ID 0402) display version 7.00
     i915 0000:00:02.0: [drm] PipeC fused off
     [drm] Initialized i915 1.6.0 for 0000:00:02.0 on minor 1
     fbcon: i915drmfb (fb0) is primary device
     i915 0000:00:02.0: [drm] fb0: i915drmfb frame buffer device

https://en.wikipedia.org/wiki/List_of_Intel_graphics_processing_units
Pentium B940 SandyBridge = Gen6 Graphics called just "HD Graphics"
Celeron G1840 Haswell = Gen7.5 Graphics called just "HD Graphics"
So they are both using i915 driver.

There have been a bunch of changes from 6.11.3 to 6.11.7 to the i915 stuff
a snippet of:
git log --oneline "v6.11.3..v6.11.7" | grep i915

f0e1e960f4cf drm/i915/display: Don't enable decompression on Xe2 with Tile4
35b8dc55234a drm/i915/psr: Prevent Panel Replay if CRC calculation is enabled
a02e4094347b drm/i915: move rawclk from runtime to display runtime info
d1f12b7959a1 drm/i915/pps: Disable DPLS_GATING around pps sequence
8c067cb7c7d3 drm/i915/display/dp: Compute AS SDP when vrr is also enabled
1c780bd9df4f drm/i915/dp: Clear VSC SDP during post ddi disable routine
5b89dcf23575 drm/i915/hdcp: Add encoder check in hdcp2_get_capability
4912e8fb3c37 drm/i915/hdcp: Add encoder check in intel_hdcp_get_capability
d0b83f496ece drm/i915/display: WA for Re-initialize dispcnlunitt1 xosc clock
7976b0042489 drm/i915/display: Cache adpative sync caps to use it later
7831ec68987a drm/i915: disable fbc due to Wa_16023588340
c431632cb0c0 drm/i915: Skip programming FIA link enable bits for MTL+
aeefacb3e40f drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
6fc24abe15ac drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
ea15e5072f99 drm/i915/hdcp: fix connector refcounting

Bisecting the kernel seems like a good idea to me🙃. But I don't know how I would get my custom kernel into aurora.
I would absolutely need help with that.
For testing purposes, a vanilla Fedora kernel built with the bisected commit hash would probably suffice.

EDIT: This is probably bullshit as 6.11.10 works on official fedora kinoite

@renner0e

renner0e commented Dec 1, 2024

TODO:

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:41
sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/kinoite-main:latest
  • ships the 6.11.10 kernel; see if it got fixed in the meantime
  • test on proper fedora kinoite images, file bug report on fedora bugzilla.

Also test :rawhide as it ships a kernel-6.13.0-0.rc0.20241126git7eef7e306d3c from a couple days ago.

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:rawhide

@mejiasjrg
Author

TODO:

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:41
sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/kinoite-main:latest
* ships 6.11.10 kernel see if that got fixed it in the meantime

* test on proper fedora kinoite images, file bug report on fedora bugzilla.

Also test :rawhide as it ships a kernel-6.13.0-0.rc0.20241126git7eef7e306d3c from a couple days ago.

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:rawhide

Last night I rebased from Aurora 41.20241112.1 directly to kinoite-main:latest, and this morning booted to it with same result: no graphical session. I did not try startplasma-wayland, but I expect it to behave the same as Aurora 41.20241113 - 41.20241117.3
Tonight I might try kinoite-main:rawhide.

@renner0e

renner0e commented Dec 3, 2024

TODO:

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:41
sudo rpm-ostree rebase ostree-image-signed:docker://ghcr.io/ublue-os/kinoite-main:latest
* ships 6.11.10 kernel see if that got fixed it in the meantime

* test on proper fedora kinoite images, file bug report on fedora bugzilla.

Also test :rawhide as it ships a kernel-6.13.0-0.rc0.20241126git7eef7e306d3c from a couple days ago.

sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:rawhide

Last night I rebased from Aurora 41.20241112.1 directly to kinoite-main:latest, and this morning booted to it with same result: no graphical session. I did not try startplasma-wayland, but I expect it to behave the same as Aurora 41.20241113 - 41.20241117.3 Tonight I might try kinoite-main:rawhide.

Ublue does not build rawhide tags so there is no kinoite-main:rawhide

You need to rebase to quay.io/fedora-ostree-desktops/kinoite:rawhide.

@renner0e

renner0e commented Dec 3, 2024

For some reason vanilla/non-ublue variants with 6.11.10-300.fc41.x86_64 all work

images that work for me:


● fedora:fedora/41/x86_64/kinoite
                  Version: 41.20241203.0 (2024-12-03T02:12:56Z)
                   Commit: 598699548e36a8e624d1d650060c0a5015cddd71ed06b70670132e85402251e1
             GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
                   Pinned: yes

● ostree-unverified-registry:quay.io/fedora-ostree-desktops/kinoite:41
                   Digest: sha256:4025e675d670e69cbe6ad17a1218db998a19495397a9cf0a2a1fad8a6186b6ee
                  Version: 41.20241203.0 (2024-12-03T02:29:43Z)

● ostree-unverified-registry:quay.io/fedora/fedora-kinoite:latest
                   Digest: sha256:e34d75931bb6d24d3d227905ae3342f0d6d99cb56b87c393ffc9a33b86e6b2cb
                  Version: 41.20241203.0 (2024-12-03T02:13:46Z)

for good measure I also tested GNOME, which also worked just fine

ublue-os/silverblue-main:latest

@mejiasjrg
Author

Ublue does not build rawhide tags so there is no kinoite-main:rawhide

You need to rebase to quay.io/fedora-ostree-desktops/kinoite:rawhide.

Continuing with the tests, last night I rebased from Aurora 41.20241112.1 directly to quay.io/fedora-ostree-desktops/kinoite:rawhide, and booted it successfully to a graphical session:

Image

There is a minor glitch: the menu icon is not shown, but clicking on it shows the menu normally. Otherwise the system seemed to work properly.
Also the pinned icon on the taskbar for ptyxis/GNOME Terminal is not working, but it wasn't before rebasing, so it's not related to rawhide (that's the notification shown in the screen capture). I'll have to delve into that later.

So, with kernel 6.13.0-0.rc0.20241126git7eef7e306d3c the graphical session loads successfully.

@castrojo
Member

castrojo commented Dec 4, 2024

Can someone retest with the latest aurora:latest please?

@renner0e

renner0e commented Dec 4, 2024

Can someone retest with the latest aurora:latest please?

still borked as of 2024-12-04

● ostree-unverified-registry:ghcr.io/ublue-os/aurora:latest
Digest: sha256:1d1a2fd4debe8107431f01b8c463ac74a7e97fbed63a8ee7f434e8eafe516876
Version: latest-41.20241204 (2024-12-04T04:56:12Z)

ostree-unverified-registry:quay.io/fedora/fedora-kinoite:latest
Digest: sha256:e34d75931bb6d24d3d227905ae3342f0d6d99cb56b87c393ffc9a33b86e6b2cb
Version: 41.20241203.0 (2024-12-03T02:13:46Z)

fedora:fedora/41/x86_64/kinoite
Version: 41.20241203.0 (2024-12-03T02:12:56Z)
Commit: 598699548e36a8e624d1d650060c0a5015cddd71ed06b70670132e85402251e1
GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
Pinned: yes

There is a minor glitch: the menu icon is not shown, but clicking on it shows the menu normally. Otherwise the system seemed to work properly. Also the pinned icon on the taskbar for ptyxis/GNOME Terminal is not working, but it wasn't before rebasing, so it's not related to rawhide (that's the notification shown in the screen capture). I'll have to delve into that later.

The Kickoff application launcher looks for an icon which is saved somewhere in /usr. Non-Aurora images will not have this icon, so the fallback icon (or no icon) is shown. If you switch back to Aurora everything will be fine afterwards.

@mejiasjrg
Author

Can someone retest with the latest aurora:latest please?

latest-41.20241204 still doesn't work on my machine. Same behavior as described before.

@renner0e

renner0e commented Dec 8, 2024

I built my own kinoite-main:41 image and removed the akmods and main-kernel stuff and it works fine after that.

This was committed on Nov 12th (same date as last working image):
ublue-os/akmods#271

My changes are below and then:
just build kinoite 41

diff --git a/Containerfile b/Containerfile
index 681dd3f..87266a4 100644
--- a/Containerfile
+++ b/Containerfile
@@ -3,12 +3,9 @@ ARG SOURCE_IMAGE="${SOURCE_IMAGE:-silverblue}"
 ARG SOURCE_ORG="${SOURCE_ORG:-fedora-ostree-desktops}"
 ARG BASE_IMAGE="quay.io/${SOURCE_ORG}/${SOURCE_IMAGE}"
 ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION:-40}"
-ARG KERNEL_VERSION="${KERNEL_VERSION:-6.9.7-200.fc40.x86_64}"
 ARG IMAGE_REGISTRY=ghcr.io/ublue-os
 
 FROM ${IMAGE_REGISTRY}/config:latest AS config
-FROM ${IMAGE_REGISTRY}/akmods:main-${FEDORA_MAJOR_VERSION} AS akmods
-FROM ${IMAGE_REGISTRY}/main-kernel:${KERNEL_VERSION} AS kernel
 
 FROM scratch AS ctx
 COPY / /
@@ -17,15 +14,12 @@ FROM ${BASE_IMAGE}:${FEDORA_MAJOR_VERSION}
 
 ARG IMAGE_NAME="${IMAGE_NAME:-silverblue}"
 ARG FEDORA_MAJOR_VERSION="${FEDORA_MAJOR_VERSION:-40}"
-ARG KERNEL_VERSION="${KERNEL_VERSION:-6.9.7-200.fc40.x86_64}"
 
 COPY sys_files/usr /usr
 
 RUN --mount=type=cache,dst=/var/cache/rpm-ostree \
     --mount=type=bind,from=ctx,src=/,dst=/ctx \
     --mount=type=bind,from=config,src=/rpms,dst=/tmp/rpms \
-    --mount=type=bind,from=akmods,src=/rpms/ublue-os,dst=/tmp/akmods-rpms \
-    --mount=type=bind,from=kernel,src=/tmp/rpms,dst=/tmp/kernel-rpms \
     rm -f /usr/bin/chsh && \
     rm -f /usr/bin/lchsh && \
     mkdir -p /var/lib/alternatives && \
diff --git a/install.sh b/install.sh
index 28dfbab..813e4d4 100755
--- a/install.sh
+++ b/install.sh
@@ -11,26 +11,8 @@ curl -Lo /etc/yum.repos.d/_copr_kylegospo_oversteer.repo https://copr.fedorainfr
 
 rpm-ostree install \
     /tmp/rpms/*.rpm \
-    /tmp/akmods-rpms/*.rpm \
     fedora-repos-archive
 
-# Handle Kernel Skew with override replace
-rpm-ostree cliwrap install-to-root /
-if [[ "${KERNEL_VERSION}" == "${QUALIFIED_KERNEL}" ]]; then
-    echo "Installing signed kernel from kernel-cache."
-    cd /tmp
-    rpm2cpio /tmp/kernel-rpms/kernel-core-*.rpm | cpio -idmv
-    cp ./lib/modules/*/vmlinuz /usr/lib/modules/*/vmlinuz
-    cd /
-else
-    echo "Install kernel version ${KERNEL_VERSION} from kernel-cache."
-    rpm-ostree override replace \
-        --experimental \
-        --install=zstd \
-        /tmp/kernel-rpms/kernel-[0-9]*.rpm \
-        /tmp/kernel-rpms/kernel-core-*.rpm \
-        /tmp/kernel-rpms/kernel-modules-*.rpm
-fi
 
 # use negativo17 for 3rd party packages with higher priority than default
 curl -Lo /etc/yum.repos.d/negativo17-fedora-multimedia.repo https://negativo17.org/repos/fedora-multimedia.repo
@@ -81,7 +63,3 @@ fi
 CSFG=/usr/lib/systemd/system-generators/coreos-sulogin-force-generator
 curl -sSLo ${CSFG} https://raw.githubusercontent.com/coreos/fedora-coreos-config/refs/heads/stable/overlay.d/05core/usr/lib/systemd/system-generators/coreos-sulogin-force-generator
 chmod +x ${CSFG}
-
-if [[ "${KERNEL_VERSION}" == "${QUALIFIED_KERNEL}" ]]; then
-    /ctx/initramfs.sh
-fi
diff --git a/post-install.sh b/post-install.sh
index 7adc274..c591ba7 100755
--- a/post-install.sh
+++ b/post-install.sh
@@ -6,14 +6,7 @@ if [[ "$IMAGE_NAME" == "base" ]]; then
     systemctl enable getty@tty1
 fi
 
-systemctl enable rpm-ostreed-automatic.timer
-systemctl enable flatpak-system-update.timer
-
-systemctl --global enable flatpak-user-update.timer
-
-cp /usr/share/ublue-os/update-services/etc/rpm-ostreed.conf /etc/rpm-ostreed.conf
-
-ln -s "/usr/share/fonts/google-noto-sans-cjk-fonts" "/usr/share/fonts/noto-cjk" 
+ln -s "/usr/share/fonts/google-noto-sans-cjk-fonts" "/usr/share/fonts/noto-cjk"
 
 rm -f /etc/yum.repos.d/_copr_ublue-os_staging.repo
 rm -f /etc/yum.repos.d/_copr_kylegospo_oversteer.repo

@mejiasjrg
Author

I've been fiddling with aurora:stable, but as I haven't had much time to do it, this spans from Dec-12 to this day. Anyway, here is my report.

The system currently running:
❯ rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/aurora:stable
Digest: sha256:a5abc38a6c2f83546f8c4df05de254a43a8410fd6d2fff8fcf3d130abcc35df6
Version: 41.20241208.4 (2024-12-08T16:49:36Z)
LayeredPackages: gdm lightdm lightdm-greeter lightdm-settings plasma-workspace-x11

ostree-image-signed:docker://ghcr.io/ublue-os/aurora:stable
Digest: sha256:a5abc38a6c2f83546f8c4df05de254a43a8410fd6d2fff8fcf3d130abcc35df6
Version: 41.20241208.4 (2024-12-08T16:49:36Z)
LayeredPackages: gdm lightdm lightdm-greeter lightdm-settings plasma-workspace-x11

Starting from aurora:40, I layered the packages with rpm-ostree install gdm lightdm lightdm-greeter lightdm-settings plasma-workspace-x11, disabled sddm with systemctl disable sddm, and rebooted into aurora:stable 41.20241208.4. I logged in as my user and issued systemctl start lightdm.service, and lightdm's greeter showed up and worked. (Later I ran systemctl enable lightdm.service to boot directly to lightdm.)
From lightdm I logged in to a Plasma-wayland session, and plasmashell crashed as described before by @renner0e when issuing startplasma-wayland at the commandline.
Then, after rebooting, tried a Plasma-x11 session, and it crashed as well.
Rebooted again and I logged in to a Gnome-wayland session, and the desktop loaded, barebones but functional. (I'm in fact posting this from the Gnome session)
I also tried using gdm instead of lightdm, repeating the previous sequence, with similar results.
On the Gnome session I tried running different apps, and found that GTK apps run well, and so do Flatpak apps (some of them anyway. More on that later). Also apps in a Debian distrobox run well. QT apps, on the other hand, tend to crash, though some of them load. Here are some examples, starting them from the commandline:
❯ kate

~ took 2s
Kate doesn't show its window, but it shows in the activities view.
❯ ark
KCrash: Application 'ark' crashing... crashRecursionCounter = 2
Instrucción ilegal ('core' generado)

~ took 13s
Ark plainly crashes.
❯ dolphin
kf.config.core: Watching absolute paths is not supported "/usr/share/color-schemes
BreezeLight.colors"
KCrash: Application 'dolphin' crashing... crashRecursionCounter = 2
Instrucción ilegal ('core' generado)

~ took 15s
Dolphin crashed the first day, but lately it shows its window but keeps sending the KCrash: Application 'dolphin' crashing... crashRecursionCounter = 2 message to the terminal, and after a while the system becomes completely non-responsive.
❯ konsole
kf.config.core: Watching absolute paths is not supported "/usr/share/color-schemes
BreezeLight.colors"

~ took 1m46s
Konsole used to show its window and seemed to work fine, but lately it has been hanging the system much as Dolphin does.
Okular, which is a flatpak, shows content correctly, and all seems to work, except the open and save dialogs, which I suppose are loaded from the base Aurora system.

So, summarizing:

  • sddm is crashing silently
  • plasma-wayland and plasma-x11 both crash
  • GTK apps run well (on the Gnome session)
  • QT apps tend to crash or to hang the system.

Based on that, I would say that the issue is affecting some QT library/ies and that's what brings the system down.
I wonder if that "Instrucción ilegal ('core' generado)" message (in English: "Illegal instruction (core dumped)") means that we can get some additional info on the issue. If so, please point me to how to get it so I can post it here.
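On the question of getting more information out of those crashes: "Illegal instruction" is how the shell reports a process killed by SIGILL (signal 4 on Linux), which generally means the program tried to execute an instruction the CPU does not implement. A small sketch follows; signal_from_status is a hypothetical helper, and whether systemd-coredump is collecting dumps on this image is an assumption.

```shell
# Map a shell exit status above 128 to the number of the signal that ended
# the process; prints 0 for a normal exit. Hypothetical helper name.
signal_from_status() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  else
    echo 0
  fi
}

# Possible usage right after a crash:
#   ark; signal_from_status $?      # 4 would confirm SIGILL
#
# If systemd-coredump is active, the stored dumps can be inspected with:
#   coredumpctl list ark
#   coredumpctl info ark
```

The coredumpctl info output includes a stack trace when one can be generated, which should show which library the faulting instruction lives in.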

@renner0e

renner0e commented Dec 17, 2024

I think I have finally made some progress into this matter:

the package kf6-kimageformats should be responsible for the breakage on this specific hardware.

If you layer this package on vanilla fedora, everything still works.
If you remove it from kinoite-main and aurora, it works again.

I don't know why it is not in the output of rpm-ostree db diff.

This affects aurora/bazzite because it is added in kinoite-main by ublue.
See https://github.com/ublue-os/main/blob/main/packages.json#L129

It is not present in upstream fedora, thus vanilla fedora works.

Workaround until a fix (probably upstream) is found:

I ran sudo rpm-ostree override remove kf6-kimageformats and rebooted
on an up-to-date aurora:latest, and it finally works.

● ostree-unverified-registry:ghcr.io/ublue-os/aurora:latest
Digest: sha256:00dcd0b8d0b00fa785b87395e4036bd1330f5ec2e4b955fe257b6f0b6bc6fea1
Version: latest-41.20241216 (2024-12-16T09:28:59Z)
RemovedBasePackages: kf6-kimageformats 6.8.0-1.fc41
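A small sketch of applying the workaround and undoing it later once a fixed image lands. The helper names (run, apply_workaround, revert_workaround) and the DRY_RUN switch are made up for illustration; the rpm-ostree subcommands themselves are the standard override remove / override reset pair:

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Remove the package from the base image (takes effect after a reboot).
apply_workaround() {
  run sudo rpm-ostree override remove kf6-kimageformats
}

# Drop the override again so the image's own copy of the package returns.
revert_workaround() {
  run sudo rpm-ostree override reset kf6-kimageformats
}
```

Setting DRY_RUN=1 before calling either function only prints what would be run, which is handy for double-checking before touching the deployment.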

More Info

Name        : kf6-kimageformats
Version     : 6.8.0
Release     : 1.fc41
Architecture: x86_64
Install Date: Mi 11 Dez 2024 05:15:15 CET
Group       : Unspecified
Size        : 1210693
License     : LGPLv2+
Signature   : RSA/SHA256, Do 07 Nov 2024 17:06:12 CET, Key ID d0622462e99d6ad1
Source RPM  : kf6-kimageformats-6.8.0-1.fc41.src.rpm
Build Date  : Do 07 Nov 2024 14:20:20 CET
Build Host  : buildvm-x86-11.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://invent.kde.org/frameworks/kimageformats
Bug URL     : https://bugz.fedoraproject.org/kf6-kimageformats
Summary     : KDE Frameworks 6 Tier 1 addon with additional image plugins for QtGui
Description :
This framework provides additional image format plugins for QtGui.  As
such it is not required for the compilation of any other software, but
may be a runtime requirement for Qt-based software to support certain
image formats.

It is getting late for me.
In the coming days I will link an upstream bug report, filing one on Fedora's or KDE's Bugzilla if it doesn't exist yet, and look further into the issue.

TODO:

https://community.kde.org/Guidelines_and_HOWTOs/Debugging/How_to_create_useful_crash_reports

figure out which debug packages have to be layered to create useful info to kde devs

enable fedora-debuginfo repo in /etc/yum.repos.d/fedora.repo
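For the last TODO item, a possible sketch, assuming fedora.repo uses the usual INI layout with a [fedora-debuginfo] section and an enabled= line in it. The function name enable_repo_section is made up; it only prints the modified file so the result can be reviewed before anything is overwritten:

```shell
# Print FILE with enabled=1 set inside the [SECTION] block only.
# Other sections are passed through unchanged.
enable_repo_section() {
  section="$1"
  file="$2"
  awk -v want="[$section]" '
    /^\[/ { in_section = ($0 == want) }      # track which section we are in
    in_section && /^enabled=/ { print "enabled=1"; next }
    { print }
  ' "$file"
}
```

Usage would be along the lines of enable_repo_section fedora-debuginfo /etc/yum.repos.d/fedora.repo > /tmp/fedora.repo, then comparing the two files and copying the new one into place with sudo.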

@renner0e

libheif and heif-pixbuf-loader from negativo17 seem to cause the issue.

Tested on a "broken" image (with kf6-kimageformats installed).

Workaround

sudo rpm-ostree override replace --experimental --from repo="fedora" libheif heif-pixbuf-loader

The packages from the regular Fedora repos work:

● ostree-unverified-registry:ghcr.io/ublue-os/aurora:latest
Digest: sha256:de7067e0df0dfcc840ceb185cbc8fcc8215396bdec83b4cc1fd18d001c1bc9c1
Version: latest-41.20241218.1 (2024-12-18T07:55:07Z)
RemoteOverrides: repo=fedora
├─ heif-pixbuf-loader 1:1.18.2-5.fc41 -> 1.17.6-2.fc41
└─ libheif 1:1.18.2-5.fc41 -> 1.17.6-2.fc41

Downgraded:
heif-pixbuf-loader 1:1.18.2-5.fc41 -> 1.17.6-2.fc41
libheif 1:1.18.2-5.fc41 -> 1.17.6-2.fc41

Vanilla Fedora ships 1.17.6:

Name        : libheif
Version     : 1.17.6
Release     : 2.fc41
Architecture: x86_64
Install Date: Mi 18 Dez 2024 17:26:51 CET
Group       : Unspecified
Size        : 966069
License     : LGPL-3.0-or-later and MIT
Signature   : RSA/SHA256, Fr 19 Jul 2024 06:59:09 CEST, Key ID d0622462e99d6ad1
Source RPM  : libheif-1.17.6-2.fc41.src.rpm
Build Date  : Fr 19 Jul 2024 01:35:35 CEST
Build Host  : buildvm-x86-27.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://github.com/strukturag/libheif
Bug URL     : https://bugz.fedoraproject.org/libheif
Summary     : HEIF and AVIF file format decoder and encoder
Description :
libheif is an ISO/IEC 23008-12:2017 HEIF and AVIF (AV1 Image File Format)
file format decoder and encoder.

and negativo17 ships 1.18.2:

Name        : libheif
Epoch       : 1
Version     : 1.18.2
Release     : 5.fc41
Architecture: x86_64
Install Date: Mi 11 Dez 2024 05:14:59 CET
Group       : Unspecified
Size        : 1614755
License     : LGPLv3+ and MIT
Signature   : RSA/SHA256, Do 05 Dez 2024 00:49:44 CET, Key ID 97f3008993e8909b
Source RPM  : libheif-1.18.2-5.fc41.src.rpm
Build Date  : Mi 04 Dez 2024 23:42:23 CET
Build Host  : wks2.lab.negativo17.org
URL         : https://github.com/strukturag/libheif
Summary     : ISO/IEC 23008-12:2017 HEIF and AVIF file format decoder and encoder
Description :
libheif is an ISO/IEC 23008-12:2017 HEIF and AVIF (AV1 Image File Format) file
format decoder and encoder.

HEIF and AVIF are new image file formats employing HEVC (h.265) or AV1 image
coding, respectively, for the best compression ratios currently possible.
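The two builds above can be told apart by their Build Host field. A rough sketch for checking which one a given machine has installed; the function name build_origin is made up, and the host suffixes are simply the ones visible in the two outputs above:

```shell
# Classify an RPM build host string by the repo it most likely came from.
build_origin() {
  case "$1" in
    *.negativo17.org)    echo "negativo17" ;;
    *.fedoraproject.org) echo "fedora" ;;
    *)                   echo "unknown" ;;
  esac
}
```

On an affected system this could be fed from rpm, for example build_origin "$(rpm -q --qf '%{BUILDHOST}' libheif)".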

This is likely the commit that caused it:
negativo17/libheif@4a4cbd9
The timing also matches: it was committed on Nov 12th, and the images broke after that date.

%changelog
* Tue Nov 12 2024 Simone Caronni <[email protected]> - 1:1.18.2-4
- Re-enable OpenJPH.

The package was actually in the diff @mejiasjrg sent earlier (Nov 12th -> 13th). I don't know how I missed that; it is right at the top.
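To avoid missing a package in a long diff again, something like the following could help. It is only a sketch: report_in_diff is a made-up helper that reads rpm-ostree db diff output on stdin and does a plain word match, so a name like libheif would also match libheif-tools:

```shell
# For each package name given as an argument, report whether it shows up
# in the diff text read from stdin.
report_in_diff() {
  diff_text=$(cat)
  for pkg in "$@"; do
    if printf '%s\n' "$diff_text" | grep -q -F -w -- "$pkg"; then
      echo "$pkg: in diff"
    else
      echo "$pkg: not in diff"
    fi
  done
}
```

Usage: rpm-ostree db diff | report_in_diff libheif heif-pixbuf-loader kf6-kimageformats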

Possibilities:

  • the library is broken (in that case it should be broken on GNOME as well, but nobody happened to notice because everything there is a flatpak anyway)
  • some system Qt component acts up with that additional codec support enabled, for whatever reason
  • a regression in 1.18

Will test on either Arch Linux or KDE Linux to confirm the libheif issue.

On kde-linux (the new distribution, not kde-neon) they ship the new 1.19 version and the live environment works.
https://gitlab.archlinux.org/archlinux/packaging/packages/libheif/-/blob/main/PKGBUILD?ref_type=heads

@mejiasjrg
Author

Very well done @renner0e !!!
My machine is up and running Plasma on aurora:stable after applying your fix (sudo rpm-ostree override remove kf6-kimageformats):
❯ rpm-ostree status
State: idle
AutomaticUpdates: stage; rpm-ostreed-automatic.timer: no runs since boot
Deployments:
● ostree-image-signed:docker://ghcr.io/ublue-os/aurora:stable
Digest: sha256:c71db2eea4e9c5f6715fda2560f15a153f4b7f86d15c2fa490d8b538bad189d9
Version: 41.20241216.1 (2024-12-16T13:41:20Z)
RemovedBasePackages: kf6-kimageformats 6.8.0-1.fc41
LayeredPackages: gdm lightdm lightdm-greeter lightdm-settings plasma-workspace-x11

ostree-image-signed:docker://ghcr.io/ublue-os/aurora:stable
Digest: sha256:c71db2eea4e9c5f6715fda2560f15a153f4b7f86d15c2fa490d8b538bad189d9
Version: 41.20241216.1 (2024-12-16T13:41:20Z)
LayeredPackages: gdm lightdm lightdm-greeter lightdm-settings plasma-workspace-x11

Of course, I still have to clean up the mess I made (layered packages), but that will be tomorrow.
Thank you!
