FW16 BIOS 03.05 beta - Linux fails to boot with kernel parameter "efi=disable_early_pci_dma" set or kernel config CONFIG_EFI_DISABLE_PCI_DMA enabled #8

Open
2 of 8 tasks
sinatosk opened this issue Nov 21, 2024 · 12 comments
Labels
3.05 Laptop 16 AMD Ryzen 7040 Framework Laptop 16 (AMD Ryzen™ 7040 Series)

Comments

@sinatosk

sinatosk commented Nov 21, 2024

Device Information

System Model or SKU

Please select one of the following

  • Framework Laptop 13 (11th Gen Intel® Core™)
  • Framework Laptop 13 (12th Gen Intel® Core™)
  • Framework Laptop 13 (13th Gen Intel® Core™)
  • Framework Laptop 13 (AMD Ryzen™ 7040 Series)
  • Framework Laptop 13 (Intel® Core™ Ultra Series 1)
  • Framework Laptop 16 (AMD Ryzen™ 7040 Series)

BIOS VERSION

03.05

DIY Edition information

If you are experiencing an issue on a DIY system, Please also fill out the memory and storage devices you are using.

Memory: Framework - 32 (2 x 16GB) DDR5-5600
Storage: Western Digital SN850X 2TB (2280), Western Digital SN770M 2TB (2230)

Standalone Operation

Are you running your mainboard as a standalone device. Is standalone mode enabled in the BIOS?

  • Yes
  • No

Describe the bug

With the Linux kernel parameter "efi=disable_early_pci_dma" set ( or without it, but with the kernel config CONFIG_EFI_DISABLE_PCI_DMA enabled ), after the system has POSTed and starts to boot the OS, it just sits idle at the Framework logo ( for several minutes ) until I tap the power button ( not holding it down ), at which point it turns off

Steps To Reproduce

Steps to reproduce the behaviour:

  1. Power on and wait for the Framework logo
  2. Wait several minutes in case it boots ( it stays at the Framework logo )
  3. Tap the power button ( after waiting ) and it turns off

Expected behaviour

  1. Power on and wait for the Framework logo
  2. Framework logo changes to showing the Framework logo plus plymouth animation starts to play
  3. I'm booted into Linux tty ready for me to enter my credentials.

Solution

To get rid of this issue

  • set kernel parameter "efi=no_disable_early_pci_dma" if you have CONFIG_EFI_DISABLE_PCI_DMA enabled
  • or unset kernel parameter "efi=disable_early_pci_dma" if CONFIG_EFI_DISABLE_PCI_DMA is disabled
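As a rough illustration of how the two workarounds above interact, here is a minimal sketch of the precedence between the kernel config default and the "efi=" command line options. It assumes, based on the Kconfig help text quoted further down, that the command line overrides the config default and that a later option overrides an earlier one. The function name `early_pci_dma_disabled` is hypothetical and is not part of any kernel tooling.

```python
def early_pci_dma_disabled(cmdline: str, config_default: bool) -> bool:
    """Model whether early PCI DMA would be disabled for a given command line.

    config_default stands in for CONFIG_EFI_DISABLE_PCI_DMA (True when =y).
    Assumption: command line options override the config default, and a
    later option overrides an earlier one.
    """
    disabled = config_default
    for param in cmdline.split():
        if not param.startswith("efi="):
            continue
        # "efi=" takes a comma-separated list of options.
        for option in param[len("efi="):].split(","):
            if option == "disable_early_pci_dma":
                disabled = True
            elif option == "no_disable_early_pci_dma":
                disabled = False
    return disabled


if __name__ == "__main__":
    # Config enabled, override on the command line (first workaround).
    print(early_pci_dma_disabled("rw efi=no_disable_early_pci_dma", True))
    # Config disabled, parameter removed (second workaround).
    print(early_pci_dma_disabled("rw efi=runtime", False))
```

Both calls in the usage section describe configurations in which the feature ends up off, which is the state that boots on BIOS 03.05 according to this report.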

Operating System:

  • OS/Distribution: Gentoo ( primarily ), Arch Linux
  • Version: Gentoo ( Linux 6.12.0 ), Arch Linux ( 6.10.6, I've not updated this since I switched over to Gentoo )
  • Linux Kernel Version ( From Gentoo only ): Linux hostname 6.12.0-mainline-byw-gcc-znver4 #1 SMP PREEMPT_RT Wed Nov 20 06:58:15 GMT 2024 x86_64 AMD Ryzen 7 7840HS w/ Radeon 780M Graphics AuthenticAMD GNU/Linux

Additional context

I've only tried Linux 6.10.6 ( Arch Linux ) and 6.12.0 ( Gentoo )

About "efi=disable_early_pci_dma" / CONFIG_EFI_DISABLE_PCI_DMA

Disable the busmaster bit in the control register on all PCI bridges
while calling ExitBootServices() and passing control to the runtime
kernel. System firmware may configure the IOMMU to prevent malicious
PCI devices from being able to attack the OS via DMA. However, since
firmware can't guarantee that the OS is IOMMU-aware, it will tear
down IOMMU configuration when ExitBootServices() is called. This
leaves a window between where a hostile device could still cause
damage before Linux configures the IOMMU again.

If you say Y here, the EFI stub will clear the busmaster bit on all
PCI bridges before ExitBootServices() is called. This will prevent
any malicious PCI devices from being able to perform DMA until the
kernel reenables busmastering after configuring the IOMMU.

This option will cause failures with some poorly behaved hardware
and should not be enabled without testing. The kernel commandline
options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma"
may be used to override this option.
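
For readers unfamiliar with the "busmaster bit" mentioned in the help text above: it is bit 2 (Bus Master Enable) of the 16-bit PCI command register at offset 0x04 in configuration space. The following is only a sketch of the bit manipulation on a plain integer; it does not touch real hardware, and the helper names are hypothetical.

```python
PCI_COMMAND_OFFSET = 0x04      # offset of the 16-bit command register in PCI config space
PCI_COMMAND_BUS_MASTER = 0x04  # bit 2: Bus Master Enable


def is_bus_master(command: int) -> bool:
    """Return True if the Bus Master Enable bit is set in a command register value."""
    return bool(command & PCI_COMMAND_BUS_MASTER)


def clear_bus_master(command: int) -> int:
    """Return the command register value with Bus Master Enable cleared."""
    return command & ~PCI_COMMAND_BUS_MASTER & 0xFFFF


if __name__ == "__main__":
    before = 0x0407  # example value: I/O space, memory space and bus mastering enabled
    after = clear_bus_master(before)
    print(hex(before), is_bus_master(before))
    print(hex(after), is_bus_master(after))
```

A bridge with this bit cleared no longer forwards DMA requests from the devices behind it, which is why a device or firmware component that still expects DMA to work at that point could stall.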

and more info about this from Matthew Garrett

I've been using "efi=disable_early_pci_dma" ( not "efi=no_disable_early_pci_dma" ) with BIOS versions 03.02, 03.03 and 03.04 with no issues

@sinatosk changed the title on Nov 21, 2024, from: BIOS 03.05 beta - Linux kernel ( 6.12 & 6.10 ) fails to boot with kernel parameter "efi=disable_early_pci_dma" set; to: FW16 BIOS 03.05 beta - Linux fails to boot with kernel parameter "efi=disable_early_pci_dma" set or kernel config CONFIG_EFI_DISABLE_PCI_DMA enabled
@JohnAZoidberg
Member

@superm1 might know whether it is supposed to work on the processor or not.

@DHowett DHowett added Laptop 16 AMD Ryzen 7040 Framework Laptop 16 (AMD Ryzen™ 7040 Series) 3.05 labels Nov 22, 2024
@superm1

superm1 commented Nov 22, 2024

> @superm1 might know whether it is supposed to work on the processor or not.

This isn't something actively tested, but this ping prompted me to double check on AMD reference hardware I have on hand.

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.12.0-rc1-00092-geb5b9b74c873 root=UUID=70400493-7802-40ad-9410-a356c92076e5 ro quiet splash vt.handoff=7 efi=disable_early_pci_dma
$ sudo dmidecode |grep AGESA
        String: AGESA!V9 PhoenixPI-FP8-FP7 1.1.0.3

I didn't have any problem booting reference hardware like this.

@sinatosk
Author

sinatosk commented Nov 22, 2024

@superm1 I'm using 6.12.0, not any of the rc's ( I don't think it would make much difference ). Are you booting the kernel as an efistub? The kernel itself has to be compiled as an efistub

@superm1

superm1 commented Nov 22, 2024

> @superm1 I'm using 6.12.0, not any of the rc's ( I don't think it would make much difference ). Are you booting the kernel as an efistub? The kernel itself has to be compiled as an efistub

It's a random dev kernel I have, not a proper release, but I've changed nothing in this area.

I'm booting it specifically through grub as a boot entry which should still be using efi stub.

@sinatosk
Author

sinatosk commented Nov 22, 2024

> > @superm1 I'm using 6.12.0, not any of the rc's ( I don't think it would make much difference ). Are you booting the kernel as an efistub? The kernel itself has to be compiled as an efistub

> It's a random dev kernel I have, not a proper release, but I've changed nothing in this area.

> I'm booting it specifically through grub as a boot entry which should still be using efi stub.

I'm booting it either directly from UEFI or via systemd-boot. It's in Unified Kernel Image ( UKI ) form, built by systemd ukify, with dracut making the initramfs

Is the vmlinuz-6.12.0-rc1-00092-geb5b9b74c873 itself an EFI file?

@superm1

superm1 commented Nov 22, 2024

It's not a UKI, but yes the kernel is an EFI binary.
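
One way to answer the "is it an EFI file?" question for any kernel image is to look for a PE/COFF header, since EFI binaries are PE images: the file starts with "MZ", and the 32-bit little-endian value at offset 0x3C points to a "PE\0\0" signature. The sketch below checks only that much, so it is a necessary condition rather than proof that the firmware will load the file. The function names and the example path are hypothetical.

```python
import struct


def looks_like_pe_image(data: bytes) -> bool:
    """Return True if data starts with a DOS 'MZ' header that points to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian 32-bit file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


def file_looks_like_pe_image(path: str, probe_size: int = 4096) -> bool:
    """Read the first probe_size bytes of a file and apply the header check."""
    with open(path, "rb") as image:
        return looks_like_pe_image(image.read(probe_size))


if __name__ == "__main__":
    import sys

    # Example: python check_pe.py /boot/vmlinuz-<version>   (path is an assumption)
    for candidate in sys.argv[1:]:
        print(candidate, file_looks_like_pe_image(candidate))
```

Both a plain kernel built with the EFI stub and a UKI should pass this check, so it does not distinguish between the two boot setups discussed here.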

@sinatosk
Author

sinatosk commented Nov 23, 2024

@superm1 would it be possible for you to give me a copy of the kernel config you're using for vmlinuz-6.12.0-rc1-00092-geb5b9b74c873, or a diff of yours and mine?

this is my kernel config (stripped gentoo specifics CONFIG_GENTOO*)

and the kernel parameters I'm using when it fails to boot are

page_poison=0 rootwait rootfstype=btrfs 
rootflags=rw,relatime,ssd,discard=async,space_cache=v2,commit=6,flushoncommit,subvolid=261 
root=PARTUUID=8f4a1efa-342c-4ff5-9020-a8a1081f6a88 
rw resume=PARTUUID=eb2fea74-5140-4923-b4a5-0c276c09dc86 hibernate=nocompress 
init_on_alloc=1 init_on_free=1 delayacct iommu.strict=1 amd_pstate=active 
pcie_aspm.policy=powersupersave snd_hda_intel.power_save=1 amdgpu.abmlevel=2 
amdgpu.gttsize=16384 module.sig_enforce=1 amdgpu.gpu_recovery=1 audit=1 
selinux=0 efi=disable_early_pci_dma efi=runtime add_efi_memmap

and I've changed/removed the following

  • set init_on_alloc=0 and init_on_free=0
  • set pcie_aspm.policy=default
  • removed rootwait, efi=runtime and add_efi_memmap

and recompiled the kernel with PREEMPT_RT disabled and loaded the BIOS defaults, and it still fails to boot

I've set efi=runtime as it's disabled by default when PREEMPT_RT is enabled

@superm1

superm1 commented Nov 23, 2024

Sure; this is the kernel config I used on the reference platform (Don't mind that it says 6.11.0-rc6, it's the config I use when I'm building kernels 6.11 and 6.12 and I just apply the defaults for things that don't match).
config-6.12.txt

FYI I know at least some of those things you're doing on your kernel command line are default behavior that you would have had to make a modification to the kernel to change.

  • amdgpu.gpu_recovery=1
  • amd_pstate=active

This may cause issues that are difficult to debug. I even flag it in amd_s2idle.py: https://gitlab.freedesktop.org/drm/amd/-/commit/e28fe714eed5a8bf17091c45883df8ac19aabe25

  • pcie_aspm.policy=powersupersave
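
To make the review above repeatable when trimming a long command line, here is a minimal sketch that sorts parameters into the groups named in this comment. The two sets are copied from the bullets above and are not an authoritative list of kernel defaults. The name `review_cmdline` is hypothetical.

```python
# Classification taken from the comment above; not an authoritative list.
REDUNDANT_DEFAULTS = {"amdgpu.gpu_recovery=1", "amd_pstate=active"}
HARD_TO_DEBUG = {"pcie_aspm.policy=powersupersave"}


def review_cmdline(cmdline: str) -> dict[str, list[str]]:
    """Split a kernel command line into redundant, risky and remaining parameters."""
    params = cmdline.split()
    flagged = REDUNDANT_DEFAULTS | HARD_TO_DEBUG
    return {
        "redundant": [p for p in params if p in REDUNDANT_DEFAULTS],
        "risky": [p for p in params if p in HARD_TO_DEBUG],
        "kept": [p for p in params if p not in flagged],
    }


if __name__ == "__main__":
    example = (
        "iommu.strict=1 amd_pstate=active pcie_aspm.policy=powersupersave "
        "amdgpu.gpu_recovery=1 efi=disable_early_pci_dma"
    )
    for group, params in review_cmdline(example).items():
        print(group, params)
```

Booting once with only the "kept" group is a cheap way to rule these parameters out before digging into the firmware side.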

@sinatosk
Author

sinatosk commented Nov 23, 2024

Thanks for the config @superm1

Well, I'm stumped. Since @superm1's last post I've tried the following ( not including some of what I've already mentioned )

  • "load optimal defaults" in the bios
  • disabled secure boot ( I've never used the tpm since the day I got my fw16 )
  • re-compiled the kernel with gcc using znver4 and generic
  • removed all non "usb c only" framework modules
  • tried with and without ac power adapter ( framework 180w )
  • tried booting it with "uefi -> efistub", "uefi -> systemd-boot -> efistub" ( all efistubs are UKI's by systemd ukify )
  • removed cruft kernel parameters amdgpu.gpu_recovery, amd_pstate and pcie_aspm.policy

There was one instance where it booted ( this was without the framework modules ) and it took a long time to get there

In one other instance where it didn't boot, while I was waiting, I was leaning to the side ( head resting on my hand ), and somewhere between 60-90 seconds in, I noticed a red/green/blue light flashing on one side ( the ac adapter was plugged into the other side ). I'm thinking "doesn't that usually show when something is not connected, like my touchpad or keyboard?", so I checked to make sure everything was reliably connected/plugged in.

I'm wondering if something has been changed in the bios in such a way that things are done at the wrong time and/or out of sync. The flashing light only happened once in the 5 or so tries I did since @superm1's last post

Between the one instance where it did boot and all the others that failed, I didn't change anything; I just rebooted

While @superm1 has shown it's booting on his reference system, I'm convinced that something has changed in bios 03.05. This was always working for me before ( 03.02, 03.03 and 03.04 ). It would be good if some more people could test this out on their systems ( I'm hoping it's not a hardware issue on my side )

@superm1

superm1 commented Nov 23, 2024

Please keep in mind I'm using an AMD reference system not a framework 16 for my above comments. There very well can be a problem in the BIOS on framework 16, I was mostly responding to the question of whether the processor can handle it by @JohnAZoidberg.

@superm1

superm1 commented Nov 23, 2024

Do you have a dGPU? Maybe it is only reproducing with that plugged in?

@sinatosk
Author

sinatosk commented Nov 23, 2024

oh no, iGPU only, I usually have 5 usb c and one audio module plugged in

port 1 - usb c -> ac adapter
port 2 - usb c -> external monitor ( only when I need to use the display )
port 3 - audio expansion
port 4 - usb c -> external storage via cable
port 5 - usb c mostly, but when troubleshooting, I switch this to the microsd module
port 6 - usb c, I barely use it

4 participants