
Can't load amdgpu on linux kernel 5.7.14+ #159

Open
cristianmiranda opened this issue Aug 15, 2020 · 14 comments

Comments

@cristianmiranda

I'm on a MacBook Pro 13,3 and currently running 5.0.0-32-generic, which allows me to load amdgpu and turn it off by doing the following:

gpu-manager | grep 'amdgpu loaded? no' && sudo modprobe amdgpu || echo 'AMD GPU already loaded'
echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

I decided to give a more recent kernel a try, so I tested 5.8 and 5.7.14. When I run sudo modprobe amdgpu the MacBook freezes (I can't do anything but turn it off and on again).
I understand that vgaswitcheroo is not present anymore on kernel versions 5.4+. Is that right?
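
One way to check whether the switch interface is exposed on a given kernel is to test for the debugfs file before writing to it. This is only a minimal sketch: it assumes debugfs is mounted at /sys/kernel/debug, and the function name switcheroo_available is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the vgaswitcheroo switch file exists.
# The path can be passed as an argument (handy for testing); it defaults
# to the location used in this thread.
switcheroo_available() {
    switch_file="${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
    [ -e "$switch_file" ]
}

# Example use (needs root, since debugfs is normally root-only):
#   if switcheroo_available; then
#       echo OFF > /sys/kernel/debug/vgaswitcheroo/switch
#   else
#       echo "vgaswitcheroo is not exposed on this kernel" >&2
#   fi
```

Guarding the write this way avoids a confusing "No such file or directory" error on kernels where the file is missing.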

After installing the kernel I got the following:

DKMS: install completed.
   ...done.
Setting up linux-modules-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
Setting up linux-image-unsigned-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
I: /vmlinuz.old is now a symlink to boot/vmlinuz-5.0.0-32-generic
I: /vmlinuz is now a symlink to boot/vmlinuz-5.8.0-050800-generic
I: /initrd.img is now a symlink to boot/initrd.img-5.8.0-050800-generic
Processing triggers for linux-image-unsigned-5.8.0-050800-generic (5.8.0-050800.202008022230) ...
/etc/kernel/postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.8.0-050800-generic
   ...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.8.0-050800-generic
W: Possible missing firmware /lib/firmware/amdgpu/navi12_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven2_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/picasso_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega20_ta.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/raven_kicker_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec2_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_mec_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_me_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_pfp_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_ce_wks.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_mes.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_vcn.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi14_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi10_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/arcturus_smc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/navi12_dmcu.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/renoir_dmcub.bin for module amdgpu
W: Possible missing firmware /lib/firmware/i915/skl_huc_2.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/skl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_huc_2.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/glk_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/kbl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/cml_huc_4.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/cml_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_huc_9.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_guc_33.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_huc_9.0.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/ehl_guc_33.0.4.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_huc_7.0.12.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_guc_35.2.0.bin for module i915
W: Possible missing firmware /lib/firmware/i915/icl_dmc_ver1_09.bin for module i915
W: Possible missing firmware /lib/firmware/i915/tgl_dmc_ver2_06.bin for module i915
I: The initramfs will attempt to resume from /dev/nvme0n1p3
I: (UUID=d6cab994-b144-43a3-a412-100ce59aa599)
I: Set the RESUME variable to override this.
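
To make a log like this easier to read, the warnings can be filtered per module. The sketch below assumes the warning format shown above ("W: Possible missing firmware <path> for module <module>"); the function name missing_firmware is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read update-initramfs output on stdin and print the
# firmware paths reported as possibly missing for one module (e.g. amdgpu).
missing_firmware() {
    module="$1"
    # Expected line layout:
    #   W: Possible missing firmware <path> for module <module>
    awk -v mod="$module" '
        $1 == "W:" && $2 == "Possible" && $3 == "missing" &&
        $4 == "firmware" && $NF == mod { print $5 }
    '
}

# Example use:
#   sudo update-initramfs -u 2>&1 | missing_firmware amdgpu
```

Whether any of the listed files matter for the GPU in this machine is not something the log itself shows, so the output is only a starting point for checking.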

I'm available to test stuff if you have any ideas. I also don't mind staying on 5.0.0-32-generic, especially if newer kernels don't give me a way to load and turn off the dedicated GPU.

Thanks!

@andyholst
Contributor

I have never tried the Intel graphics. I am running the default MBP133 EFI firmware; I would have to test the Intel graphics first by turning off the amdgpu.

@andyholst
Contributor

@cristianmiranda have you tried applying the patch https://marc.info/?l=grub-devel&m=141586614924917&q=p3 to the eboot.c file? I believe patching the kernel so that it recognizes the Intel graphics card on the MBP133 is a better way to go than hacking the GRUB boot loader.

However, since kernel version 5.7+ eboot.h and eboot.c have been moved from arch/x86/boot/compressed to drivers/firmware/efi/libstub as efistub.h and x86-stub.c.

Do you want to file a bug report at https://bugzilla.kernel.org/ regarding this issue, or do you want to try the kernel patch on v5.7+ first?

@andyholst
Contributor

andyholst commented Aug 30, 2020

@cristianmiranda did you make any progress with amdgpu?

I am currently running Linux kernel version 5.8.3, and the command lspci -nnk | grep -i vga -A3 | grep 'in use' gives me Kernel driver in use: amdgpu on the MBP 13,3.

@cristianmiranda
Author

cristianmiranda commented Aug 30, 2020

Hi @andyholst, sorry, I didn't have time to reply before, and then I just forgot.

I couldn't make much progress on this. My main concern is not being able to turn off the AMD GPU since resource consumption is higher than just using the integrated GPU. In order to do that I need vgaswitcheroo and I don't see it (maybe this is related to #6 (comment)).

I'm currently on 5.0.0-32-generic but I have 5.7 installed as well for running tests if you want.

Thank you so much for your interest in this!

Btw, this is what I get on 5.0.0-32-generic:

❯ lspci -nnk | grep -i vga -A3 | grep 'in use'
	Kernel driver in use: i915
	Kernel driver in use: amdgpu
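
The same check can be wrapped so that only the driver names come out, which makes it easier to compare kernels. A minimal sketch, assuming the usual lspci -nnk layout where the "Kernel driver in use:" line follows within three lines of the VGA controller line; vga_drivers is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -nnk` output on stdin and print the kernel
# driver bound to each VGA controller, one per line.
vga_drivers() {
    grep -i -A3 'vga' |
        sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Example use:
#   lspci -nnk | vga_drivers
```

Two lines of output would mean both GPUs have a driver bound; a single line matches what was reported on 5.8.3.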

@andyholst
Contributor

andyholst commented Aug 30, 2020

@cristianmiranda right, the reason vgaswitcheroo is not showing up is that the kernel doesn't recognize two GPUs, which contradicts the result of your "lspci -nnk | grep -i vga -A3 | grep 'in use'" command.

I still think trying the kernel patch is better than the GRUB boot hack, because the GPU detection is then integrated into the kernel at boot. The kernel structure has gone through major refactoring, so applying the patch can be a bit tricky, but I think it is still worth trying.

@cristianmiranda
Author

@andyholst that makes a lot of sense. It's weird, because I'm using rEFInd to spoof macOS when loading 5.0.0-32-generic, so I didn't have to patch that one. I'd expect to see the integrated GPU with any other kernel version loaded through rEFInd.
Anyway, I'm going to do some research on how to patch the kernel (I have no idea where to start) and will let you know how it goes. Any suggestions I should consider before doing this? Thanks!

@andyholst
Contributor

@cristianmiranda patching the kernel is a matter of testing. First apply the patch to the old structure in v5.5 (arch/x86/boot/compressed/eboot.c) by running patch -p1 < ../patch-x.y.z in the root directory of the Linux repo on the v5.5 branch. patch-x.y.z is the file you get from https://marc.info/?l=grub-devel&m=141586614924917&q=p3. If you get that to work, try it on 5.7+ with the new structure in drivers/firmware/efi/libstub, where efistub.h and x86-stub.c are located; there the patch should go into x86-stub.c instead. I would diff the patched eboot.c against x86-stub.c to see how the structure differs, apply the changes manually, then create a new patch and report it in the kernel bug tracker if this has worked before.
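
The choice of target file described above can be sketched as a small helper. It only encodes the file move mentioned in this thread (eboot.c before 5.7, x86-stub.c from 5.7 on); the function name efi_stub_source is hypothetical, and sort -V is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: print the EFI stub source file to patch for a given
# kernel version, following the file move described in this thread.
efi_stub_source() {
    version="$1"
    # sort -V orders version strings; if 5.7 sorts first (or is equal),
    # the requested version is 5.7 or newer.
    oldest=$(printf '%s\n' "$version" 5.7 | sort -V | head -n 1)
    if [ "$oldest" = "5.7" ]; then
        echo "drivers/firmware/efi/libstub/x86-stub.c"
    else
        echo "arch/x86/boot/compressed/eboot.c"
    fi
}

# Example, run from the root of a kernel checkout:
#   efi_stub_source 5.5                      # which file the patch touches
#   patch -p1 --dry-run < ../patch-x.y.z     # check that the patch applies
#   patch -p1 < ../patch-x.y.z               # apply it for real
```

Running patch with --dry-run first shows rejected hunks without modifying the tree, which helps when conflicts are expected.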

@andyholst
Contributor

@cristianmiranda I have checked out the v5.5 release tag and applied the patch from https://marc.info/?l=grub-devel&m=141586614924917&q=p3. I had to resolve some merge conflicts; you can see the commit on the patch branch of my linux-stable-fork: https://github.com/andyholst/linux-stable-fork/tree/efi-Identify-as-OS-X-to-EFI-drivers-before-booting

Going to test it out during the week.

@andyholst
Contributor

andyholst commented Sep 10, 2020

@cristianmiranda the patch for v5.5 works for me. I can access the /sys/kernel/debug/vgaswitcheroo/switch file, and it gives me the following:

0:DIS:+:Pwr:0000:01:00.0
1:IGD: :Pwr:0000:00:02.0
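
The output above can be read programmatically. A minimal sketch, assuming each line has the colon-separated layout index, ID, active flag, power state, PCI address, with "+" marking the active GPU; active_gpu is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: read the vgaswitcheroo switch file contents on stdin
# and print the ID (e.g. DIS or IGD) of the entry flagged active with "+".
# Assumed line layout, based on the output in this thread:
#   <index>:<id>:<active flag>:<power state>:<PCI address>
active_gpu() {
    awk -F: '$3 == "+" { print $2 }'
}

# Example use (as root):
#   active_gpu < /sys/kernel/debug/vgaswitcheroo/switch
```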

The command sudo modprobe amdgpu works just fine without any freezes.

The command lspci | grep "VGA" gives me the list:

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev c0)

The command gpu-manager confirms that both the Intel card and the AMD card are loaded, with the following output:

last_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
new_boot_file: /var/lib/ubuntu-drivers-common/last_gfx_boot
can't access /run/u-d-c-nvidia-was-loaded file
can't access /opt/amdgpu-pro/bin/amdgpu-pro-px
Looking for nvidia modules in /lib/modules/5.5.0+/updates/dkms
Looking for amdgpu modules in /lib/modules/5.5.0+/updates/dkms
Is nvidia loaded? no
Was nvidia unloaded? no
Is nvidia blacklisted? no
Is intel loaded? yes
Is radeon loaded? no
Is radeon blacklisted? no
Is amdgpu loaded? yes
Is amdgpu blacklisted? no
Is amdgpu versioned? no
Is amdgpu pro stack? no
Is nouveau loaded? no
Is nouveau blacklisted? no
Is nvidia kernel module available? no
Is amdgpu kernel module available? no
Vendor/Device Id: 8086:191b
BusID "PCI:0@0:2:0"
Is boot vga? no
Vendor/Device Id: 1002:67ef
BusID "PCI:1@0:0:0"
Is boot vga? yes
Skipping "/dev/dri/card1", driven by "i915"
Found "/dev/dri/card0", driven by "amdgpu"
output 0:
card0-eDP-1
output 1:
card0-DP-1
Number of connected outputs for /dev/dri/card0: 2
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card0", driven by "amdgpu"
Skipping "/dev/dri/card1", driven by "i915"
Skipping "/dev/dri/card0", driven by "amdgpu"
Found "/dev/dri/card1", driven by "i915"
Number of connected outputs for /dev/dri/card1: 0
Does it require offloading? no
last cards number = 2
Has amd? yes
Has intel? yes
Has nvidia? no
How many cards? 2
Has the system changed? No
Unsupported discrete card vendor: 8086
Nothing to do
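
A log like this can be checked from a script instead of by eye. The sketch below assumes the "Is <driver> loaded? yes" line format shown above; driver_loaded is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read gpu-manager output on stdin and succeed if the
# given driver is reported as loaded ("Is <driver> loaded? yes").
driver_loaded() {
    grep -q "^Is $1 loaded? yes\$"
}

# Example use:
#   gpu-manager | driver_loaded amdgpu || sudo modprobe amdgpu
```

This is the positive form of the grep 'amdgpu loaded? no' check from the first comment in the thread.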

So if you study the new EFI boot structure for v5.7+, you should be able to apply the patch there as well, or even prepare a proper patch for upstream if you are up to it.

@cristianmiranda
Author

cristianmiranda commented Sep 10, 2020

@andyholst thank you so much for spending time on this. I'll give it a try. I'm going to close this issue, as you have already proved that patching the kernel makes this possible.

@andyholst
Contributor

andyholst commented Sep 10, 2020

@cristianmiranda actually, I wouldn't close it until it has been fully verified. You most likely want the patch included upstream, as long as it is 'good enough', and it still needs to be tested on the new EFI boot structure for v5.7+. I have no idea whether a patch like this has been merged upstream before, but it is worth spending time to check. If this is a bug, meaning it worked in earlier kernel versions, then the issue shouldn't be closed until the bug has been patched.

@Dunedan
Owner

Dunedan commented Dec 8, 2020

Any updates regarding this issue? What's the status with recent kernel versions? Did it get fixed upstream?

@cristianmiranda
Author

@Dunedan I believe @andyholst played around with a patched version of 5.5. I haven't spent time on this issue. Sorry.

@andyholst
Contributor

andyholst commented Dec 13, 2020

@Dunedan I tried the patch on kernel version 5.8; it can't be applied since they have refactored the design. I refer to an old email conversation:

Hi gentlemen,
Thank you so much for your code contribution to the Linux kernel.
Is there any news on the apple gmux dual gpu support for newer MBP models (late 2016/2017) without having to deal with the OS X version/vendor efi boot hack the way you apply it to the old efi boot code structure (<= v5.5) andyholst/linux-stable-fork@90c1102 ?
Keep up the good work!
Regards
Andy Holst

The i915 developers said a while ago that they'll look into turning on the
GPU if EFI has disabled it. I suppose they haven't made progress but you
may want to prod them on their mailing list:
[email protected]
The above-linked patch wasn't upstreamed so far because on MacBook Airs,
it has the side effect that the keyboard/trackpad is switched to SPI if
it's accessible both via SPI and USB. We do have a driver in mainline
now for the SPI keyboard, but it may not work on the MBA yet:
cb22/macbook12-spi-driver#65
Thanks,
Lukas

I couldn't try the 'v5.5' OS X version/vendor EFI boot hack kernel patch (andyholst/linux-stable-fork@90c1102) on the MacBook Air models that have the SPI conflict.

@Dunedan @cristianmiranda This issue is mainly one for the i915 developers. I don't know if it is worth mentioning, but the OS X version/vendor EFI boot kernel hack, which makes it possible to switch between the AMD and Intel graphics on the MBP 13,3, does not work on >= v5.7. I don't know about the GRUB boot loader hack, though.

I will not look into this issue.
