Linux 4.9: nvme-pci: No irq handler for vector #27

Open
akaher opened this issue Dec 21, 2018 · 6 comments

Comments

akaher commented Dec 21, 2018

Back-ported the required pci-hyperv changes to Linux 4.9, and the pci-hyperv driver works.

But I am getting the following issues with nvme:

[74530.686555] do_IRQ: 10.232 No irq handler for vector
[74530.712068] do_IRQ: 10.232 No irq handler for vector
[74530.737579] do_IRQ: 10.232 No irq handler for vector
[74530.763092] do_IRQ: 10.232 No irq handler for vector
[74532.832221] nvme nvme1: I/O 206 QID 6 timeout, reset controller
[74532.873967] nvme nvme1: completing aborted command with status: fffffffc
[74532.873971] blk_update_request: I/O error, dev nvme1n1, sector 1048320

Back-ported the following patch, but I am still facing the same issue:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/drivers/nvme/host/pci.c?id=0ff199cb48b4af6f29a1bf15d92d93f44a22eeb4

akaher commented Jan 18, 2019

Found the reason why I am getting "No irq handler for vector":
nvme/host/pci.c allocates the IRQ vectors and assigns each of them to a set of CPUs. That CPU assignment has to be passed to the hypervisor, and the same mapping is recorded in the per-CPU VECTOR_IRQ table. However, the CPU assignment used by the driver does not match the one passed to the hypervisor and programmed into VECTOR_IRQ. So when the interrupt arrives in the VM, the lookup on the receiving CPU finds no matching entry and the interrupt is dropped.

Please help me find out why this happens with v4.9 while it works fine with v4.14.
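
To make the failure mode above concrete, here is a minimal user-space sketch of a per-CPU vector table: an interrupt is only dispatched if it arrives on the CPU whose table actually has an entry for that vector. This is a toy model, not the kernel's do_IRQ() code; CPU 10 and vector 232 are taken from the dmesg line above, and IRQ 45 is made up for illustration.

```c
/*
 * Toy user-space model of the per-CPU vector table mismatch described above.
 * Each CPU has its own vector_irq[] table; an interrupt is only handled if
 * the CPU it actually arrives on has an entry for that vector.
 */
#include <stdio.h>

#define NR_CPUS       16
#define NR_VECTORS    256
#define VECTOR_UNUSED (-1)

static int vector_irq[NR_CPUS][NR_VECTORS];   /* per-CPU vector -> IRQ map */

static void deliver(int cpu, int vector)
{
	int irq = vector_irq[cpu][vector];

	if (irq == VECTOR_UNUSED)
		/* the situation behind "do_IRQ: 10.232 No irq handler for vector" */
		printf("do_IRQ: %d.%d No irq handler for vector\n", cpu, vector);
	else
		printf("CPU%d: vector %d -> IRQ %d handled\n", cpu, vector, irq);
}

int main(void)
{
	for (int c = 0; c < NR_CPUS; c++)
		for (int v = 0; v < NR_VECTORS; v++)
			vector_irq[c][v] = VECTOR_UNUSED;

	/* The guest programmed vector 232 for (hypothetical) IRQ 45 on CPU 2 ... */
	vector_irq[2][232] = 45;

	/* ... but the hypervisor was told to target CPU 10, so the interrupt
	 * shows up on a CPU whose table has no entry and is dropped. */
	deliver(10, 232);   /* mismatch: message printed, interrupt lost */
	deliver(2, 232);    /* matching CPU: the IRQ would be handled    */
	return 0;
}
```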

dcui commented Jan 18, 2019

Can the 2 top patches on this branch help? https://github.com/dcui/linux/commits/decui/SLES12-SP3-AZURE-2018-1029.

I don't know what goes wrong here, as I don't have an NVMe device to test with. I hope the two patches help, but I'm really not sure.

akaher commented Jan 22, 2019

Thanks Dexuan. After applying the following patches, the NVMe IRQs are scheduled on CPU0 and CPU8 (the VM has 15 CPUs in total), and there are no more "No irq handler for vector" messages in dmesg:
dcui/linux@cd09cb7
dcui/linux@eba61d2

I am now looking into how to get the interrupts scheduled on the other CPUs as well.
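
For reference, a small user-space helper (my own sketch, not part of any patch) that reads /proc/interrupts and /proc/irq/<n>/smp_affinity_list to show which CPUs each nvme vector is allowed to run on; the "nvme" match string may need adjusting depending on how the kernel names the vectors.

```c
/* Print the allowed CPU list for every IRQ whose /proc/interrupts line
 * mentions "nvme". Uses only the standard procfs interfaces. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	char line[4096];
	FILE *f = fopen("/proc/interrupts", "r");

	if (!f) {
		perror("/proc/interrupts");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (!strstr(line, "nvme"))
			continue;

		int irq = atoi(line);               /* leading "  45:" -> 45 */
		char path[64], cpus[256] = "";
		FILE *a;

		snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
		a = fopen(path, "r");
		if (a) {
			if (fgets(cpus, sizeof(cpus), a))
				cpus[strcspn(cpus, "\n")] = '\0';
			fclose(a);
		}
		printf("IRQ %3d -> CPUs %s\n", irq, cpus);
	}
	fclose(f);
	return 0;
}
```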

dcui commented Jan 22, 2019

Glad to know the 2 patches can help!

Now it looks to me that the pci-hyperv driver is good, and you might need to improve the NVMe driver in v4.9 to spread interrupts across more CPUs if necessary (I assume the NVMe driver in the latest mainline kernel does a better job of this).
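
For context, this is roughly how a PCI driver asks the IRQ core to spread its queue vectors across CPUs with managed affinity. It is a kernel-side sketch using the standard pci_alloc_irq_vectors()/pci_irq_vector() APIs, not the actual nvme code in any particular stable branch, and example_setup_queue_irqs() is a made-up name.

```c
/*
 * Sketch: request up to nr_queues interrupt vectors and let the IRQ core
 * distribute them over the online CPUs (managed affinity), instead of
 * leaving everything on CPU0.
 */
#include <linux/pci.h>
#include <linux/interrupt.h>

static int example_setup_queue_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
	int nvecs, i;

	/* PCI_IRQ_AFFINITY asks the core to spread the vectors across CPUs. */
	nvecs = pci_alloc_irq_vectors(pdev, 1, nr_queues,
				      PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY);
	if (nvecs < 0)
		return nvecs;

	for (i = 0; i < nvecs; i++)
		dev_info(&pdev->dev, "queue %d -> Linux IRQ %d\n",
			 i, pci_irq_vector(pdev, i));

	return nvecs;
}
```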

akaher commented Jan 24, 2019

Dexuan, is there any specific reason for not upstreaming the following patch to the stable mainline kernels?
dcui/linux@eba61d2

dcui commented Jan 24, 2019

The patch was made for v4.12.14, which has reached End-of-Life: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/log/?h=linux-4.12.y

irq_data_get_effective_affinity_mask() is in v4.14+, so it's not needed there.

The other long-term stable kernels (4.9, 4.4, 3.x) seem a little old, and it looks like they're not widely used any more.
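
For illustration (v4.14+ only, per the above), this is how the effective affinity of an interrupt can be read, i.e. the CPU(s) the vector is actually programmed to target as opposed to the requested affinity mask. The function example_show_effective_affinity() and the IRQ number passed to it are hypothetical.

```c
/* Print the effective affinity of a given Linux IRQ (v4.14+ API). */
#include <linux/irq.h>
#include <linux/cpumask.h>
#include <linux/printk.h>

static void example_show_effective_affinity(unsigned int irq)
{
	struct irq_data *d = irq_get_irq_data(irq);
	const struct cpumask *eff;

	if (!d)
		return;

	eff = irq_data_get_effective_affinity_mask(d);
	pr_info("irq %u effective affinity: %*pbl\n", irq, cpumask_pr_args(eff));
}
```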
