Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG][MTL] S0ix long run failure - timed out for 0x47000000 then kernel oops on page fault #4608

Closed
kv2019i opened this issue Sep 26, 2023 · 1 comment
Assignees
Labels
IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform. P1 Blocker bugs or important features

Comments

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 26, 2023

Branched the kernel part of thesofproject/sof#7866 here:

We see this error reported in multiple scenario:
One was long run S0ix testing.

[  352.971033] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x47000000|0x0
[  352.979537] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  352.988991] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0xc7000000|0x0|0x0, target: 0x67000000|0x0|0x0, ctl: 0x3
[  353.001735] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  353.010979] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]------------
[  353.020423] sof-audio-pci-intel-mtl 0000:00:1f.3: IPC timeout
[  353.026868] sof-audio-pci-intel-mtl 0000:00:1f.3: fw_state: SOF_FW_BOOT_COMPLETE (7)
[  353.035552] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM status: 0xffffffff, ROM error: 0xffffffff
[  353.045287] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM debug status: 0x50000005, ROM debug error: 0x0
[  353.055497] sof-audio-pci-intel-mtl 0000:00:1f.3: ROM feature bit not enabled
[  353.063477] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump end ]------------
[  353.072721] sof-audio-pci-intel-mtl 0000:00:1f.3: ctx_save IPC error: -110, proceeding with suspend
[  689.764439] sof-audio-pci-intel-mtl 0000:00:1f.3: GAIN (UUID: 61BCA9A8-18D0-4A18-8E7B-2639219804B7): No CPC match in the firmware file's manifest (ibs/obs: 768/768)
[  692.922594] sof-audio-pci-intel-mtl 0000:00:1f.3: ipc timed out for 0x47000000|0x0
[  692.931113] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump start ]------------
[  692.940571] sof-audio-pci-intel-mtl 0000:00:1f.3: Host IPC initiator: 0xc7000000|0x0|0x0, target: 0x67000000|0x0|0x0, ctl: 0x3
[  692.953325] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ IPC dump end ]------------
[  692.962576] sof-audio-pci-intel-mtl 0000:00:1f.3: ------------[ DSP dump start ]---------

cc:

@kv2019i kv2019i added P1 Blocker bugs or important features IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform. labels Sep 26, 2023
ujfalusi added a commit to ujfalusi/sof-linux that referenced this issue Oct 5, 2023
…efore suspend

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject#4608
Signed-off-by: Peter Ujfalusi <[email protected]>
@mengdonglin
Copy link
Collaborator

PR #4617 is a fix for this issue.

plbossart pushed a commit that referenced this issue Oct 12, 2023
…efore suspend

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: #4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
akiyks pushed a commit to akiyks/linux that referenced this issue Oct 17, 2023
…efore suspend

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
yifanli-intel pushed a commit to yifanli-intel/deepin-kernel that referenced this issue Feb 26, 2024
…efore suspend

commit 576a0b7 ("ASoC: SOF: Intel: hda-dsp: Make sure that no irq handler is pending before suspend") upstream.

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

deepin-Intel-SIG: commit 576a0b7 ("ASoC: SOF: Intel: hda-dsp: Make sure that no irq handler is pending before suspend").

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
[ Yifan2 Li: amend commit log ]
Signed-off-by: Yifan2 Li <[email protected]>
paralin pushed a commit to skiffos/linux that referenced this issue Apr 14, 2024
…efore suspend

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
(cherry picked from commit 576a0b7)
Signed-off-by: Cristian Ciocaltea <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 21, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 22, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 23, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 24, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 26, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 27, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
mj22226 pushed a commit to mj22226/linux that referenced this issue Aug 27, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
johnny-mnemonic pushed a commit to linux-ia64/linux-stable-rc that referenced this issue Aug 28, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
gregkh pushed a commit to gregkh/linux that referenced this issue Aug 29, 2024
…efore suspend

[ Upstream commit 576a0b7 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
PlaidCat pushed a commit to ctrliq/kernel-src-tree that referenced this issue Sep 12, 2024
…efore suspend

JIRA: https://issues.redhat.com/browse/RHEL-21053

commit 576a0b7
Author: Peter Ujfalusi <[email protected]>
Date: Thu Oct 12 15:18:48 2023 -0400

    ASoC: SOF: Intel: hda-dsp: Make sure that no irq handler is pending before suspend

    In the existing IPC support, the reply to each IPC message is handled in
    an IRQ thread. The assumption is that the IRQ thread is scheduled without
    significant delays.

    On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
    reply to the last IPC message before powering-down the DSP can be delayed
    by several seconds. The IRQ thread will proceed with register accesses
    after the DSP is powered-down which results in a kernel crash.

    While the bug which causes the delay is not in the audio stack, we must
    handle such cases with defensive programming to avoid such crashes.

    Call synchronize_irq() before proceeding to power down the DSP to make
    sure that no irq thread is pending execution.

    Closes: thesofproject/linux#4608
    Reviewed-by: Guennadi Liakhovetski <[email protected]>
    Signed-off-by: Peter Ujfalusi <[email protected]>
    Signed-off-by: Pierre-Louis Bossart <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Mark Brown <[email protected]>

Signed-off-by: Jaroslav Kysela <[email protected]>
wanghao75 pushed a commit to openeuler-mirror/kernel that referenced this issue Oct 15, 2024
…efore suspend

stable inclusion
from stable-v6.6.48
commit 3b2f36068c28f05b3bfc728dba337cdcec465262
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IAWEBV

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3b2f36068c28f05b3bfc728dba337cdcec465262

--------------------------------

[ Upstream commit 576a0b71b5b479008dacb3047a346625040f5ac6 ]

In the existing IPC support, the reply to each IPC message is handled in
an IRQ thread. The assumption is that the IRQ thread is scheduled without
significant delays.

On an experimental (iow, buggy) kernel, the IRQ thread dealing with the
reply to the last IPC message before powering-down the DSP can be delayed
by several seconds. The IRQ thread will proceed with register accesses
after the DSP is powered-down which results in a kernel crash.

While the bug which causes the delay is not in the audio stack, we must
handle such cases with defensive programming to avoid such crashes.

Call synchronize_irq() before proceeding to power down the DSP to make
sure that no irq thread is pending execution.

Closes: thesofproject/linux#4608
Reviewed-by: Guennadi Liakhovetski <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Wen Zhiwei <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IPC timeout IPC timeout observed MTL Applies to Meteor Lake platform. P1 Blocker bugs or important features
Projects
None yet
Development

No branches or pull requests

3 participants