Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Driver support for firmware recovery #452

Closed
plbossart opened this issue Dec 16, 2018 · 11 comments
Closed

Driver support for firmware recovery #452

plbossart opened this issue Dec 16, 2018 · 11 comments
Assignees
Labels
enhancement New feature or request Notable Enhancement is significant or notable for release.

Comments

@plbossart
Copy link
Member

When the firmware gets stuck or unresponsive (IPC timeouts), the driver just bails out and the only way to recover is to reboot. That's not acceptable for a number of products.

The SOF driver needs to implement a recovery mechanism where it detects the firmware has gone off tracks (e.g. with a 5s watchdog), and restores the configuration (hardware reset, firmware download, pipelines and controls restored). We already have this sequence for resume so it shouldn't be too hard.

Note that this request is not asking for seamless operation. Applications will likely time out and likely stop, so collaboration with userspace is likely required to reenable playback/capture. The ask is really to avoid a reboot and restart cleanly.

@plbossart plbossart added the enhancement New feature or request label Dec 16, 2018
@wenqingfu
Copy link

extended work of thesofproject/sof#710?

@plbossart
Copy link
Member Author

plbossart commented Dec 17, 2018 via email

@ranj063
Copy link
Collaborator

ranj063 commented Dec 21, 2018

@plbossart looking into the hda_dsp_core_reset_power_down() method in hds-dsp.c, we seem to be placing the core in reset before powering down and when the firmware is unresponsive, this also seems to fail. This is required when we want to shutdown gracefully.

In the case of panic, would it be acceptable to just power down the cores instead of placing them in reset?

@plbossart
Copy link
Member Author

plbossart commented Dec 21, 2018 via email

@xiulipan
Copy link

@plbossart @ranj063
Actually now when a panic happened. If runtime_pm is not disabled, it will automatically recovery the DSP by runtime_suspend and runtime_resume.

This now brings some difficulty in debug as all the logs and dumped exception info is gone with that suspend/resume.

As for the recovery mode, I think we may need to add flag or debugfs controller to disable the FW recovery for DEBUG usage.

@plbossart
Copy link
Member Author

plbossart commented Feb 11, 2019 via email

@wenqingfu
Copy link

@kv2019i , would you be able to take this one? thanks!

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 2, 2019

@wenqingfu wrote:

@kv2019i , would you be able to take this one? thanks!

Ack. I'm currently looking at tracing issues during suspend and runtime supsned (old issue #248
and newer #486), will work on these first (good way to learn the tracing part first) and then move to this.

@kv2019i
Copy link
Collaborator

kv2019i commented Apr 17, 2019

Hello, I now pushed initial patchset to address this issue (PR841). I ended up taking a slightly different path than what you've discussed earlier, so please check the code and see whether this is on the right track or not.

@lgirdwood lgirdwood added the Notable Enhancement is significant or notable for release. label Jun 21, 2019
plbossart pushed a commit that referenced this issue Feb 13, 2020
rcu_read_lock is needed to protect access to psock inside sock_map_unref
when tearing down the map. However, we can't afford to sleep in lock_sock
while in RCU read-side critical section. Grab the RCU lock only after we
have locked the socket.

This fixes RCU warnings triggerable on a VM with 1 vCPU when free'ing a
sockmap/sockhash that contains at least one socket:

| =============================
| WARNING: suspicious RCU usage
| 5.5.0-04005-g8fc91b972b73 #450 Not tainted
| -----------------------------
| include/linux/rcupdate.h:272 Illegal context switch in RCU read-side critical section!
|
| other info that might help us debug this:
|
|
| rcu_scheduler_active = 2, debug_locks = 1
| 4 locks held by kworker/0:1/62:
|  #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #2: ffffffff82065d20 (rcu_read_lock){....}, at: sock_map_free+0x5/0x170
|  #3: ffff8881368c5df8 (&stab->lock){+...}, at: sock_map_free+0x64/0x170
|
| stack backtrace:
| CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04005-g8fc91b972b73 #450
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
|  dump_stack+0x71/0xa0
|  ___might_sleep+0x105/0x190
|  lock_sock_nested+0x28/0x90
|  sock_map_free+0x95/0x170
|  bpf_map_free_deferred+0x58/0x80
|  process_one_work+0x260/0x5e0
|  worker_thread+0x4d/0x3e0
|  kthread+0x108/0x140
|  ? process_one_work+0x5e0/0x5e0
|  ? kthread_park+0x90/0x90
|  ret_from_fork+0x3a/0x50

| =============================
| WARNING: suspicious RCU usage
| 5.5.0-04005-g8fc91b972b73-dirty #452 Not tainted
| -----------------------------
| include/linux/rcupdate.h:272 Illegal context switch in RCU read-side critical section!
|
| other info that might help us debug this:
|
|
| rcu_scheduler_active = 2, debug_locks = 1
| 4 locks held by kworker/0:1/62:
|  #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0
|  #2: ffffffff82065d20 (rcu_read_lock){....}, at: sock_hash_free+0x5/0x1d0
|  #3: ffff888139966e00 (&htab->buckets[i].lock){+...}, at: sock_hash_free+0x92/0x1d0
|
| stack backtrace:
| CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04005-g8fc91b972b73-dirty #452
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
|  dump_stack+0x71/0xa0
|  ___might_sleep+0x105/0x190
|  lock_sock_nested+0x28/0x90
|  sock_hash_free+0xec/0x1d0
|  bpf_map_free_deferred+0x58/0x80
|  process_one_work+0x260/0x5e0
|  worker_thread+0x4d/0x3e0
|  kthread+0x108/0x140
|  ? process_one_work+0x5e0/0x5e0
|  ? kthread_park+0x90/0x90
|  ret_from_fork+0x3a/0x50

Fixes: 7e81a35 ("bpf: Sockmap, ensure sock lock held during tear down")
Signed-off-by: Jakub Sitnicki <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
@kv2019i
Copy link
Collaborator

kv2019i commented May 19, 2021

@ujfalusi got #1675 so passing this as well :)

@kv2019i kv2019i removed their assignment May 19, 2021
hli25 added a commit to hli25/linux-sof that referenced this issue Aug 26, 2021
Fixed following WARNINGs:
for thesofproject#355, removed this unnecessary log print.
for thesofproject#364, added a variable to "struct es8316_priv", and tried to let
          function es8316_enable_jack_detect() align to mainline code.
for thesofproject#452, removed it since we did not verified the patch on other platform.

	WARNING: Unnecessary ftrace-like logging - prefer using ftrace
	thesofproject#355: FILE: sound/soc/codecs/es8316.c:751:
	+    dev_dbg(comp->dev, "Exit %s\n", __func__);

	ERROR: do not initialise statics to 0
	thesofproject#364: FILE: sound/soc/codecs/es8316.c:760:
	+    static int initial = 0;

	WARNING: DT compatible string "everest,es8336" appears un-documented -- check ./Documentation/devicetree/bindings/
	thesofproject#452: FILE: sound/soc/codecs/es8316.c:932:
	+    { .compatible = "everest,es8336", },

Signed-off-by: Huajun Li <[email protected]>
@plbossart
Copy link
Member Author

duplicate of #1675

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request Notable Enhancement is significant or notable for release.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants