Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] ASoC: SOF: ipc4-topology: reserve alh node_id for dai module #4794

Closed

Conversation

RanderWang
Copy link

Fix dai module initialization failure since node_id is zero. Two pcm streams share the same be dai module. When stream a is being closed and the other stream b is open, the dai module is first unprepared by stream a and the node_id is cleared. And then be hw_params function is called for stream b in asoc framework and found be dai component is shared by stream a and b, so the hw_params functon for be dai component is not called. The node_id is always set by be dai component so it will be missed.

Each BE has its own unique intel_alh_id aka node_index based on the link number and pdi number. So node_id will not change for a specific BE. This patch reserves alh node_id for dai module in firmware based on this fact.

Fix #4832

@@ -1283,9 +1283,6 @@ static void sof_ipc4_unprepare_copier_module(struct snd_sof_widget *swidget)
ALH_MULTI_GTW_BASE;
ida_free(&alh_group_ida, group_id);
}

/* clear the node ID */
copier_data->gtw_cfg.node_id &= ~SOF_IPC4_NODE_INDEX_MASK;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message is rather convoluted: there is nothing remotely close to something being 'reserved'.

you are not clearing a value, but that does not quite explain the problem, and it's weird to leave something initialized unconditionally.
I wonder if this value should be cleared when the use_count becomes zero?

Copy link
Author

@RanderWang RanderWang Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node_id was cleared by this commit : fixup! ASoC: SOF: sof-audio: Fix broken early bclk feature for SSP 5e9a856. I don't get it.

The use_count is zero when the node_id is cleared, or it can't be cleared.
Let's talk about our processing
(1) stream open
(2) stream hw_params, set up fw module here, use_count ++
(3) stream free, free and unprepare the fw module here, use_count --

This bug happens with stream A at step (3), stream B at step (1), so we can't depend on use_count.

If we want to use use_count, we need to set up fw module in stream open (Asoc set up audio path in snd_pcm_open), at least part of fw module since we don't have enough hw_params information here. This is not done by both IPC3 & IPC4

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message is rather convoluted: there is nothing remotely close to something being 'reserved'.

you are not clearing a value, but that does not quite explain the problem, and it's weird to leave something initialized unconditionally.

Our fw module are initialized by topology setting when SOF is loaded. As I remember node_id is initialized by topology in IPC3. @bardliao do you remember this ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message is rather convoluted: there is nothing remotely close to something being 'reserved'.
you are not clearing a value, but that does not quite explain the problem, and it's weird to leave something initialized unconditionally.

Our fw module are initialized by topology setting when SOF is loaded. As I remember node_id is initialized by topology in IPC3. @bardliao do you remember this ?

For IPC3, yes, the dai_index is from topology.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we set the node_id unconditionally before sending a message which needs it?
If the node_id got cleared in between setting it in the message and sending it then we have way bigger problems with race...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we set the node_id unconditionally before sending a message which needs it? If the node_id got cleared in between setting it in the message and sending it then we have way bigger problems with race...

hi @bardliao , please help to make it clear, thanks!

I think it should be fine if we set the node_id unconditionally before sending a message which needs it. But @ranj063 is the best person to answer this question.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node_id was cleared by this commit : fixup! ASoC: SOF: sof-audio: Fix broken early bclk feature for SSP 5e9a856. I don't get it.

The use_count is zero when the node_id is cleared, or it can't be cleared. Let's talk about our processing (1) stream open (2) stream hw_params, set up fw module here, use_count ++ (3) stream free, free and unprepare the fw module here, use_count --

This bug happens with stream A at step (3), stream B at step (1), so we can't depend on use_count.

@RanderWang I think this observation is incorrect. Is Stream A has already finished with hw_free because of which the widget has been freed and unprepared, wouldn't Stream B's hw_params take care of preparing the widget?

Copy link
Author

@RanderWang RanderWang Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ranj063 check this https://github.com/thesofproject/sof/issues/8056#issuecomment-1899870604. Stream A free widget before Stream B's hw_params happen. If Stream B's hw_params happen before Stream A free widget, no issue.
If I misunderstand your idea, please help to explain it again. Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node_id is set by SDW hw_params, not by sof_hw_params.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RanderWang so it looks like we have a race between hw_free and hw_params for the same BE by two separate FE's isnt it? Not clearing the node ID, while it will work, seems like a hack and not an actual fix. Would it make sense to instead protect the be_dai_hw_params and be_dai_hw_free with the dpcm_mutex_lock for the be instead?
@plbossart what do you think?

Fix dai module initialization failure since node_id is zero. Two pcm
streams share the same be dai module. When stream a is being closed
and the other stream b is open, the dai module is first unprepared by
stream a and the node_id is cleared. And then be hw_params function is
called for stream b in asoc framework and found be dai component is
shared by stream a and b, so the hw_params function for be dai component
is not called. The node_id is always set by be dai component so it will
be missed.

Each BE has its own unique intel_alh_id aka node_index based on the link
number and pdi number. So node_id will not change for a specific BE. This
patch reserves alh node_id for dai module in firmware based on this fact.

Signed-off-by: Rander Wang <[email protected]>
@RanderWang RanderWang changed the title ASoC: SOF: ipc4-topology: reserve alh node_id for dai module [RFC] ASoC: SOF: ipc4-topology: reserve alh node_id for dai module Jan 30, 2024
@ranj063
Copy link
Collaborator

ranj063 commented Feb 18, 2024

@RanderWang can u please check if #4816 fixes the issues you're trying to address?

@RanderWang RanderWang closed this Feb 19, 2024
@RanderWang
Copy link
Author

@RanderWang can u please check if thesofproject/sof#4816 fixes the issues you're trying to address?

@ranj063 The bug rarely happen so I can't 100% reproduce it. CI test only caught it every a few weeks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] FW reported error: 104 - Other failure of module instance initialization request
5 participants