-
Notifications
You must be signed in to change notification settings - Fork 321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] need to rate-limit 'nothing to copy' traces when mixer source is paused. #4672
Comments
Still happening with Zephyr then this test failed with
|
@kv2019i any API in Zephyr to rate limit logs ? |
@lgirdwood both SOF and Zephyr have means to drop logging entries in case of flooding, SOF also can merge "similar" messages in such situations but I don't think either has a special call to mark individual logs for rate-limiting. I was looking for that recently too and haven't found anything. |
Need to use Zephyr methods, SOF logging will go away. |
I had a look and I could not find anything either, just "drop newest" vs "drop oldest" when full.
I don't see when someone would want to rate-limit one specific log statement while letting others flood the logs. EDIT: you could want a higher threshold for some statements versus others, see example in #5597. Except this not really an example because it mentions only one statement, so there's a chance changing global thresholds is still enough even in 5997 The old sof-logger throttling is smart enough to automatically throttle only abusive log statements, leaving other logs unaffected. I'd expect something similar from Zephyr (if there was anything)
I don't see how one could affect the other. |
Tentative fix submitted, buried in monster PR #8571 |
My changes were predicated on the DP scheduler, I didn't realize there were other circumstances that tripped over this too. So they won't fix this in particular, even though it's the same warning. Maybe we should just remove the warning entirely if it's known benign in other situations? Buffer over/underflow is already an error condition, I don't see why this is needed? (DP hits this due to asynchrony in the pipeline updates: the DP component sinks data synchronously in the source pipeline, but the output doesn't show up right away, so things downstream yell that they have nothing to do until the next time the pipeline is scheduled. But... that's precisely the design of the DP scheduler.) |
The DAI emits a flood of warnings when presented with empty buffers at copy time. That's not really a reasonable warning condition. There are multiple situations where components upstream may be paused or asynchronous, leading to starvation in any given cycle. Earlier code has hit this with paused components, where the log messages are merely annoying. One new situation is that when using the DP scheduler, updates are async and may happen at a different cadence than the pipeline the DAI is on; the upstream component will be presented with data in a (for example) 1ms pipeline tick, but then send it to a different component (echo cancellation, say) that batches it up into larger buffers (10ms) and releases it downstream only at the slower cadence. In that situation the flood of messages is being emitted during an active stream, and tends to cause glitches all by itself after a few seconds (and even where it doesn't, it floods the Zephyr log backend to the extent that literally every message is dropped). (I don't know that all such warnigns are removed by this patch. These are only the ones I've seen in practice.) Fixes thesofproject#4672 Signed-off-by: Andy Ross <[email protected]>
Trying to reproduce DSP panic thesofproject#8621 ------- The DAI emits a flood of warnings when presented with empty buffers at copy time. That's not really a reasonable warning condition. There are multiple situations where components upstream may be paused or asynchronous, leading to starvation in any given cycle. Earlier code has hit this with paused components, where the log messages are merely annoying. One new situation is that when using the DP scheduler, updates are async and may happen at a different cadence than the pipeline the DAI is on; the upstream component will be presented with data in a (for example) 1ms pipeline tick, but then send it to a different component (echo cancellation, say) that batches it up into larger buffers (10ms) and releases it downstream only at the slower cadence. In that situation the flood of messages is being emitted during an active stream, and tends to cause glitches all by itself after a few seconds (and even where it doesn't, it floods the Zephyr log backend to the extent that literally every message is dropped). (I don't know that all such warnigns are removed by this patch. These are only the ones I've seen in practice.) Fixes thesofproject#4672 Signed-off-by: Andy Ross <[email protected]> (cherry picked from commit 514576e)
I think this is a bug. If the mixer itself is active, it should continue to produce data to the DAI if the DAI is running. I'll bump priority to P2 and assign to v2.9. |
I'm still not comfortable with silencing those messages, they have been very useful for me recently do what do you think about this sollution: as cycles with no data may naturally happen at pipeline startup, so silence them.
|
@kv2019i your fix now merged, good to close ? |
@lgirdwood I think what @plbossart described here originally is actually a bug, but when I tried to reproduce this today, I could not. Even without my recent fix PR to (8649), pausing PCMs with mixer caused no flood of "nothing to copy" messages. So unless @plbossart disagrees, I think this can be closed. The original functional problem has been solved, and also "nothing to copy" messages are now limited to debug builds only. |
I am not sure what to make of all the comments. This is a 2.5yr old bug, I am not even sure it this was reported in the context of IPC4. On one side we have our trusted @kv2019i who cannot reproduce the error, and on the other side no consensus with a remark from @marcinszkudlinski and a reference to Google AEC which cannot possibly be related. I would err on the side of keeping this open until there's consensus. |
@marcinszkudlinski Are you good to close this now? We have kept the warning logs in code, but they are behind a build option (#8649). I think in any case, all known problems (that people can reproduce have been fixed), so I don't want to keep this open for v2.9 if it's not clear anything is broken or needs to be done. |
for me - ok to close |
The mtrace spam still makes logs unusable. Try opening
|
@marc-hb Please open a different bug for this. This is a log coming from different component and in different case, so we are reusing same bug id for multiple issues here. These logs in latest comment are expected, but we can debate the volume. |
Opened new #8761 |
Describe the bug
When we use a mixer, pausing while using a single source will cause a lot of trace messages
This is due to the fact that we don't pause a mixer but keep it active.
To Reproduce
use a topology with a mixer, e.g. cavs-nocodec
aplay -Dhw:0,0 -c2 -r48000 -fS16_LE /dev/zero -vv -i
press space bar
in another terminal, run sof-logger
Reproduction Rate
100%
Expected behavior
rate-limit the trace messages in that case. There's nothing to copy because the sources are paused.
Impact
annoyance
The text was updated successfully, but these errors were encountered: