Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Google AEC rework: unbreak/resurrect, mt8195, DP integration #8571

Closed
wants to merge 4 commits into from

Conversation

andyross
Copy link
Contributor

@andyross andyross commented Dec 5, 2023

This series updates the Google RTC Audio Processing module, which has
always been deployed out of platform branches, to run on SOF main.
The tactic taken is a little different: previous integrations have
relied on getting pipeline dependencies to work, where this does a
little trickery at the component level (both in
google_rtc_audio_processing and demux) to control propagation of
command triggers across pipeline boundaries. This should evolve
better with IPC4, which in (almost?) all cases refuses to propagate
those commands anyway.

The main advantage is that the pipelines become decoupled: you can
start an AEC'd capture stream without playback running and launch the
playback DMA later which then integrates with AEC dynamically; you can
shut either down in either order, start the playback before
capture, etc...

A secondary advantage is that, except for one logic fix with reset
propagation that I'm all but certain was a bug, no behavior changes
are needed to the pipeline state logic.

It also ends up a little cleaner and a little faster.

This work was done against a mt8195 device. I'm going to look at MTL
immediately, hopefully nothing will be needed except the format
conversion.

src/audio/mux/mux.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
@lgirdwood
Copy link
Member

@thesofproject/mediatek pls review.

@marcinszkudlinski
Copy link
Contributor

I understand this change is soley for IPC3 and legacy platforms.

To merge changes from 007 branch, where all IPC3 have been removed completely from google_rtc_audio_processing.c, we need to rename the current file to _ipc3 OR (maybe better) merge 007 changes to a new file google_rtc_audio_processing_dp.c

@andyross
Copy link
Contributor Author

andyross commented Dec 8, 2023

@marcinszkudlinski I'd like to keep this in one implementation, actually. Other platforms are otherwise trapped in some very old topic branches that will be a pain to maintain. (Also some of the state fixes here are turning out be be needed even with IPC4 I think). Working on merging the two forks right now, should have a new version up over the weekend. I'll submit to both branches, but this will be the one with clean commit messages.

@andyross
Copy link
Contributor Author

Many updates and fork merges from mtl-007-drop-stable, including source/sink rework, s32_le support, DP scheduler support, more cleanup, and a handful of bugs fixed that were present in the earlier versions. Please re-review.

The code as it stands here works for me to get working audio through the full AEC stack on mt8195 and (with one workaround in the last patch) Intel MTL. I'll push a separate patch to the mtl-007-drop-stable branch containing these changes in squashed form.

andyross added a commit to andyross/sof that referenced this pull request Dec 11, 2023
Squashed fixups to this code from thesofproject#8571

Signed-off-by: Andy Ross <[email protected]>
src/audio/pipeline/pipeline-params.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
@andyross
Copy link
Contributor Author

Another spin, now that almost all the split-out PRs have been merged. This is down to one patch now, albeit a rather large one. This will work out of the box to get an AEC build working on MTL. If you want to run it with mt8195 you also need #8691.

I'm pretty sure existing review comments have been addressed, the main ones outstanding are:

  • Static vs. dynamic allocation in the giant buffer: I feel really strongly that using the heap for this is dangerous. If we need it someday, it's easy to get back. But right now I'm maybe 60% certain that AEC can be made to fail just by doing a lot of pipeline start/stop operations that fragment the heap.

  • Floating point conversion routines. As mentioned upthread, there just isn't anything in SOF that matches the performance characteristics here. The code here is designed to optimize down to single saturation and conversion instructions in 8-10 instruction inner loops that include de/interleaving inline. The two existing tools (designed for high precision fixed point constant generation and for flat array conversion) just aren't a fit. And AEC is, after all, a performance case.

Anyway, please re-review. It would be good to get this in soon, some more work is starting to back up behind it. (And yes, there's some drama about merge and release strategy vis a vis #8722. But regardless of which merges first, we need to get to a go/no-go decision on this PR).

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 11, 2024

@lrudyX , @wszypelt 2 different errors in https://sof-ci.01.org/sof-pr-viewer/#/build/PR8571/build13389086

  • bad permissions on .ssh/config
  • all TGL tests failed with "connection refused"

None seem relevant here.

@andyross andyross changed the title Google AEC updates with mt8195 integration Google AEC rework: unbreak/resurrect, mt8195, DP integration Jan 11, 2024
@andyross
Copy link
Contributor Author

Split out one patch that can be fairly clearly explained in isolation.

@andyross
Copy link
Contributor Author

And indeed, the checkpatch issue is a false positive, it doesn't like #define ALWAYS_INLINE inline __attribute__((always_inline)) because it wants us to use __always_inline, which is Linux kernel API that neither SOF nor Zephyr use. In fact Zephyr provides ALWAYS_INLINE for this, which is exactly why this line is here for xtos builds, so they can use the "most reasonably official" API.

@wszypelt
Copy link

@marc-hb TGL was repaired -> new results should be available within 40 minutes
bad permissions on .ssh/config, -> I need to discuss this with the process team

Copy link
Contributor

@marcinszkudlinski marcinszkudlinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general problem; missing

CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING=y
CONFIG_COMP_STUBS=y

in app/boards/intel_adsp_ace15_mtpm.conf

that means - IPC4 code is never compiled in CI checks

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
@lkoenig
Copy link
Contributor

lkoenig commented Jan 16, 2024

  • Static vs. dynamic allocation in the giant buffer: I feel really strongly that using the heap for this is dangerous. If we need it someday, it's easy to get back. But right now I'm maybe 60% certain that AEC can be made to fail just by doing a lot of pipeline start/stop operations that fragment the heap.

I agree with you @andyross . Furthermore I do not see any short or medium term situation where one would want to run 2 AEC. If someone can come up with a valid use-case then maybe we should revisit the static allocation otherwise I do think it is safer.

Copy link
Contributor Author

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delays. My lab home has been without power for four days now, so I haven't had access to devices to test with. The power company's web site says they're down to the last 12k/160k of failures though, so hopefully today I'll be able to get back to defrost things. Will have a respin up as soon as possible.

Regarding the static allocation, I'll admit this has been a lot more controversial than I expected. Here's the full argument:

  1. AEC wants an extremely large buffer in a contiguous block, the current one is 200k and Lionel wants more.

  2. Experimentally, that corresponds to about half of the available heap memory on the first stream startup (I could get a ~430k block allocated, IIRC).

  3. Dynamic heaps just aren't going to be reliable when faced with those numbers. I can basically guarantee that after enough overlapping stream operations (especially ones that cross use cases, like opening an HDMI output before closing your headset capture, etc...), you can get the heap into a state where capture won't start because AEC gets a heap failure.

  4. Such an error is fatal, but preventable by just making sure the memory is there to begin with.

Longer term, yes, we will surely want to support more than one AEC instance. But longer term we have better options, like for example integrating the internal AEC allocators with the SOF/Zephyr heap, so it doesn't require the big block. Or alternatively in the MMU future where such a feature will probably be deployed, we'll be able to map a module's .bss virtually, which would be an objectively better choice than fine-grained heap allocation of large blocks.

src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
src/audio/google/google_rtc_audio_processing.c Outdated Show resolved Hide resolved
The pipeline code is complicated, and some trigger events (for example
reset of a playback pipeline that was started before capture) cross
pipelines from the playback/reference stream into the AEC component
unexpectedly.  Ignore/halt triggers we get from the reference
pipeline.

Also work around a bug with some stale code in IPC3 builds that
detects the lack of an active reference source as an "error"
incorrectly.

Signed-off-by: Andy Ross <[email protected]>
Lots of work on AEC, with an eye to getting it working on main, where
it had bitrotten, and pulling in various features (IPC3 pipeline state
management and DP scheduling) that had merged in other branches.

Dynamically configure stream formats (sample format/rate and channel
count) from the connected streams at prepare() time instead of relying
on build-time tuning.

Port the code to use the source/sink API, in as sophisticated a manner
as I can find.  Copies unroll cleanly into just a few instructions per
sample, including integer/float conversions and de-/interleaving.

Support both 16 and 32 bit sample formats, with a fairly clever
inlining scheme to share as much code as possible between them.  The
component will select "copy" function pointers at prepare() time.
This works by using the _float32 variant of the AEC API (which is
actually the core internal implementation) instead of the _int16 one
which involves a conversion.

The large buffers required by this component (input and output staging
and an internally-managed pool/heap block) are now static symbols
instead of dynamic memory from the heap.  These are very large, taking
up about half of what is available to the linker on MTL.  Relying on
heap allocation is just dangerous in this context.

This fully decouples AEC from the playback stream.  It will run
without an active reference happily, feeding zeros to the processing,
and pick up in stride when the pipeline starts.  Note that this
feature is unexercisable on IPC4, where the kernel automatically
starts up connected pipelines.

Fixes a few bugs and misfeatures also:

+ Chunk the copies by full buffer strides between AEC processing calls
  instead of testing at each copied frame.

+ Copy the reference and mic streams in tandem, preventing them from
  becoming out of sync if the devices weren't themselves synchronized.

+ Copy the AEC results to the output stream after the call to
  ProcessCapture() instead of before.  This was a hidden latency bug
  in the original code, I think.

+ Cleans up the Kconfig to remove stale variables and guard all
  the component-specific tunables under the top-level component
  variable.  Also uses a default instead of a select to couple to
  CONFIG_STUBS, allowing AEC to be manually tuned.

Signed-off-by: Andy Ross <[email protected]>
Cleaner API.  Also avoids the need to specify cache line padding at the end.

Signed-off-by: Andy Ross <[email protected]>
This feature is already enabled in topology, enable it in the build to
match.

Signed-off-by: Andy Ross <[email protected]>
@andyross
Copy link
Contributor Author

Rebase/respin. Should have addressed the existing review comments (other than the argument about static buffers, obviously)

@lkoenig
Copy link
Contributor

lkoenig commented Feb 1, 2024

Longer term, yes, we will surely want to support more than one AEC instance.

I've thought about that and I can not see a use case for running more than one AEC at the same time so I do not think this is something we should plan to support.

Hence I am arguing that having the AEC memory pre-allocated is good solution as it'll avoid memory fragmentation, it guarantees that one can use the AEC.

@cujomalainey
Copy link
Contributor

Longer term, yes, we will surely want to support more than one AEC instance.

I've thought about that and I can not see a use case for running more than one AEC at the same time so I do not think this is something we should plan to support.

Hence I am arguing that having the AEC memory pre-allocated is good solution as it'll avoid memory fragmentation, it guarantees that one can use the AEC.

We should talk about this. I think there are cases we need to consider that might be important. I agree that in the general set of use cases this is true though.

@andyross
Copy link
Contributor Author

andyross commented Mar 7, 2024

This is hopelessly conflicted vs. current main. Closing this PR and opening a new one with re-split changes that can merge over main. That should have addressed existing review comments, but please check.

@andyross andyross closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.