Audio encoding - part 1 of N #524

Open
wants to merge 23 commits into
base: main

NicolasHug (Member) commented Feb 26, 2025

Expect N to be large :)

This PR implements a basic audio encoder that is not yet feature-complete but seems to work OK in some limited scenarios. There are a lot of TODOs left in the code.

This PR only touches C++ and the core APIs; nothing is public yet. The current design passes all the necessary parameters to the constructor. With the core API, that looks like:

encoder = create_encoder(wf=samples, sample_rate=sample_rate, filename="output.mp3")
encode(encoder)

The reason is that all these parameters are required to initialize the AVFormatContext and the AVCodecContext. This may well change: eventually we may decide not to expose this as a C++ object at all, but rather as a pure function. It'll be easier to decide once we're more feature-complete.
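To make the two options concrete, here is a minimal sketch of both shapes in plain C++. Everything in it is hypothetical: `AudioEncoderSketch`, `encodeToFile`, and the `std::vector<float>` stand-in for the samples tensor are illustrative names rather than the code in this PR, and no FFmpeg calls are made.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the encoder object; no FFmpeg involved.
class AudioEncoderSketch {
 public:
  // Everything needed to set up the format and codec contexts is required
  // up front, so an instance can never be half-initialized.
  AudioEncoderSketch(
      std::vector<float> samples,
      int sampleRate,
      std::string fileName)
      : samples_(std::move(samples)),
        sampleRate_(sampleRate),
        fileName_(std::move(fileName)) {
    if (sampleRate_ <= 0) {
      throw std::invalid_argument("sampleRate must be positive");
    }
    if (fileName_.empty()) {
      throw std::invalid_argument("fileName must not be empty");
    }
  }

  // Placeholder for the real encoding loop: it only reports how many
  // samples it would have encoded.
  std::size_t encode() const {
    return samples_.size();
  }

 private:
  std::vector<float> samples_;
  int sampleRate_;
  std::string fileName_;
};

// The "pure function" alternative: same inputs, no object exposed to the
// caller. Here it simply wraps the class above.
inline std::size_t encodeToFile(
    std::vector<float> samples,
    int sampleRate,
    std::string fileName) {
  return AudioEncoderSketch(
             std::move(samples), sampleRate, std::move(fileName))
      .encode();
}
```

With the object form a caller writes `AudioEncoderSketch enc(samples, 16000, "output.mp3"); enc.encode();`, while the function form is the single call `encodeToFile(samples, 16000, "output.mp3")`. The function form hides the object's lifetime entirely, which is only a good trade if nothing needs to happen between construction and encoding.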

@facebook-github-bot added the "CLA Signed" label Feb 26, 2025
@NicolasHug changed the title from "[WIP] Audio encoding POC" to "Audio encoding - part 1 of N" Apr 2, 2025
@NicolasHug NicolasHug marked this pull request as ready for review April 2, 2025 14:46
@@ -61,6 +61,9 @@ function(make_torchcodec_libraries
AVIOContextHolder.cpp
FFMPEGCommon.cpp
SingleStreamDecoder.cpp
# TODO: lib name should probably not be "*_decoder*" now that it also
# contains an encoder
Encoder.cpp
Contributor

Maybe it should be libtorchcodec_coreN.so?

// We're allocating the stream here. Streams are meant to be freed by
// avformat_free_context(avFormatContext), which we call in the
// avFormatContext_'s destructor.
avStream_ = avformat_new_stream(avFormatContext_.get(), nullptr);
Contributor

I want to make sure I understand the relationship here: avformat_new_stream() takes a pointer to an AVFormatContext. It creates a new stream, associates that stream with the provided AVFormatContext such that the AVFormatContext owns the stream, and returns a pointer to that newly created stream?

Member Author

You're correct, or to be more precise: this is also what I understand from the FFmpeg docs.
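The ownership relationship described in this thread can be illustrated with a small standard-library analogy. This is not FFmpeg code: `FormatContextSketch`, `StreamSketch`, and `newStream` are hypothetical names that only mirror the shape of `avformat_new_stream()`, where the context owns the stream and the caller gets back a non-owning pointer.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for AVStream.
struct StreamSketch {
  int index = 0;
};

// Hypothetical stand-in for AVFormatContext: it owns every stream that is
// created through it.
class FormatContextSketch {
 public:
  // Creates a stream, stores it inside the context, and returns a raw,
  // non-owning pointer to it. The caller must never delete that pointer.
  StreamSketch* newStream() {
    auto stream = std::make_unique<StreamSketch>();
    stream->index = static_cast<int>(streams_.size());
    streams_.push_back(std::move(stream));
    return streams_.back().get();
  }

  std::size_t numStreams() const {
    return streams_.size();
  }

 private:
  // Destroying the context destroys all of its streams, in the same spirit
  // as avformat_free_context() freeing the streams of a real context.
  std::vector<std::unique_ptr<StreamSketch>> streams_;
};
```

The consequence for the encoder is that a member like `avStream_` can stay a plain pointer with no custom deleter, as long as it never outlives the context that created it.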

void encode();

private:
void encode_inner_loop(
Contributor

Nit: camelCase instead of snake_case.

For the record, I prefer snake_case, but I think being consistent is more important. If I could wave a magic wand and make our repo all snake_case for variable and function names, I would. Class names should still be PascalCase, I believe.

Member Author

I am more used to camel case too. If we really want to, surely there must exist some camelCase to snake_case auto converters?
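For a sense of what such a converter does at the identifier level, here is a minimal sketch. `camelToSnake` is a hypothetical helper, not an existing tool: it handles ASCII identifiers only and does not treat acronyms specially. A real migration would need a refactoring tool that understands C++ scoping.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: converts a camelCase identifier to snake_case.
// An underscore is inserted before each uppercase letter that directly
// follows a lowercase letter or a digit; all letters are lowercased.
inline std::string camelToSnake(const std::string& name) {
  std::string out;
  out.reserve(name.size() + 4);
  for (std::size_t i = 0; i < name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(name[i]);
    if (std::isupper(c)) {
      if (i > 0) {
        const unsigned char prev = static_cast<unsigned char>(name[i - 1]);
        if (std::islower(prev) || std::isdigit(prev)) {
          out.push_back('_');
        }
      }
      out.push_back(static_cast<char>(std::tolower(c)));
    } else {
      out.push_back(name[i]);
    }
  }
  return out;
}
```

With this rule, `camelToSnake("encodeInnerLoop")` yields `encode_inner_loop`. Runs of capitals are collapsed without separators, so an acronym-heavy name would need manual review.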
