Allow `sample_rate` parameter to audio decoder #551

NicolasHug · 2025-03-12T17:13:59Z

Towards #549

This PR allows the user to specify a desired output sample rate. If the source stream isn't already at that sample rate, the samples are converted using swresample, which we already use for format conversion.

The PR addresses both core and public AudioDecoder APIs.

Converting sample rates isn't as trivial as converting formats, because this changes the number of output samples. And, importantly, we now need to account for libswresample's internal buffers:

Note that the samples may get buffered in swr if you provide insufficient output space or if sample rate conversion is done, which requires "future" samples. [...] At the end of conversion the resampling buffer can be flushed by calling swr_convert() with NULL in and 0 in_count

(from the docs)
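
To make the two points above concrete, here is a minimal pure-Python sketch. It is not libswresample and not the code in this PR: `estimate_resampled_count` and `ToyBufferingConverter` are hypothetical names, and the fixed `lookahead` is an assumption standing in for whatever the real resampler holds back internally. It only illustrates that the output sample count scales with the rate ratio, and that samples withheld during conversion have to be recovered with a final flush.

```python
def estimate_resampled_count(num_src_samples: int, src_rate: int, dst_rate: int) -> int:
    """Rough count of samples after converting src_rate -> dst_rate (rounded up)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    # Integer ceiling division, to avoid floating-point rounding.
    return (num_src_samples * dst_rate + src_rate - 1) // src_rate


class ToyBufferingConverter:
    """Toy stand-in for a converter that needs `lookahead` future samples.

    Assumption: the real resampler's buffering is more involved; this only
    mimics the "some samples stay buffered until flushed" behaviour.
    """

    def __init__(self, lookahead: int) -> None:
        self._lookahead = lookahead
        self._pending: list = []

    def convert(self, samples: list) -> list:
        # Emit everything except the last `lookahead` samples, which stay buffered.
        self._pending.extend(samples)
        ready = max(len(self._pending) - self._lookahead, 0)
        out, self._pending = self._pending[:ready], self._pending[ready:]
        return out

    def flush(self) -> list:
        # Analogous in spirit to the final "no input" call described in the quote.
        out, self._pending = self._pending, []
        return out
```

Without the final `flush()`, the last `lookahead` samples would be silently dropped, which is the situation the decoder has to avoid at the end of a decoded range.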

NicolasHug · 2025-03-19T15:17:10Z

test/decoders/test_decoders.py

+        frames_44100_native = decoder.get_samples_played_in_range(
+            start_seconds=start_seconds, stop_seconds=stop_seconds
+        )
+        assert frames_44100_native.sample_rate == 44_100


We're treating the above frames as the reference. We could check in their corresponding .pt reference frames instead, but I prefer doing it this way because it clearly illustrates that the reference has a sample rate of 44_100 (from its metadata, as opposed to some checked-in value).

NicolasHug · 2025-03-19T15:26:47Z

src/torchcodec/decoders/_audio_decoder.py

@@ -25,10 +25,13 @@ def __init__(
        source: Union[str, Path, bytes, Tensor],
        *,
        stream_index: Optional[int] = None,
+        sample_rate: Optional[int] = None,


We could also consider exposing this as a `desired_sample_rate` parameter? I don't have a strong opinion; the docs would make it clear what this means in any case.

NicolasHug · 2025-03-19T17:44:11Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+  torch::Tensor lastSamples = maybeFlushSwrBuffers();
+  if (lastSamples.numel() > 0) {
+    frames.push_back(lastSamples);
+  }


Not particularly fond of the above. Maybe we could let maybeFlushSwrBuffers return a tensor of shape (numChannels, 0), which could probably be pushed_back() unconditionally. Not sure that's better, there are probably nicer patterns I'm not seeing?

Everything I can think of that does things unconditionally in this function is way too cute (returning a vector of tensors; using std::copy()). It may be more clear about intent if maybeFlushSwrBuffers() returns an optional so that then we don't need to use an empty tensor to indicate nothing to do.

Using optional sounds better, thanks
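
For readers skimming the thread, the two patterns being compared can be sketched in Python as follows. This is an illustration only, not the C++ in the PR: `maybe_flush` and `collect_frames` are hypothetical names, and plain lists stand in for tensors. The point is that returning `None` states "nothing to flush" explicitly, instead of encoding it as an empty value the caller has to inspect.

```python
from typing import List, Optional


def maybe_flush(buffered: List[float]) -> Optional[List[float]]:
    """Return the leftover samples, or None when there is nothing to flush."""
    if not buffered:
        return None
    leftover = list(buffered)
    buffered.clear()
    return leftover


def collect_frames(decoded: List[List[float]], buffered: List[float]) -> List[List[float]]:
    """Append the flushed samples to the decoded frames only if there are any."""
    frames = list(decoded)
    last_samples = maybe_flush(buffered)
    if last_samples is not None:
        frames.append(last_samples)
    return frames
```

The caller still has a conditional, but it now tests for the presence of a result rather than for a sentinel size.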

scotts · 2025-03-20T01:06:26Z

src/torchcodec/decoders/_core/VideoDecoder.h

+
+    std::optional<int> sampleRate;
+  };
+


I don't think we need the indirection of having an options struct. I know it mirrors the pattern established on the video side, but I also don't think it's a good practice there, either. It's harder to get rid of the video options because we accept a string, and then we do a bunch of work parsing the string. Getting rid of VideoStreamOptions will mean updating a bunch of callers to pass real arguments instead of a string.

I'm not a fan of the stringy video options either - that's why I didn't implement a string constructor for audio options. I also opened #577 to entirely remove string video options (it's not as hard as we thought).

I don't feel very strongly about this, but if we were to collapse both video options and audio options into the video decoder (or the StreamInfo), then we would have a lot of video-only and audio-only fields within the same struct. I personally find it cleaner to separate those into separate structs. Additionally, using option structs makes it very clear which fields/values come from user-specified parameters, in contrast to e.g. metadata or video properties, which is often useful to immediately understand the source of the values, as e.g. here:

torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp

Lines 1395 to 1398 in 7cb2271

    int sourceSampleRate = srcAVFrame->sample_rate;
    int desiredSampleRate =
        streamInfos_[activeStreamIndex_].audioStreamOptions.sampleRate.value_or(
            sourceSampleRate);

LMK your thoughts. I'm fine with collapsing sampleRate into a StreamInfo field if you prefer, but we'd potentially be losing the 2 benefits mentioned above.

Yeah, that's a good point, we actually store the options. I was thinking just from the addAudioStream() API perspective. Let's keep this then, and eventually get rid of the stringy options for video.
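
As a rough Python analogue of the `value_or` snippet quoted above: the sketch below keeps the user-specified value in its own options object and falls back to the source metadata when it is unset. `AudioStreamOptions` here is a Python dataclass invented for illustration, and `resolve_sample_rate` is a hypothetical helper; neither is the actual C++ struct or any torchcodec API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioStreamOptions:
    """User-specified options only; None means "not specified by the user"."""

    sample_rate: Optional[int] = None


def resolve_sample_rate(options: AudioStreamOptions, source_sample_rate: int) -> int:
    """Use the user's desired rate if given, else the source stream's rate."""
    if options.sample_rate is not None:
        return options.sample_rate
    return source_sample_rate
```

Keeping the option separate from the metadata is what makes it obvious, at the call site, which value came from the user and which came from the stream.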

scotts · 2025-03-20T13:36:06Z

src/torchcodec/decoders/_core/VideoDecoder.cpp

+      streamInfo.swrContext.get(),
+      &lastSamplesData,
+      numRemainingSamples,
+      NULL,


s/NULL/nullptr/g

WIP

f6a7f4e

facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 12, 2025

NicolasHug added 13 commits March 18, 2025 11:38

Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate

179a01c

Remove old code

2d97555

WI:P

9af4bc8

WIP

2adf496

Fix clipping

ef93be4

Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate

db740a6

Driveby, remove preAllocatedOutputTensor

ca15232

Rename avFrame into srcAVFrame

6aa7b09

Add flushing

f858d0c

Put back normal compilation flags

70ac31e

Add tests

8deb079

Add tests

7b09315

Nit

af4e88a

NicolasHug commented Mar 19, 2025

NicolasHug mentioned this pull request Mar 19, 2025

Audio decoding TODOs #549

Closed

7 tasks

NicolasHug changed the title ~~[WIP] Allow sample_rate parameter to audio decoder~~ Allow sample_rate parameter to audio decoder Mar 19, 2025

NicolasHug marked this pull request as ready for review March 19, 2025 15:21

NicolasHug commented Mar 19, 2025

Fix test assets

975b0fb

NicolasHug requested a review from scotts March 19, 2025 17:08

NicolasHug commented Mar 19, 2025

scotts reviewed Mar 20, 2025

NicolasHug added 2 commits March 20, 2025 10:09

Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate

7cb2271

Nit

ee1c7b7

scotts reviewed Mar 20, 2025

NULL -> nullptr

bf9aed2

scotts approved these changes Mar 20, 2025

Use optional

f0e2cdd

NicolasHug merged commit 611421e into pytorch:main Mar 20, 2025
46 checks passed
