How to lengthen the Whisper max audio length? #1050

Open

stinoga opened this issue Nov 22, 2024 · 0 comments

Labels

question

stinoga commented Nov 22, 2024

Question

I'm working from the webgpu-whisper demo, and I'm having trouble increasing the maximum allowed audio input length. I made the following changes:

```diff
-const MAX_AUDIO_LENGTH = 30; // seconds
+const MAX_AUDIO_LENGTH = 120; // seconds

-const MAX_NEW_TOKENS = 64;
+const MAX_NEW_TOKENS = 624;
```

This seems to allow longer input, but once the audio is longer than 30 seconds I get the following error:

```
Attempting to extract features for audio longer than 30 seconds. If using a pipeline to extract transcript from a long audio clip, remember to specify `chunk_length_s` and/or `stride_length_s`.
```
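For context, the error suggests splitting long audio into overlapping windows of at most 30 seconds. The sketch below illustrates that idea on a raw mono `Float32Array`; it is not the library's code. The helper name `chunkAudio`, the 16 kHz default sampling rate, and the default stride of `chunkLengthS / 6` are assumptions chosen for illustration, modelled loosely on how a chunked ASR pipeline advances by `window - 2 * stride` so that neighbouring chunks overlap.

```javascript
// Hypothetical helper (not part of transformers.js): split mono audio into
// overlapping windows in the spirit of `chunk_length_s` / `stride_length_s`.
// Assumes 16 kHz audio by default.
function chunkAudio(audio, { samplingRate = 16000, chunkLengthS = 30, strideLengthS = chunkLengthS / 6 } = {}) {
  const windowSize = Math.round(chunkLengthS * samplingRate); // samples per chunk
  const stride = Math.round(strideLengthS * samplingRate);    // overlap on each side
  const jump = windowSize - 2 * stride;                        // how far each chunk advances
  if (jump <= 0) {
    throw new Error("chunkLengthS must be more than twice strideLengthS");
  }

  const chunks = [];
  for (let offset = 0; offset < audio.length; offset += jump) {
    const end = Math.min(offset + windowSize, audio.length);
    chunks.push({
      samples: audio.subarray(offset, end), // a view, no copy
      start: offset,
      // Portion of each edge shared with a neighbouring chunk (0 at the ends).
      leftStride: offset === 0 ? 0 : stride,
      rightStride: end >= audio.length ? 0 : stride,
    });
    if (end >= audio.length) break; // last chunk reached the end of the audio
  }
  return chunks;
}
```

For example, `chunkAudio(new Float32Array(120 * 16000))` yields 6 overlapping chunks of at most 30 seconds each, which could then be transcribed one at a time and merged, discarding the overlapped edges.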

However, I can't find where to set `stride_length_s` in the demo code. Could someone point me in the right direction?
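One possible workaround, sketched under assumptions: if the demo calls the model directly on a single buffer rather than going through a pipeline, there may be no `stride_length_s` option to set, and the audio could instead be split before each call. In the sketch below, `transcribeInChunks` and the `transcribeWindow` callback are hypothetical; `transcribeWindow` stands in for whatever the demo already does with one buffer of 30 seconds or less. For simplicity it uses non-overlapping windows, so a word that straddles a boundary may be cut, which is the problem an overlapping stride is meant to reduce. With this approach a per-call token limit such as `MAX_NEW_TOKENS` would apply to each 30-second chunk rather than to the whole clip.

```javascript
// Hypothetical sketch: transcribe audio longer than 30 s by running an existing
// single-window transcription step once per 30 s chunk (no overlap).
// `transcribeWindow(chunk)` is an assumed async callback returning text.
async function transcribeInChunks(audio, transcribeWindow, { samplingRate = 16000, chunkLengthS = 30 } = {}) {
  const windowSize = chunkLengthS * samplingRate; // samples per chunk
  const pieces = [];
  for (let offset = 0; offset < audio.length; offset += windowSize) {
    // subarray clamps at the end of the buffer, so the last chunk may be shorter.
    const chunk = audio.subarray(offset, offset + windowSize);
    pieces.push((await transcribeWindow(chunk)).trim());
  }
  // Join the per-chunk transcripts, skipping chunks that produced no text.
  return pieces.filter((p) => p.length > 0).join(" ");
}
```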

stinoga added the question label
