How to lengthen the Whisper max audio length? #1050

Open
stinoga opened this issue Nov 22, 2024 · 0 comments
Labels
question Further information is requested

Comments

stinoga commented Nov 22, 2024

Question

I'm working from the webgpu-whisper demo, and I'm having a hard time lengthening the maximum audio input allowed. I made the following changes:

-const MAX_AUDIO_LENGTH = 30; // seconds
+const MAX_AUDIO_LENGTH = 120; // seconds

-const MAX_NEW_TOKENS = 64;
+const MAX_NEW_TOKENS = 624;

This seems to allow for longer input, but after 30 seconds I get the following error:

Attempting to extract features for audio longer than 30 seconds. If using a pipeline to extract transcript from a long audio clip, remember to specify `chunk_length_s` and/or `stride_length_s`.

I can't seem to find where to add stride_length_s in the demo code, however. Could someone point me in the right direction?
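
For context, the error above is raised when a single input longer than 30 seconds reaches the Whisper feature extractor. One way to stay under that limit is to split the recording into overlapping windows before feature extraction. The sketch below illustrates only that splitting step. `chunkAudio` and its parameters are hypothetical names, not part of the demo or of Transformers.js, and the 16 kHz sampling rate is an assumption. The single-sided overlap here is a simplification and may not match how the library's own `stride_length_s` option is defined.

```javascript
// Hypothetical helper for illustration; not part of the demo or Transformers.js.
const SAMPLING_RATE = 16000; // Hz, assumed input sampling rate

/**
 * Split a Float32Array of samples into overlapping windows.
 * Each window is at most `chunkLengthS` seconds long, and consecutive
 * windows share `overlapS` seconds of audio.
 */
function chunkAudio(samples, chunkLengthS = 30, overlapS = 5, samplingRate = SAMPLING_RATE) {
  const windowSize = Math.floor(chunkLengthS * samplingRate);
  const overlap = Math.floor(overlapS * samplingRate);
  const step = windowSize - overlap; // distance between window starts

  if (windowSize <= 0 || step <= 0) {
    throw new RangeError("chunkLengthS must be positive and greater than overlapS");
  }

  const chunks = [];
  for (let start = 0; start < samples.length; start += step) {
    const end = Math.min(start + windowSize, samples.length);
    // subarray returns a view, so no audio data is copied
    chunks.push(samples.subarray(start, end));
    if (end === samples.length) break; // last window reached the end of the audio
  }
  return chunks;
}

// Usage: 120 seconds of silence split into windows of at most 30 seconds
const audio = new Float32Array(120 * SAMPLING_RATE);
const chunks = chunkAudio(audio);
console.log(chunks.length, chunks.map((c) => c.length / SAMPLING_RATE));
```

Each window could then be processed on its own. Merging the per-window transcripts across the overlapping regions is not handled here. When the library's `automatic-speech-recognition` pipeline is used instead of calling the model directly, `chunk_length_s` and `stride_length_s` are passed as options in the pipeline call, which is the path the error message refers to.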

stinoga added the question label Nov 22, 2024