-
Notifications
You must be signed in to change notification settings - Fork 16.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip OpenAIWhisperParser
extremely small audio chunks to avoid api error
#11450
Skip OpenAIWhisperParser
extremely small audio chunks to avoid api error
#11450
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎ 1 Ignored Deployment
|
OpenAIWhisperParser
extremely small audio chunks to avoid api …OpenAIWhisperParser
extremely small audio chunks to avoid api error
Apologies for the slow review! Pr has some merge conflicts, happy to re-review if you'd like to resolve |
No problem! Will resolve conflicts this weekend |
373456c
to
d7c26c8
Compare
@baskaryan, I believe everything is in order now. |
@@ -52,11 +52,15 @@ def lazy_parse(self, blob: Blob) -> Iterator[Document]: | |||
# Need to meet 25MB size limit for Whisper API | |||
chunk_duration = 20 | |||
chunk_duration_ms = chunk_duration * 60 * 1000 | |||
chunk_duration_threshold = 0.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this a configurable param with default value of None, so that default behavior doesn't change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to make this param configurable? chunk_duration_threshold
only reflects OpenAI's API minimal audio duration, which is 0.1s.
Attempt 1 failed. Exception: Audio file is too short. Minimum audio length is 0.1 seconds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah i see. perhaps still worth making configurable (with default value 0.1) for future proofing? i.e. in case OpenAI API is updated some time in the future to accept shorter lengths
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also could we add a comment somewhere explaining why the threshold is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the changes you suggested and thought more about the future-proofing idea – you're right about that. Also, I added the comments you mentioned to the class docstring. But if you'd rather have them inside the code, just let me know, and I'll fix it
…unks to avoid api error (langchain-ai#11450) **Description** This PR addresses a rare issue in `OpenAIWhisperParser` that causes it to crash when processing an audio file with a duration very close to the class's chunk size threshold of 20 minutes. **Issue** langchain-ai#11449 **Dependencies** None **Tag maintainer** @agola11 @eyurtsev **Twitter handle** leonardodiegues --------- Co-authored-by: Leonardo Diegues <[email protected]> Co-authored-by: Bagatur <[email protected]>
Description
This PR addresses a rare issue in
OpenAIWhisperParser
that causes it to crash when processing an audio file with a duration very close to the class's chunk size threshold of 20 minutes.Issue
#11449
Dependencies
None
Tag maintainer
@agola11 @eyurtsev
Twitter handle
leonardodiegues