Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Transcription Model support #322

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

yavens
Copy link
Collaborator

@yavens yavens commented Feb 25, 2025

Adds support for speech/audio-to-text generation models and integrates OpenAI's Whisper model.

Notes:

  • Enables reqwest multipart feature flag.

Copy link
Contributor

@0xMochan 0xMochan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty solid, couple of improvement suggestions. Also adds to an opportunity to explode openai into multiple files in a future PR ;)

@@ -15,7 +15,7 @@ doctest = false
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
reqwest = { version = "0.11.22", features = ["json", "stream"] }
reqwest = { version = "0.11.22", features = ["json", "stream", "multipart"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case to make this optional? Would be good to include some analysis on what other deps this might include based on this feature being on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into what it adds

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the multipart feature flag only adds the mime_guess and mime libraries. Their dependencies are already in the project from other crates.

/// The file data to be sent to the transcription model provider
pub data: Vec<u8>,
/// The file name to be used in the request
pub filename: String,
Copy link
Contributor

@0xMochan 0xMochan Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be Option<String>. I understand it's necessary for multi-part, but in situations where you don't have files, it doesn't make sense to set a filename when building a transcription request. IMO, it's better for the developer to not include a filename or default to None when building and then use that to set a default when actually executing the request.

OR, use #[serde(default)] to have "" as the default in the request (so if the builder doesn't make a filename, you default.) I'm not sure which is better, so just make a case :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with the former here

}

/// Load the specified file into data
pub fn load_file(self, path: &str) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn load_file(self, path: &str) -> Self {
pub fn load_file<P>(self, path: P) -> Self
where
P: AsRef<path::Path> {

This allows you to pass anything that can be a path which is flexible + it's actually generating a function per type of argument you provide. We could apply this technique to other places in the codebase, but I haven't fully explored the pros and cons (higher binary sizes code-readability vs flexibility and speed).

Many things in std do it this way!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh neat, didn't even think about that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants