-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Transcription Model support #322
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty solid, couple of improvement suggestions. Also adds to an opportunity to explode openai
into multiple files in a future PR ;)
@@ -15,7 +15,7 @@ doctest = false | |||
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | |||
|
|||
[dependencies] | |||
reqwest = { version = "0.11.22", features = ["json", "stream"] } | |||
reqwest = { version = "0.11.22", features = ["json", "stream", "multipart"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case to make this optional? Would be good to include some analysis on what other deps this might include based on this feature being on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into what it adds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the multipart feature flag only adds the mime_guess
and mime
libraries. Their dependencies are already in the project from other crates.
/// The file data to be sent to the transcription model provider | ||
pub data: Vec<u8>, | ||
/// The file name to be used in the request | ||
pub filename: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be Option<String>
. I understand it's necessary for multi-part, but in situations where you don't have files, it doesn't make sense to set a filename
when building a transcription request. IMO, it's better for the developer to not include a filename
or default to None
when building and then use that to set a default when actually executing the request.
OR, use #[serde(default)]
to have ""
as the default in the request (so if the builder doesn't make a filename
, you default.) I'm not sure which is better, so just make a case :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went with the former here
rig-core/src/transcription.rs
Outdated
} | ||
|
||
/// Load the specified file into data | ||
pub fn load_file(self, path: &str) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn load_file(self, path: &str) -> Self { | |
pub fn load_file<P>(self, path: P) -> Self | |
where | |
P: AsRef<path::Path> { |
This allows you to pass anything that can be a path which is flexible + it's actually generating a function per type of argument you provide. We could apply this technique to other places in the codebase, but I haven't fully explored the pros and cons (higher binary sizes code-readability vs flexibility and speed).
Many things in std
do it this way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh neat, didn't even think about that
Adds support for speech/audio-to-text generation models and integrates OpenAI's Whisper model.
Notes: