Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tokenize on models not embedders #1920

Merged
merged 1 commit into from
Oct 2, 2023
Merged

Move tokenize on models not embedders #1920

merged 1 commit into from
Oct 2, 2023

Conversation

spolu
Copy link
Contributor

@spolu spolu commented Oct 2, 2023

  • Move tokenize endpoint to llm not embedder
  • Partial support (emtpy strings) for textsynth and cohere
  • Some clean-up

@spolu spolu requested review from lasryaric and fontanierh and removed request for lasryaric October 2, 2023 14:46
@@ -297,6 +297,16 @@ impl LLM for CohereLLM {
api_decode(self.api_key.as_ref().unwrap(), tokens).await
}

// We return empty string in tokenize to partially support the endpoint.
async fn tokenize(&self, text: &str) -> Result<Vec<(usize, String)>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be able to use it to compute context size when generating the conversation for the model. I don't need the token strings (which I don't have access to here) but I do need the endpoint.

@spolu
Copy link
Contributor Author

spolu commented Oct 2, 2023

Deploy order:

  • core
  • front (no impact on current use of tokenize)
  • font-workers (extract and document tracker may be sad before they deploy)

@spolu spolu merged commit 9211f98 into main Oct 2, 2023
@spolu spolu deleted the spolu-tokenize_tweaks branch October 2, 2023 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants