
Enhancements to Embeddings: latency-optimized / debugging local model #12

Open
michaelfeil opened this issue Sep 24, 2023 · 0 comments

@michaelfeil (Contributor)
A bit of a creative idea. It could be an interesting business concept, but at minimum a unique selling point: no other embedding provider offers this, apart from a hacky do-it-yourself setup via Hugging Face.

For batch-size one:

  • query mode
  • debugging/testing
  • deployments to environments with segregated networks, or places where you cannot provide your ACCESS_TOKEN; there it would be useful to run e.g. a BERT-style encoder locally

I would suggest adding a base (not fine-tuned) encoder model, e.g. bge-large, with a SentenceTransformers-like setup (ONNX on CPU or CTranslate2 on CPU, neither of which requires torch). Users could then switch between local mode and API mode.

pip install gradientai[local-embedder]
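
To make the proposal concrete, here is a minimal sketch of what the local mode could look like. Everything below is an assumption for illustration, not an existing gradientai API: the model is presumed to be pre-exported to ONNX (e.g. with optimum-cli), stored in a hypothetical ./bge-large-onnx directory alongside its tokenizer.json, and the LocalEmbedder class name is made up. Only numpy, tokenizers, and onnxruntime are imported, so no torch is required, matching the CPU-only constraint above.

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer


class LocalEmbedder:
    """Batch-size-one local embedder (hypothetical, not part of gradientai)."""

    def __init__(self, model_dir: str = "./bge-large-onnx"):
        self.tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")
        self.session = ort.InferenceSession(
            f"{model_dir}/model.onnx", providers=["CPUExecutionProvider"]
        )
        self.input_names = {i.name for i in self.session.get_inputs()}

    def embed(self, text: str) -> np.ndarray:
        enc = self.tokenizer.encode(text)
        inputs = {
            "input_ids": np.array([enc.ids], dtype=np.int64),
            "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        }
        # BERT-style exports usually also expect token_type_ids; zeros for one segment.
        if "token_type_ids" in self.input_names:
            inputs["token_type_ids"] = np.zeros_like(inputs["input_ids"])
        last_hidden_state = self.session.run(None, inputs)[0]
        # bge models use the [CLS] token embedding, L2-normalized.
        pooled = last_hidden_state[:, 0]
        return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)


if __name__ == "__main__":
    embedder = LocalEmbedder()
    vec = embedder.embed("how to debug in a segregated network")
    print(vec.shape)  # (1, 1024) for bge-large
```

In API mode the same embed() call would route to the hosted endpoint instead, so switching between local and API mode becomes a constructor flag rather than a code change.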
