
Enhancements to Embeddings: latency-optimized / debugging local model #12

Open
michaelfeil opened this issue Sep 24, 2023 · 0 comments

@michaelfeil (Contributor)
A bit of a creative idea. It could be an interesting business concept, but at minimum a unique selling point: no other embedding provider offers this, apart from a hacky do-it-yourself setup via Hugging Face.

For batch-size one:

  • query mode
  • debugging/testing
  • deployments to environments with segregated networks, or places where you cannot provide your ACCESS_TOKEN; there it would be useful to run e.g. a BERT-style encoder locally

I would suggest adding a base (not fine-tuned) encoder model, e.g. bge-large, with a SentenceTransformers-like setup (ONNX on CPU or CTranslate2 on CPU, neither of which requires torch). Users could then switch between local mode and API mode.

pip install gradientai[local-embedder]
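
To make the proposal concrete, here is a minimal sketch of what the local mode could look like. Everything below is an assumption for illustration, not an existing gradientai API: the model is presumed to be pre-exported to ONNX (e.g. with optimum-cli), stored in a hypothetical ./bge-large-onnx directory alongside its tokenizer.json, and the LocalEmbedder class name is made up. Only numpy, tokenizers, and onnxruntime are imported, so no torch is required, matching the CPU-only constraint above.

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer


class LocalEmbedder:
    """Batch-size-one local embedder (hypothetical, not part of gradientai)."""

    def __init__(self, model_dir: str = "./bge-large-onnx"):
        self.tokenizer = Tokenizer.from_file(f"{model_dir}/tokenizer.json")
        self.session = ort.InferenceSession(
            f"{model_dir}/model.onnx", providers=["CPUExecutionProvider"]
        )
        self.input_names = {i.name for i in self.session.get_inputs()}

    def embed(self, text: str) -> np.ndarray:
        enc = self.tokenizer.encode(text)
        inputs = {
            "input_ids": np.array([enc.ids], dtype=np.int64),
            "attention_mask": np.array([enc.attention_mask], dtype=np.int64),
        }
        # BERT-style exports usually also expect token_type_ids; zeros for one segment.
        if "token_type_ids" in self.input_names:
            inputs["token_type_ids"] = np.zeros_like(inputs["input_ids"])
        last_hidden_state = self.session.run(None, inputs)[0]
        # bge models use the [CLS] token embedding, L2-normalized.
        pooled = last_hidden_state[:, 0]
        return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)


if __name__ == "__main__":
    embedder = LocalEmbedder()
    vec = embedder.embed("how to debug in a segregated network")
    print(vec.shape)  # (1, 1024) for bge-large
```

In API mode the same embed() call would route to the hosted endpoint instead, so switching between local and API mode becomes a constructor flag rather than a code change.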
