
[Feature] vLLM Integration #228

Open · 2 tasks done
jamescalam opened this issue Dec 14, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

jamescalam (Contributor) commented Dec 14, 2023

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing functionality

Describe the feature

It would be incredible if we could run local canopy with Mixtral 8x7b, which (afaik) would need GGUF-quantized Mixtral via vLLM. This would also open us up to integrations with things like formal grammars (which, again afaik, need local models; I don't think any API solutions accept them).
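To make the ask concrete, here's a minimal sketch of what the client side could look like against vLLM's OpenAI-compatible server (not canopy's actual integration; the port is vLLM's default, and this assumes the regular HF checkpoint rather than a GGUF file):

```python
# Minimal sketch, not an actual canopy integration. Assumes a vLLM server
# started separately with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mixtral-8x7B-Instruct-v0.1
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default port
    api_key="not-needed",                 # the local server doesn't check the key
)

resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Hello from local Mixtral!"}],
)
print(resp.choices[0].message.content)
```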

Holiday season is just around the corner, and I'm not sure if you guys got me anything, so I'm just putting this out there as an idea.

Describe alternatives you've considered

No response

Who will this benefit?

The world, but primarily open LLM devs. It would probably see less production use, but I'm sure having this, and being able to run for free (Pinecone free tier + local LLM), would push more devs to build with canopy imo.

Are you interested in contributing this feature?

Maybe, yes.

Anything else?

Requires around 30 GB of memory using GGUF-quantized Mixtral (https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF); this would fit on Mac M1/M2 machines.
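As a rough sanity check on that figure: Mixtral 8x7B has roughly 47B total parameters, so a ~4.5-bit GGUF quant comes out to about 47e9 × 4.5 / 8 ≈ 26 GB of weights, and KV cache plus runtime overhead brings it up to around 30 GB.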

jamescalam added the enhancement (New feature or request) label on Dec 14, 2023
rachfop commented Dec 29, 2023

This would also get me to adopt canopy.
I have set my base_url and api_key to a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.

I get a TypeError: 'NoneType' object is not subscriptable error; however, when I use the real base URL and API key, it works just fine.
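Roughly what I'm running, as a minimal sketch (the base URL, key, and model name below are placeholders):

```python
# Sketch of the failing setup (pre-1.0 openai-python style; all values are placeholders).
import openai

openai.api_base = "http://localhost:8080/v1"  # local OpenAI-compatible LLM server
openai.api_key = "dummy-key"

# Works fine with the real OpenAI base URL and key; against the local server
# this ends in: TypeError: 'NoneType' object is not subscriptable
response = openai.ChatCompletion.create(
    model="local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(response["choices"][0]["message"]["content"])
```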

Would love to see this adopted.

igiloh-pinecone (Contributor) commented

> I have set my base_url and api_key to a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.
>
> I get a TypeError: 'NoneType' object is not subscriptable error; however, when I use the real base URL and API key, it works just fine.

@rachfop can you please elaborate or provide repro code?
For any model that follows the OpenAI API, this should actually work out of the box.

pedrocr83 commented

I totally think this should be considered, for two reasons:

  1. One shouldn't rely only on paid models when local, fine-tuned models can perform just as well or better.
  2. OpenAI embeddings are not great, which can affect the accuracy of your RAG setup (see https://huggingface.co/spaces/mteb/leaderboard); a quick sketch of a local alternative is below.
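For example, a local embedder from that leaderboard is only a few lines with sentence-transformers (a sketch; the model name here is just one strong open option):

```python
# Minimal sketch: local embeddings instead of OpenAI's, using sentence-transformers.
# The model choice is just one example of a strong open model on the MTEB leaderboard.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
docs = ["Canopy is a RAG framework.", "vLLM serves LLMs efficiently."]
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) for this model
```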

Also, perhaps using Ollama would be a good option, as it creates a local server for the LLM.

Thoughts?

cognitivetech commented

Ollama exposes an OpenAI-compatible API.
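So pointing the existing OpenAI client at Ollama can be as simple as this sketch (assumes the Ollama server is running and `ollama pull mixtral` has been done; the api_key is a required placeholder that Ollama ignores):

```python
# Sketch: pointing the OpenAI client at Ollama's OpenAI-compatible endpoint.
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default port
    api_key="ollama",                      # required by the client, ignored by Ollama
)

resp = client.chat.completions.create(
    model="mixtral",
    messages=[{"role": "user", "content": "Summarize what canopy does."}],
)
print(resp.choices[0].message.content)
```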
