
[Feature] vLLM Integration #228

Open · 2 tasks done
jamescalam opened this issue Dec 14, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

jamescalam (Contributor) commented Dec 14, 2023

Is this your first time submitting a feature request?

  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing functionality

Describe the feature

It would be incredible if we could run local canopy with Mixtral 8x7b, which (afaik) would need GGUF-quantized Mixtral via vLLM. This would also open us up to integrations with things like formal grammars (which, again afaik, need local models; I don't think any API solutions accept them).
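To make the ask concrete, here's a minimal sketch of what the client side could look like against vLLM's OpenAI-compatible server (not canopy's actual integration; the port is vLLM's default, and this assumes the regular HF checkpoint rather than a GGUF file):

```python
# Minimal sketch, not an actual canopy integration. Assumes a vLLM server
# started separately with something like:
#   python -m vllm.entrypoints.openai.api_server \
#       --model mistralai/Mixtral-8x7B-Instruct-v0.1
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default port
    api_key="not-needed",                 # the local server doesn't check the key
)

resp = client.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Hello from local Mixtral!"}],
)
print(resp.choices[0].message.content)
```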

Holiday season is just around the corner, and I'm not sure if you guys got me anything, so I'm just putting this out there as an idea.

Describe alternatives you've considered

No response

Who will this benefit?

The world, but primarily open LLM devs. It would probably see less production use, but I'm sure having this, and being able to run for free (Pinecone free tier + local LLM), would push more devs to build with canopy imo.

Are you interested in contributing this feature?

Maybe, yes.

Anything else?

Requires around 30 GB of memory using GGUF-quantized Mixtral (https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF); this would fit on Mac M1/M2 machines.
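As a rough sanity check on that figure: Mixtral 8x7B has roughly 47B total parameters, so a ~4.5-bit GGUF quant comes out to about 47e9 × 4.5 / 8 ≈ 26 GB of weights, and KV cache plus runtime overhead brings it up to around 30 GB.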

jamescalam added the enhancement (New feature or request) label on Dec 14, 2023
rachfop commented Dec 29, 2023

This would also get me to adopt canopy.
I have set my base_url and api_key to a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.

I get a TypeError: 'NoneType' object is not subscriptable error; however, when I use the real base URL and API key, it works just fine.
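Roughly what I'm running, as a minimal sketch (the base URL, key, and model name below are placeholders):

```python
# Sketch of the failing setup (pre-1.0 openai-python style; all values are placeholders).
import openai

openai.api_base = "http://localhost:8080/v1"  # local OpenAI-compatible LLM server
openai.api_key = "dummy-key"

# Works fine with the real OpenAI base URL and key; against the local server
# this ends in: TypeError: 'NoneType' object is not subscriptable
response = openai.ChatCompletion.create(
    model="local-model",
    messages=[{"role": "user", "content": "ping"}],
)
print(response["choices"][0]["message"]["content"])
```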

Would love to see this adopted.

igiloh-pinecone (Contributor) commented

> I have set my base_url and api_key to a local LLM instance, but I am not able to continue, even though the LLM uses the OpenAI module.
>
> I get a TypeError: 'NoneType' object is not subscriptable error; however, when I use the real base URL and API key, it works just fine.

@rachfop can you please elaborate or provide repro code?
For any model that follows the OpenAI API, this should actually work out of the box.

pedrocr83 commented

I totally think this should be considered, for two reasons:

  1. One shouldn't rely only on paid models when local, fine-tuned models can perform just as well or better.
  2. OpenAI embeddings are not great, which can affect the accuracy of your RAG setup (see https://huggingface.co/spaces/mteb/leaderboard); a quick sketch of a local alternative is below.
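For example, a local embedder from that leaderboard is only a few lines with sentence-transformers (a sketch; the model name here is just one strong open option):

```python
# Minimal sketch: local embeddings instead of OpenAI's, using sentence-transformers.
# The model choice is just one example of a strong open model on the MTEB leaderboard.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
docs = ["Canopy is a RAG framework.", "vLLM serves LLMs efficiently."]
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) for this model
```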

Also, perhaps using Ollama would be a good option, as it creates a local server for the LLM.

Thoughts?

cognitivetech commented

Ollama exposes an OpenAI-compatible API.
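So pointing the existing OpenAI client at Ollama can be as simple as this sketch (assumes the Ollama server is running and `ollama pull mixtral` has been done; the api_key is a required placeholder that Ollama ignores):

```python
# Sketch: pointing the OpenAI client at Ollama's OpenAI-compatible endpoint.
import openai

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default port
    api_key="ollama",                      # required by the client, ignored by Ollama
)

resp = client.chat.completions.create(
    model="mixtral",
    messages=[{"role": "user", "content": "Summarize what canopy does."}],
)
print(resp.choices[0].message.content)
```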
