Using the openai LLM provider and setting the base URL and API key as below sends the request to llama-server:

base_url="http://localhost:5050/v1",  # "http://<host>:<port>"
api_key="sk-no-key-required"

However, the request fails with a 500 Internal Server Error because the openai LLM provider sends a multimodal-style payload, and multimodal support is disabled in llama-server.

The ollama provider works, but Ollama doesn't support integrated GPU offload, which llama-server provides and which makes it faster for smaller models.