
Support llama.cpp chat generation #722

Closed
lbux opened this issue May 8, 2024 · 1 comment
Labels: feature request (Ideas to improve an integration), topic:streaming

Comments

lbux (Contributor) commented on May 8, 2024

Is your feature request related to a problem? Please describe.
Currently, the llama.cpp integration only supports create_completion() and not create_chat_completion(). Incorporating chat completion would allow for the following (a sketch of the underlying API is included after this list):

  • streaming
  • function calling
  • JSON constraining
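
For reference, here is a minimal sketch of what create_chat_completion() looks like in the underlying llama-cpp-python library (not the integration's API); the model path is a placeholder, and the chunk handling assumes the OpenAI-style streaming format the library returns:

```python
from llama_cpp import Llama

# Placeholder path; point this at any local GGUF model.
llm = Llama(model_path="models/model.gguf", n_ctx=2048)

# create_chat_completion() takes OpenAI-style messages and can stream
# the response back as incremental "delta" chunks.
stream = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain what a GGUF file is in one sentence."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
```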

Describe the solution you'd like
A proper implementation of create_chat_completion() in the llama.cpp integration.

Describe alternatives you've considered
Ollama is built on top of llama.cpp and supports chat completion, but there are reasons not to use Ollama if you already use llama.cpp directly.

Additional context
The biggest reason for wanting this is JSON constraining. By constraining the output to JSON, the LLM's response can be structured so that it can serve as input for evaluators that support (or can be modified to support) local LLMs through llama.cpp. Without constrained output, parsing becomes difficult for models that are less reliable at following instructions.
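
Here is a rough sketch of that JSON-constraining use case, again calling llama-cpp-python directly rather than the integration; the schema, prompts, and model path are purely illustrative, and the response_format/schema usage reflects my understanding of the library's JSON schema mode:

```python
from llama_cpp import Llama

llm = Llama(model_path="models/model.gguf", n_ctx=2048)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Score the answer and reply in JSON."},
        {"role": "user", "content": "Question: What is 2 + 2? Answer: 4"},
    ],
    # Constrain generation to JSON matching this (illustrative) schema;
    # llama.cpp converts the schema into a grammar the sampler must follow.
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {
                "score": {"type": "integer"},
                "reasoning": {"type": "string"},
            },
            "required": ["score", "reasoning"],
        },
    },
)

# The content should now parse cleanly as JSON for downstream evaluators.
print(result["choices"][0]["message"]["content"])
```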

lbux added the feature request label on May 8, 2024
anakin87 (Member) commented

done in #723
