Text Generation Inference's Messages API: `HuggingFaceEndpoint`, `ChatHuggingFace`, or `ChatOpenAI`? #27561
Description
I have an instance of Text Generation Inference (TGI) deployed on my own server and want to use the Messages API. I have now tried this in several different ways, each with its own issues. I think my use case is not unusual, but it seems like it does not have a clear home. Here is what I am struggling with:
There used to be a class called `HuggingFaceTextGenInference`, which was deprecated a while ago with the recommendation to use `HuggingFaceEndpoint` instead. `HuggingFaceEndpoint` works fine if we just want to use the `/generate` endpoints of TGI. However, if we want to use a chat (a.k.a. instruction-tuned) model with those endpoints, we have to manage the chat template ourselves. This quickly becomes tedious and breaks many integrations in LangChain.
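For illustration, the manual approach looks roughly like this (a minimal sketch; the endpoint URL and model ID are placeholders):

```python
from langchain_huggingface import HuggingFaceEndpoint
from transformers import AutoTokenizer

# Plain /generate endpoint: TGI expects a fully formatted prompt string.
llm = HuggingFaceEndpoint(
    endpoint_url="http://my-tgi-server:8080/",  # placeholder URL
    max_new_tokens=512,
)

# The chat template has to be fetched and applied client-side.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
messages = [{"role": "user", "content": "What is TGI?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The LLM (not chat model) interface takes the raw prompt string.
print(llm.invoke(prompt))
```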
`langchain_huggingface` offers a wrapper class `ChatHuggingFace` around a `HuggingFaceEndpoint`, which looks for the model's chat template on HuggingFace Hub, downloads it, and applies it to each message before passing it to the wrapped model. So the template is applied client-side, and HuggingFace Hub credentials are required.
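Something like this (again a sketch with placeholder URL and model ID; the template download requires a valid HuggingFace Hub token in the environment):

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="http://my-tgi-server:8080/",  # placeholder URL
    max_new_tokens=512,
)

# ChatHuggingFace resolves the model's chat template from HuggingFace Hub
# (hence the Hub credentials) and applies it client-side on each call.
chat = ChatHuggingFace(llm=llm, model_id="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
print(chat.invoke("What is TGI?"))
```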
Since version 1.4.0, TGI offers the Messages API, which applies a chat template automatically on the server side, if required. It is (supposedly) compatible with OpenAI's API and thus uses the `/v1/chat/completions` endpoints. It appears that this is not currently supported by any class in `langchain_huggingface`. Because it is (supposedly, see below) compatible with OpenAI's API, TGI officially recommends using OpenAI's Python client. You can also use `langchain_openai`'s `ChatOpenAI`, set the `base_url` parameter appropriately, and it works.
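That setup looks roughly like this (the URL is a placeholder; TGI serves the OpenAI-style routes under `/v1`):

```python
from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at TGI's Messages API.
chat = ChatOpenAI(
    base_url="http://my-tgi-server:8080/v1",  # placeholder URL
    api_key="unused",  # TGI does not check the key by default
    model="tgi",       # TGI typically ignores the model name; "tgi" is the conventional placeholder
)
print(chat.invoke("What is TGI?"))
```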
That is, until you try to use tools. TGI returns the tool arguments slightly differently (as a dict instead of a str), which breaks `ChatOpenAI` (but not OpenAI's own Python client, interestingly).
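To illustrate the difference (the payloads below are hypothetical, just showing the two shapes):

```python
# Shape of tool_calls[0]["function"] in a chat completion response.
# OpenAI returns the arguments as a JSON-encoded string:
openai_style = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
# TGI's Messages API returns them as an already-parsed dict,
# which the OpenAI-style output parser in langchain_core does not expect:
tgi_style = {"name": "get_weather", "arguments": {"city": "Paris"}}
```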
I tried to fix this with PR #27523, but it was rejected because it was considered an issue to be taken up in `langchain_huggingface`.
I don't know what to do here, really. This is a relatively minor tweak in `langchain_core` (see the PR), but a major lift in `langchain_huggingface`, because it does not support the Messages API at all yet. It's also a more general issue, not just specific to OpenAI, as is also evident from the fact that the output parser that needed fixing is in `langchain_core`.

Am I missing anything here? Please let me know if I am.
I am trying to get some visibility for the issue, because it is very much holding me back right now.