Text Generation Inference's Messages API: `HuggingFaceEndpoint`, `ChatHuggingFace`, or `ChatOpenAI`? #27561
Description
I have an instance of Text Generation Inference (TGI) deployed on my own server and want to use the Messages API. I have now tried this in several different ways, each with its own issues. I think my use case is not unusual, but it seems like it does not have a clear home. Here is what I am struggling with:
There used to be a class called `HuggingFaceTextGenInference`, which was deprecated a while ago with the recommendation to use `HuggingFaceEndpoint` instead. `HuggingFaceEndpoint` works fine if we just want to use the `/generate` endpoints of TGI. However, if we want to use a chat (a.k.a. instruction-tuned) model with those endpoints, we have to manage the chat template ourselves. This quickly becomes tedious and breaks many integrations in LangChain.
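For illustration, the manual approach looks roughly like this (a minimal sketch; the endpoint URL and model ID are placeholders):

```python
from langchain_huggingface import HuggingFaceEndpoint
from transformers import AutoTokenizer

# Plain /generate endpoint: TGI expects a fully formatted prompt string.
llm = HuggingFaceEndpoint(
    endpoint_url="http://my-tgi-server:8080/",  # placeholder URL
    max_new_tokens=512,
)

# The chat template has to be fetched and applied client-side.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
messages = [{"role": "user", "content": "What is TGI?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The LLM (not chat model) interface takes the raw prompt string.
print(llm.invoke(prompt))
```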
`langchain_huggingface` offers a wrapper class `ChatHuggingFace` around a `HuggingFaceEndpoint`, which looks for the model's chat template on HuggingFace Hub, downloads it, and applies it to each message before passing it to the wrapped model. So the template is applied client-side, and HuggingFace Hub credentials are required.
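Something like this (again a sketch with placeholder URL and model ID; the template download requires a valid HuggingFace Hub token in the environment):

```python
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="http://my-tgi-server:8080/",  # placeholder URL
    max_new_tokens=512,
)

# ChatHuggingFace resolves the model's chat template from HuggingFace Hub
# (hence the Hub credentials) and applies it client-side on each call.
chat = ChatHuggingFace(llm=llm, model_id="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
print(chat.invoke("What is TGI?"))
```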
Since version 1.4.0, TGI offers the Messages API, which applies a chat template automatically on the server side, if required. It is (supposedly) compatible with OpenAI's API and thus uses the `/v1/chat/completions` endpoints. It appears that this is not currently supported by any class in `langchain_huggingface`. Because it is (supposedly, see below) compatible with OpenAI's API, TGI officially recommends using OpenAI's Python client. You can also use `langchain_openai`'s `ChatOpenAI`, set the `base_url` parameter appropriately, and it works.
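That setup looks roughly like this (the URL is a placeholder; TGI serves the OpenAI-style routes under `/v1`):

```python
from langchain_openai import ChatOpenAI

# Point the OpenAI-compatible client at TGI's Messages API.
chat = ChatOpenAI(
    base_url="http://my-tgi-server:8080/v1",  # placeholder URL
    api_key="unused",  # TGI does not check the key by default
    model="tgi",       # TGI typically ignores the model name; "tgi" is the conventional placeholder
)
print(chat.invoke("What is TGI?"))
```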
That is, until you try to use tools. TGI returns the tool arguments slightly differently (as a dict instead of a str), which breaks `ChatOpenAI` (but not OpenAI's own Python client, interestingly).
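To illustrate the difference (the payloads below are hypothetical, just showing the two shapes):

```python
# Shape of tool_calls[0]["function"] in a chat completion response.
# OpenAI returns the arguments as a JSON-encoded string:
openai_style = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
# TGI's Messages API returns them as an already-parsed dict,
# which the OpenAI-style output parser in langchain_core does not expect:
tgi_style = {"name": "get_weather", "arguments": {"city": "Paris"}}
```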
I tried to fix this with PR #27523, but it was rejected because it was considered an issue to be taken up in `langchain_huggingface`.
I don't know what to do here, really. This is a relatively minor tweak in `langchain_core` (see the PR), but a major lift in `langchain_huggingface`, because it does not support the Messages API at all yet. It's also a more general issue, not just specific to OpenAI, as is also evident from the fact that the output parser that needed fixing is in `langchain_core`.

Am I missing anything here? Please let me know if I am.
I am trying to get some visibility for the issue, because it is very much holding me back right now.