ChatHuggingFace cutting total tokens #24125
-
Example Code
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
# Using LLM
llama_llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=1024,
    temperature=0.1,
    huggingfacehub_api_token=HUGGINGFACE_API_KEY
)
messages = [
    """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You're a helpful assistant that answers general questions.
question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
]
prompt = ChatPromptTemplate.from_messages(messages)
parser = StrOutputParser()
chain_llm = prompt | llama_llm | parser
generate_with_llm = chain_llm.invoke({"question": "What is a blackhole?"})
print(f'LLM answer:\n {generate_with_llm}\n')
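# note: len() below counts characters in the returned string, not model tokens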
print(f'Total tokens using LLM: {len(generate_with_llm)}')
# Using ChatHuggingFace
llama_chat_model = ChatHuggingFace(llm=llama_llm)
chain_chat = prompt | llama_chat_model | parser
generate_with_chat = chain_chat.invoke({"question": "What is a blackhole?"})
print(f'Chat answer:\n {generate_with_chat}\n')
print(f'Total tokens using Chat: {len(generate_with_chat)}')

Description
When I use ChatHuggingFace, the number of tokens is reduced.

LLM answer:
A fascinating topic! A black hole is a region in space where the gravitational pull is so strong that nothing, including light, can escape. It's formed when a massive star collapses in on itself and its gravity becomes so strong that it warps the fabric of spacetime around it. Here's a simplified explanation:
Black holes come in various sizes, ranging from small, stellar-mass black holes formed from the collapse of individual stars, to supermassive black holes found at the centers of galaxies, with masses millions or even billions of times that of the sun. Some interesting facts about black holes:
While black holes are still a topic of ongoing research, they continue to fascinate scientists and the general public alike, offering a glimpse into the mysterious and awe-inspiring universe we inhabit.

Total tokens using LLM: 1671

Chat answer:
A black hole is a region in space where the gravitational pull is so strong that nothing, including light, can escape. It's formed when a massive star collapses in on itself and its gravity becomes so strong that it warps the fabric of spacetime around it. Here's a simplified explanation:
Total tokens using Chat: 504
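Note that len() over the output string counts characters, not tokens. A minimal sketch for counting actual tokens with the model's own tokenizer, assuming the transformers package is installed and you have access to the gated Llama 3 repo:

from transformers import AutoTokenizer

# load the tokenizer for the same checkpoint (gated repo, so an access token is assumed)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", token=HUGGINGFACE_API_KEY
)
print(len(tokenizer.encode(generate_with_llm)))   # tokens in the LLM answer
print(len(tokenizer.encode(generate_with_chat)))  # tokens in the chat answer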
-
Hello, @luizguilhermedev! I'm here to help you with any bugs, questions, or contributions you may have. Let's work on this together while we wait for a human maintainer.

The issue you're experiencing with ChatHuggingFace is likely caused by the limit on how many new tokens the model may generate. To address this, you can adjust the max_new_tokens parameter:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
# Using LLM
llama_llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    max_new_tokens=2048,  # Increase this value to increase the token limit
    temperature=0.1,
    huggingfacehub_api_token=HUGGINGFACE_API_KEY
)
messages = [
    """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You're a helpful assistant that answers general questions.
question: {question}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""
]
prompt = ChatPromptTemplate.from_messages(messages)
parser = StrOutputParser()
chain_llm = prompt | llama_llm | parser
generate_with_llm = chain_llm.invoke({"question": "What is a blackhole?"})
print(f'LLM answer:\n {generate_with_llm}\n')
print(f'Total tokens using LLM: {len(generate_with_llm)}')
# Using ChatHuggingFace
llama_chat_model = ChatHuggingFace(llm=llama_llm)
chain_chat = prompt | llama_chat_model | parser
generate_with_chat = chain_chat.invoke({"question": "What is a blackhole?"})
print(f'Chat answer:\n {generate_with_chat}\n')
print(f'Total tokens using Chat: {len(generate_with_chat)}')

By increasing the max_new_tokens parameter, you allow the model to generate a longer response before it is cut off.
-
Try:
llama_chat_model = ChatHuggingFace(llm=llama_llm).bind(max_tokens=8192, temperature=0.0)
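Here .bind() attaches default invocation parameters to the runnable, so max_tokens is forwarded to the chat-completion call on every invoke. A minimal sketch of the bound model dropped into the chain from the original post (reusing llama_llm, prompt, and parser from above; 8192 is the suggested value, not a verified default):

# bind() sets default kwargs that are passed through on every call
llama_chat_model = ChatHuggingFace(llm=llama_llm).bind(max_tokens=8192, temperature=0.0)
chain_chat = prompt | llama_chat_model | parser
print(chain_chat.invoke({"question": "What is a blackhole?"}))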