HuggingFaceTextGenInference Streaming does not output #15516
🤖 Hello @PeterTucker, I'm Dosu, a friendly bot here to assist you with LangChain related issues, answer questions, and guide you in becoming a contributor. While we're waiting for a human maintainer, I'm here to help you get a head start on your issue. Let's dive in, shall we?

Based on the information you've provided, the issue might be related to how special tokens or stop sequences are handled in the `_stream` method of the `HuggingFaceTextGenInference` class. Here is the relevant part of the code:

```python
# identify text to yield
text: Optional[str] = None
if res.token.special:
    text = None
elif stop_seq_found:
    text = res.token.text[: res.token.text.index(stop_seq_found)]
else:
    text = res.token.text

# yield text, if any
if text:
    chunk = GenerationChunk(text=text)
    yield chunk
    if run_manager:
        run_manager.on_llm_new_token(chunk.text)

# break if stop sequence found
if stop_seq_found:
    break
```

This code yields a `GenerationChunk` only when the token text is non-empty: special tokens produce no output, and a stop sequence truncates the text and ends the stream. If the tokens being generated are special or contain a stop sequence, you might want to adjust your parameters or the text you're feeding into the model.

If this doesn't solve your issue, could you please provide more information about the specific conditions under which this issue occurs? For example, does this happen for all prompts or only specific ones? Any additional information would be helpful in diagnosing the issue.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot
Output:
🤖 Based on the information you've provided, it seems like you're using the `HuggingFaceTextGenInference` streaming setup correctly. In your current implementation, however, you're feeding FastAPI's `StreamingResponse` with a synchronous generator function; an asynchronous generator is the better fit for async streaming. Here's how you can modify your `/chat` endpoint:

```python
@app.get("/chat")
async def chat():
    prompt = {"prompt": "What is the Nickelodeon channel?"}

    # Generate the response using the LLM Chain and stream the output
    async def generate():
        async for text in llm_chain.run(prompt):
            yield text

    return StreamingResponse(generate(), media_type="text/plain")
```

Regarding the issue you found in the LangChain repository (openai tools don't work with streaming=True), it seems to be related to a different problem and is likely not the cause of what you're seeing here.

I hope this helps! If you have any other questions or if this doesn't solve your problem, please let me know.
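As an aside, a pattern that avoids depending on `llm_chain.run` returning an iterator is to stream from the LLM directly. This is only a sketch, assuming the Runnable `.astream()` interface available in this LangChain release; the server URL is a placeholder:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain.llms import HuggingFaceTextGenInference

app = FastAPI()

# Placeholder URL; point this at your running text-generation-inference server.
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=256,
    streaming=True,
)


@app.get("/chat_stream")
async def chat_stream():
    prompt = "What is the Nickelodeon channel?"

    async def generate():
        # astream yields text chunks as the model produces them
        async for chunk in llm.astream(prompt):
            yield chunk

    return StreamingResponse(generate(), media_type="text/plain")
```

Hitting `/chat_stream` with `curl --no-buffer` should then show tokens arriving incrementally rather than one complete response at the end.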
@dosu-bot @hwchase17 Wondering if it might be an issue similar to this:
🤖 I'm unable to access external links or specific pull requests from the LangChain repository. However, if you could provide me with more details or context about the issue you're encountering, I'd be more than happy to help troubleshoot and provide guidance.
@dosu-bot Your code did not work:

```python
@app.get("/chat")
async def chat():
    prompt = {"prompt": "What is the Nickelodeon channel?"}

    # Generate the response using the LLM Chain and stream the output
    async def generate():
        async for text in llm_chain.run(prompt):
            yield text

    return StreamingResponse(generate(), media_type="text/plain")
```

This is the error I received:
🤖 Based on the information you've provided and the context from the LangChain repository, it seems like the issue might be due to the return type of the `run` method. `async for` requires an asynchronous iterable, but from the context it appears that `run` returns its results all at once rather than yielding them. To resolve this, you would need to ensure that the method you iterate over actually yields results when streaming is requested. Here's an example of how you might modify a `run` method to support both modes:

```python
async def run(self, prompt, streaming=False):
    if streaming:
        # If streaming, yield results one at a time
        for result in self._run(prompt):
            yield result
    else:
        # If not streaming, yield the complete result as a single item
        yield self._run(prompt)
```

In this example, `run` is an async generator: with `streaming=True` it yields results one at a time, and otherwise it yields the complete result in one go. Please note that this is just a suggestion based on the provided context, and the actual solution might differ based on the exact implementation of your chain.

Regarding the pull request you mentioned, without more context about the changes made in that pull request, it's hard to say whether it's related to the issue you're experiencing.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
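To show how the two modes would be consumed, here is a tiny self-contained sketch; `ToyChain` is a hypothetical class, not a LangChain API, and simply mirrors the pattern above:

```python
import asyncio


# Hypothetical ToyChain, only for illustrating consumption of the run() pattern above.
class ToyChain:
    def _run(self, prompt):
        # Pretend the model produced three pieces of text.
        return [f"piece-{i} " for i in range(3)]

    async def run(self, prompt, streaming=False):
        if streaming:
            # yield results one at a time
            for result in self._run(prompt):
                yield result
        else:
            # yield the complete result as a single item
            yield self._run(prompt)


async def main():
    chain = ToyChain()

    # Streaming: consume pieces as they arrive.
    async for piece in chain.run("hello", streaming=True):
        print(piece, end="")
    print()

    # Non-streaming: a single item containing the whole result.
    async for full in chain.run("hello"):
        print(full)


asyncio.run(main())
```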
Just to close this, it was a mistake on my end. URL should have had '_stream' on the end. wompidy womp
System Info
langchain==0.0.354
text_generation==0.6.1
python:3.10-slim
Who can help?
@agola11 @hwaking
Information
Related Components
Reproduction
Model: TheBloke/Llama-2-7B-Chat-GPTQ, but I've also tried TheBloke/Mistral-7B-OpenOrca-GPTQ
FastAPI example with HuggingFaceTextGenInference streaming:
Output:
Expected behavior
HuggingFaceTextGenInference does not return any streaming data. It works fine when streaming=False is set in the parameters.
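For context, here is a minimal sketch of the kind of setup being described (not the original reproduction code, which isn't shown above); it assumes a local TGI server and uses the StreamingStdOutCallbackHandler available in this LangChain version:

```python
# Minimal sketch of the setup described above (not the original repro code).
# The inference_server_url is a placeholder; adjust it to your TGI endpoint.
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import HuggingFaceTextGenInference

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=512,
    temperature=0.7,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],  # prints tokens as they arrive
)

# With streaming=True, tokens should be emitted to the callback as they are
# generated; with streaming=False, only the full string is returned at the end.
print(llm("What is the Nickelodeon channel?"))
```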