
Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API] - Affects newer models #334

Open
supreetkt opened this issue Jan 20, 2025 · 0 comments

supreetkt commented Jan 20, 2025

There is a set of newer models that are no longer supported by the Invoke APIs but do support streaming. These can be used via the Converse API, but streaming is broken: similar to issue #240, on_llm_new_token doesn't seem to be called, so you get the response as one full generated text instead of streamed tokens:

import boto3
from langchain_aws import ChatBedrock
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

streaming = True
session = boto3.session.Session()
bedrock_client = session.client("bedrock-runtime", region_name="us-east-1")

class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])

model = ChatBedrock(
    client=bedrock_client,
    model_id="amazon.nova-micro-v1:0",
    streaming=streaming,
    callbacks=[MyCustomHandler()],
    beta_use_converse_api=True,
)

chain = prompt | model

response = chain.invoke({"animal": "bears"})
print(response)

Response:

content="Sure, here's a light-hearted bear joke for you:\n\nWhy did the bear put on sunglasses?\n\nBecause it was feeling a little sun-bear!\n\nHope that brought a smile to your face!" additional_kwargs={} response_metadata={'ResponseMetadata': {'RequestId': '<requestId>', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 20 Jan 2025 22:23:20 GMT', 'content-type': 'application/json', 'content-length': '354', 'connection': 'keep-alive', 'x-amzn-requestid': '<>'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [350]}} id='<>' usage_metadata={'input_tokens': 6, 'output_tokens': 41, 'total_tokens': 47}

The expected response should look like this:

My custom handler, token: Sure,
My custom handler, token:  here
My custom handler, token: 's
My custom handler, token:  a
...
...

Right now, with langchain-aws you can't use Amazon Nova Micro and stream the responses, and other models are affected by this as well. Is there an alternative approach being proposed? I checked the ChatBedrockConverse documentation, and a stream method is available, but its usage together with a callback handler is probably what's missing:

for chunk in chain.stream({"animal": "bears"}):
    print(chunk)
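
For comparison, here's a minimal sketch of what I'd expect to work with ChatBedrockConverse, passing the handler per call via the standard Runnable config (the config-based callbacks mechanism is generic LangChain, not Bedrock-specific; whether on_llm_new_token actually fires here is exactly what's in question):

from langchain_aws import ChatBedrockConverse

# Reuses bedrock_client, prompt, and MyCustomHandler from the repro above.
converse_model = ChatBedrockConverse(
    client=bedrock_client,
    model="amazon.nova-micro-v1:0",
)
converse_chain = prompt | converse_model

# Pass the handler at call time via the standard Runnable config;
# for a streaming model, each chunk should trigger on_llm_new_token.
for chunk in converse_chain.stream(
    {"animal": "bears"},
    config={"callbacks": [MyCustomHandler()]},
):
    print(chunk.content, end="", flush=True)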
langcarl bot added the investigate label Jan 20, 2025
supreetkt changed the title from "Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API]" to "Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API] - Affects newer models" on Jan 23, 2025