
Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API] - Affects newer models #334

Open
supreetkt opened this issue Jan 20, 2025 · 0 comments

supreetkt commented Jan 20, 2025

There is a set of newer models that are no longer supported by the Invoke APIs but do support streaming. These can be used via the Converse API, but streaming is broken: similar to issue #240, on_llm_new_token doesn't seem to be called, so you get the response as one full generated text instead of streamed tokens:

import boto3
from langchain_aws import ChatBedrock
from langchain.callbacks.base import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

streaming = True
session = boto3.session.Session()
bedrock_client = session.client("bedrock-runtime", region_name="us-east-1")

class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])

model = ChatBedrock(
    client=bedrock_client,
    model_id="amazon.nova-micro-v1:0",
    streaming=streaming,
    callbacks=[MyCustomHandler()],
    beta_use_converse_api=True,
)

chain = prompt | model

response = chain.invoke({"animal": "bears"})
print(response)

Response:

content="Sure, here's a light-hearted bear joke for you:\n\nWhy did the bear put on sunglasses?\n\nBecause it was feeling a little sun-bear!\n\nHope that brought a smile to your face!" additional_kwargs={} response_metadata={'ResponseMetadata': {'RequestId': '<requestId>', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 20 Jan 2025 22:23:20 GMT', 'content-type': 'application/json', 'content-length': '354', 'connection': 'keep-alive', 'x-amzn-requestid': '<>'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [350]}} id='<>' usage_metadata={'input_tokens': 6, 'output_tokens': 41, 'total_tokens': 47}

The expected response should look like this:

My custom handler, token: Sure,
My custom handler, token:  here
My custom handler, token: 's
My custom handler, token:  a
...
...

Right now, with langchain-aws you can't use Amazon Nova Micro and stream the responses, and other models are affected by this as well. Is there an alternative approach being proposed? I checked the ChatBedrockConverse documentation, and a stream method is available, but its usage together with a callback handler is probably what's missing:

for chunk in chain.stream({"animal": "bears"}):
    print(chunk)
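
For comparison, here's a minimal sketch of what I'd expect to work with ChatBedrockConverse, passing the handler per call via the standard Runnable config (the config-based callbacks mechanism is generic LangChain, not Bedrock-specific; whether on_llm_new_token actually fires here is exactly what's in question):

from langchain_aws import ChatBedrockConverse

# Reuses bedrock_client, prompt, and MyCustomHandler from the repro above.
converse_model = ChatBedrockConverse(
    client=bedrock_client,
    model="amazon.nova-micro-v1:0",
)
converse_chain = prompt | converse_model

# Pass the handler at call time via the standard Runnable config;
# for a streaming model, each chunk should trigger on_llm_new_token.
for chunk in converse_chain.stream(
    {"animal": "bears"},
    config={"callbacks": [MyCustomHandler()]},
):
    print(chunk.content, end="", flush=True)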
langcarl bot added the investigate label Jan 20, 2025
supreetkt changed the title from "Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API]" to "Streaming with ChatBedrock: on_llm_new_token doesn't seem to be called [Converse API] - Affects newer models" on Jan 23, 2025