There is a set of models that are no longer supported by the Invoke API but do support streaming. They can be used via the Converse API, but streaming is affected: similar to Issue 240, on_llm_new_token doesn't seem to be called, so you get the full generated text as a single response instead of streamed tokens:
import boto3
from langchain_aws import ChatBedrock
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.prompts import ChatPromptTemplate

streaming = True

session = boto3.session.Session()
bedrock_client = session.client("bedrock-runtime", region_name="us-east-1")

# Handler that should receive each token as it is generated.
class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")

prompt = ChatPromptTemplate.from_messages(["Tell me a joke about {animal}"])

model = ChatBedrock(
    client=bedrock_client,
    model_id="amazon.nova-micro-v1:0",
    streaming=streaming,
    callbacks=[MyCustomHandler()],
    beta_use_converse_api=True,
)

chain = prompt | model
response = chain.invoke({"animal": "bears"})
print(response)
Response:
content="Sure, here's a light-hearted bear joke for you:\n\nWhy did the bear put on sunglasses?\n\nBecause it was feeling a little sun-bear!\n\nHope that brought a smile to your face!" additional_kwargs={} response_metadata={'ResponseMetadata': {'RequestId': '<requestId>', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Mon, 20 Jan 2025 22:23:20 GMT', 'content-type': 'application/json', 'content-length': '354', 'connection': 'keep-alive', 'x-amzn-requestid': '<>'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [350]}} id='<>' usage_metadata={'input_tokens': 6, 'output_tokens': 41, 'total_tokens': 47}
Expected response should look like this:
My custom handler, token: Sure,
My custom handler, token: here
My custom handler, token: 's
My custom handler, token: a
...
...
Right now, with langchain-aws, you can't use Amazon Nova Micro and stream the responses, and other models are affected as well. Is there an alternative approach being proposed? I checked the ChatBedrockConverse documentation, and a stream method is available, but its usage with the callback handler is probably what's missing (see the sketch after the snippet below):
for chunk in model.stream(messages):
    print(chunk)
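For completeness, here's a minimal sketch of a workaround that streams tokens by consuming the chain's stream directly, without relying on on_llm_new_token. It reuses bedrock_client and prompt from the snippet above, and it assumes ChatBedrockConverse yields AIMessageChunk objects whose content may be either a plain string or a list of content blocks, depending on model/version:

from langchain_aws import ChatBedrockConverse

converse_model = ChatBedrockConverse(
    client=bedrock_client,
    model_id="amazon.nova-micro-v1:0",
)
chain = prompt | converse_model

# Consume the stream directly instead of waiting for on_llm_new_token.
for chunk in chain.stream({"animal": "bears"}):
    # chunk.content may be a string or a list of content blocks;
    # handle both defensively.
    if isinstance(chunk.content, str):
        print(chunk.content, end="", flush=True)
    else:
        for block in chunk.content:
            if isinstance(block, dict) and block.get("type") == "text":
                print(block.get("text", ""), end="", flush=True)
print()

This prints tokens as they arrive, but it sidesteps the callback machinery entirely; getting on_llm_new_token to fire with the Converse API is exactly what this issue is asking about.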