Azure OpenAI content streaming with asynchronous filter still streams responses in large bursts #22246
-
Example Code

```python
from langchain_openai.chat_models.azure import AzureChatOpenAI

...

conversation_llm = AzureChatOpenAI(
    streaming=True,
    callback_manager=conversation_manager,
    temperature=conversation_temperature,
    deployment_name=conversation_model,
    azure_endpoint=azure_api_base,
    openai_api_version=azure_api_version,
    openai_api_key=azure_api_key,
    openai_api_type="azure",
)

...

doc_chain = load_qa_chain(
    conversation_llm, chain_type="stuff", prompt=chat_prompt, callback_manager=default_manager
)
conversation_chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": rag_score_threshold, "k": rag_top_k},
    ),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
    return_source_documents=True,
    callback_manager=default_manager,
    rephrase_question=False,
    memory=memory,
    max_tokens_limit=max_retrieval_tokens,
)

...

result = await qa_chain.ainvoke(
    {
        "question": question,
        "chat_history": chat_history,
        "xls_file_name": file_name,
        "xls_survey_sheet": survey_sheet,
        "xls_choices_sheet": choices_sheet,
        "xls_settings_sheet": settings_sheet,
    }
)
```

Description

I've enabled content streaming via Azure OpenAI's asynchronous filtering. (This had been restricted, but recently they opened it up.) However, streamed responses still arrive in large bursts rather than continuously, with 2-10 seconds between bursts, so the app UX is quite poor.

I have confirmed that asynchronous filtering is enabled for my Azure OpenAI deployment, and it works well against that deployment when I use the Azure OpenAI playground as the client. In my application, however, the tokens are streamed in these large bursts. My code uses a FastAPI back-end to stream responses to a React front-end via a WebSocket connection. This works perfectly well with a ...

I should say that I'm still getting a stream of tokens with ...

Has anybody else tried ...?

Thanks so much, Chris
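For context, here is a minimal, self-contained sketch of the kind of setup described above: an AzureChatOpenAI model streamed token by token over a FastAPI WebSocket. It is not the original app code; the endpoint, key, deployment, and route names are placeholders.

```python
# Minimal sketch (not the original app code): stream AzureChatOpenAI tokens over a
# FastAPI WebSocket as they arrive. Endpoint, key, and deployment names are placeholders.
from fastapi import FastAPI, WebSocket
from langchain_openai.chat_models.azure import AzureChatOpenAI

app = FastAPI()

llm = AzureChatOpenAI(
    streaming=True,
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    openai_api_key="<your-key>",                                 # placeholder
    openai_api_version="2024-02-01",
    deployment_name="<your-deployment>",                         # placeholder
)

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket) -> None:
    await websocket.accept()
    question = await websocket.receive_text()
    # Forward each chunk as soon as the model emits it; if the client still sees
    # 2-10 second bursts, the buffering is happening upstream of this loop.
    async for chunk in llm.astream(question):
        if chunk.content:
            await websocket.send_text(chunk.content)
```

With the asynchronous filter actually streaming, each iteration of that loop should fire shortly after the previous one rather than in multi-second bursts.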
-
Yes, the asynchronous streaming path is exercised by LangChain's integration tests:

```python
@pytest.mark.scheduled
async def test_async_chat_openai_streaming() -> None:
    """Test that streaming correctly invokes on_llm_new_token callback."""
    callback_handler = FakeCallbackHandler()
    callback_manager = CallbackManager([callback_handler])
    chat = _get_llm(
        max_tokens=10,
        streaming=True,
        temperature=0,
        callback_manager=callback_manager,
        verbose=True,
    )
    message = HumanMessage(content="Hello")
    response = await chat.agenerate([[message], [message]])
    assert callback_handler.llm_streams > 0
    assert isinstance(response, LLMResult)
    assert len(response.generations) == 2
    for generations in response.generations:
        assert len(generations) == 1
        for generation in generations:
            assert isinstance(generation, ChatGeneration)
            assert isinstance(generation.text, str)
            assert generation.text == generation.message.content
```

This test confirms that asynchronous streaming correctly invokes the `on_llm_new_token` callback.

```python
@pytest.mark.scheduled
def test_openai_streaming(llm: AzureOpenAI) -> None:
    """Test streaming tokens from AzureOpenAI."""
    generator = llm.stream("I'm Pickle Rick")
    assert isinstance(generator, Generator)

    full_response = ""
    for token in generator:
        assert isinstance(token, str)
        full_response += token
    assert full_response


@pytest.mark.scheduled
async def test_openai_astream(llm: AzureOpenAI) -> None:
    """Test streaming tokens from AzureOpenAI."""
    async for token in llm.astream("I'm Pickle Rick"):
        assert isinstance(token, str)


@pytest.mark.scheduled
async def test_openai_abatch(llm: AzureOpenAI) -> None:
    """Test streaming tokens from AzureOpenAI."""
    result = await llm.abatch(["I'm Pickle Rick", "I'm not Pickle Rick"])
    for token in result:
        assert isinstance(token, str)
```

These tests collectively ensure that `stream`, `astream`, and `abatch` on the Azure OpenAI models yield string tokens as expected.
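Separately from those tests, a small diagnostic like the following can help tell whether tokens reach the application steadily or in the bursts described above. This is an illustrative sketch, not part of LangChain; the handler name and print format are made up for the example.

```python
# Illustrative sketch: time the gap between consecutive streamed tokens to see
# whether they arrive steadily or in bursts. Not part of LangChain's test suite.
import time

from langchain_core.callbacks import AsyncCallbackHandler


class TokenTimingHandler(AsyncCallbackHandler):
    """Print the delay between consecutive on_llm_new_token calls."""

    def __init__(self) -> None:
        self.last = time.monotonic()

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        now = time.monotonic()
        print(f"+{now - self.last:.3f}s {token!r}")
        self.last = now
```

It can be attached via `callbacks=[TokenTimingHandler()]` when constructing the model or invoking the chain; large gaps followed by runs of near-zero gaps indicate buffering somewhere between the service and the callback.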
-
Hi @chrislrobert,

Like you, I have an ...

Am I missing anything?
-
Ah, that's good to know! Thanks for reporting back!

On Thu, Dec 19, 2024 at 1:37 PM, Johannes Schmidt wrote:

> Turns out, opentelemetry-instrumentation-openai was giving me high response times. By removing the instrumentation, suddenly everything worked.
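For anyone checking whether the same instrumentation is involved, here is a rough sketch of how it is typically switched on and off. It assumes the `opentelemetry-instrumentation-openai` package exposes the usual `OpenAIInstrumentor` with `instrument()`/`uninstrument()`; the exact setup in the report above is not shown here.

```python
# Rough sketch, assuming the opentelemetry-instrumentation-openai package:
# the instrumentor wraps OpenAI client calls with tracing, which (per the report
# above) was interfering with streaming. Uninstrumenting rules it out while debugging.
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

instrumentor = OpenAIInstrumentor()
instrumentor.instrument()    # wrap OpenAI calls with tracing spans

# ...later, to rule the instrumentation out while debugging bursty streaming:
instrumentor.uninstrument()  # remove the wrapping and restore the plain client
```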
@dosu, that wasn't required.
To anybody else who struggles with this: changing the `openai_api_version` parameter from `2023-05-15` to `2024-02-01` resolved the issue for me. For whatever reason, the older API version wasn't supporting the newer async filter (unbuffered stream) functionality.
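As an illustration of that fix, here is a minimal sketch (with placeholder endpoint, key, and deployment values) in which only the API version changes:

```python
# Minimal sketch of the fix described above: only openai_api_version changes.
# Endpoint, key, and deployment names are placeholders.
from langchain_openai.chat_models.azure import AzureChatOpenAI

conversation_llm = AzureChatOpenAI(
    streaming=True,
    deployment_name="<your-deployment>",                         # placeholder
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    openai_api_key="<your-key>",                                 # placeholder
    openai_api_version="2024-02-01",  # was "2023-05-15"; per the report above, the
                                      # older version didn't support the async
                                      # filter's unbuffered stream
)
```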