Python: Add support for streaming OpenAI assistants (#9055)
### Motivation and Context

OpenAI assistants were introduced several versions ago; however, streaming support for
OpenAI Assistants v2 remained open work, as did streaming support for assistants in
Agent Group Chat scenarios.


### Description

This PR introduces:
- Support for streaming with (Azure) OpenAI assistants
- Support for Agent Group Chat scenarios, which can now use a streaming invoke in
addition to the non-streaming invoke
- Unit tests that raise coverage of the newly added code to near 100%
- A few more OpenAI assistant agent samples
- Closes #7267
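Throughout the new samples, streamed chunks are printed as they arrive, with the role prefixed only on the first chunk. A minimal, self-contained sketch of that pattern (a stub async generator stands in for `agent.invoke_stream`, which in real use requires an `OpenAIAssistantAgent` and a thread):

```python
# Minimal sketch of the chunk-printing pattern the new streaming samples use.
# `fake_invoke_stream` is a hypothetical stand-in for `agent.invoke_stream(thread_id=...)`.
import asyncio
from dataclasses import dataclass


@dataclass
class Chunk:
    role: str
    content: str


async def fake_invoke_stream():
    # Stand-in for the agent: yields partial message chunks.
    for piece in ["Why did the ", "chicken cross ", "the road?"]:
        yield Chunk(role="assistant", content=piece)


async def stream_response() -> str:
    collected: list[str] = []
    first_chunk = True
    async for chunk in fake_invoke_stream():
        if first_chunk:
            # Print the role prefix once, before the first chunk.
            print(f"# {chunk.role}: ", end="", flush=True)
            first_chunk = False
        print(chunk.content, end="", flush=True)
        collected.append(chunk.content)
    print()
    return "".join(collected)


if __name__ == "__main__":
    asyncio.run(stream_response())
```

The same shape appears in each sample below, with the stub replaced by the real `invoke_stream` call.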


### Contribution Checklist


- [X] The code builds clean without any errors or warnings
- [X] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [X] All unit tests pass, and I have added new tests where possible
- [X] I didn't break anyone 😄
moonbox3 authored Oct 2, 2024
1 parent ca17571 commit f08cf3c
Showing 33 changed files with 1,791 additions and 56 deletions.
2 changes: 1 addition & 1 deletion docs/PLANNERS.md
@@ -2,4 +2,4 @@

This document has been moved to the Semantic Kernel Documentation site. You can find it by navigating to the [Automatically orchestrate AI with planner](https://learn.microsoft.com/en-us/semantic-kernel/ai-orchestration/planner) page.

To make an update on the page, file a PR on the [docs repo.](https://github.com/MicrosoftDocs/semantic-kernel-docs/blob/main/semantic-kernel/ai-orchestration/planner.md)
To make an update on the page, file a PR on the [docs repo.](https://github.com/MicrosoftDocs/semantic-kernel-docs/blob/main/semantic-kernel/concepts/planning.md)
6 changes: 6 additions & 0 deletions python/samples/concepts/README.md
@@ -41,6 +41,12 @@ In Semantic Kernel for Python, we leverage Pydantic Settings to manage configura
3. **Direct Constructor Input:**
- As an alternative to environment variables and `.env` files, you can pass the required settings directly through the constructor of the AI Connector or Memory Connector.

## Microsoft Entra Token Authentication

To authenticate to your Azure resources with a Microsoft Entra authentication token, the `AzureChatCompletion` AI service connector now supports this as a built-in feature. If you do not provide an API key (through an environment variable, a `.env` file, or the constructor), and you also do not provide a custom `AsyncAzureOpenAI` client, an `ad_token`, or an `ad_token_provider`, the `AzureChatCompletion` connector will attempt to retrieve a token using [`DefaultAzureCredential`](https://learn.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python).

To successfully retrieve and use the Entra auth token, you need the `Cognitive Services OpenAI Contributor` role assigned on your Azure OpenAI resource. By default, the `https://cognitiveservices.azure.com` token endpoint is used. You can override this endpoint by setting the `AZURE_OPENAI_TOKEN_ENDPOINT` environment variable (or the corresponding `.env` entry), or by passing a new value to the `AzureChatCompletion` constructor as part of `AzureOpenAISettings`.
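Token-based auth can also be wired up explicitly rather than relying on the `DefaultAzureCredential` fallback. The sketch below assumes the `azure-identity` package is installed; the deployment name and endpoint are placeholders:

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

# Wrap the credential in a callable that fetches tokens for the default scope.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

chat_service = AzureChatCompletion(
    service_id="chat",
    deployment_name="my-deployment",                  # placeholder
    endpoint="https://my-resource.openai.azure.com",  # placeholder
    ad_token_provider=token_provider,
)
```

Because an `ad_token_provider` is supplied, the connector skips the API-key and `DefaultAzureCredential` paths entirely.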

## Best Practices

- **.env File Placement:** We highly recommend placing the `.env` file in the `semantic-kernel/python` root directory. This is a common practice when developing in the Semantic Kernel repository.
37 changes: 37 additions & 0 deletions python/samples/concepts/agents/README.md
@@ -0,0 +1,37 @@
# Semantic Kernel: Agent concept examples

This project contains a step-by-step guide to get started with _Semantic Kernel Agents_ in Python.

#### PyPI:
- For the use of Chat Completion agents, the minimum allowed Semantic Kernel PyPI version is 1.3.0.
- For the use of OpenAI Assistant agents, the minimum allowed Semantic Kernel PyPI version is 1.4.0.
- For the use of Agent Group Chat, the minimum allowed Semantic Kernel PyPI version is 1.6.0.
- For the use of Streaming OpenAI Assistant agents, the minimum allowed Semantic Kernel PyPI version is 1.11.0.

#### Source

- [Semantic Kernel Agent Framework](../../../semantic_kernel/agents/)

## Examples

The concept agents examples are grouped by prefix:

Prefix|Description
---|---
assistant|How to use agents based on the [OpenAI Assistant API](https://platform.openai.com/docs/assistants).
chat_completion|How to use Semantic Kernel Chat Completion agents.
mixed_chat|How to combine different agent types.
complex_chat|**Coming Soon**

*Note: As we strive for parity with .NET, more getting_started_with_agent samples will be added. The current steps and names may be revised to further align with our .NET counterpart.*

## Configuring the Kernel

As with the other Semantic Kernel Python concept samples, you must configure the secrets
and keys used by the kernel. See the "Configuring the Kernel" [guide](../README.md#configuring-the-kernel) for
more information.

## Running Concept Samples

Concept samples can be run in an IDE or via the command line. After setting up the required API key or token authentication
for your AI connector, the samples run without any extra command-line arguments.
34 changes: 26 additions & 8 deletions python/samples/concepts/agents/assistant_agent_chart_maker.py
@@ -4,6 +4,7 @@
from semantic_kernel.agents.open_ai import AzureAssistantAgent, OpenAIAssistantAgent
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.file_reference_content import FileReferenceContent
from semantic_kernel.contents.streaming_file_reference_content import StreamingFileReferenceContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.kernel import Kernel

@@ -19,6 +20,8 @@
# Note: you may toggle this to switch between AzureOpenAI and OpenAI
use_azure_openai = True

streaming = True


# A helper method to invoke the agent with the user input
async def invoke_agent(agent: OpenAIAssistantAgent, thread_id: str, input: str) -> None:
@@ -27,14 +30,29 @@ async def invoke_agent(agent: OpenAIAssistantAgent, thread_id: str, input: str)

    print(f"# {AuthorRole.USER}: '{input}'")

    async for message in agent.invoke(thread_id=thread_id):
        if message.content:
            print(f"# {message.role}: {message.content}")

        if len(message.items) > 0:
            for item in message.items:
                if isinstance(item, FileReferenceContent):
                    print(f"\n`{message.role}` => {item.file_id}")
    if streaming:
        first_chunk = True
        async for message in agent.invoke_stream(thread_id=thread_id):
            if message.content:
                if first_chunk:
                    print(f"# {message.role}: ", end="", flush=True)
                    first_chunk = False
                print(message.content, end="", flush=True)

            if len(message.items) > 0:
                for item in message.items:
                    if isinstance(item, StreamingFileReferenceContent):
                        print(f"\n# {message.role} => {item.file_id}")
        print()
    else:
        async for message in agent.invoke(thread_id=thread_id):
            if message.content:
                print(f"# {message.role}: {message.content}")

            if len(message.items) > 0:
                for item in message.items:
                    if isinstance(item, FileReferenceContent):
                        print(f"\n`{message.role}` => {item.file_id}")


async def main():
@@ -17,7 +17,7 @@
AGENT_INSTRUCTIONS = "You are a funny comedian who loves telling G-rated jokes."

# Note: you may toggle this to switch between AzureOpenAI and OpenAI
use_azure_openai = False
use_azure_openai = True


# A helper method to invoke the agent with the user input
110 changes: 110 additions & 0 deletions python/samples/concepts/agents/assistant_agent_streaming.py
@@ -0,0 +1,110 @@
# Copyright (c) Microsoft. All rights reserved.
import asyncio
from typing import Annotated

from semantic_kernel.agents.open_ai import AzureAssistantAgent, OpenAIAssistantAgent
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.functions.kernel_function_decorator import kernel_function
from semantic_kernel.kernel import Kernel

#####################################################################
# The following sample demonstrates how to create an OpenAI #
# assistant using either Azure OpenAI or OpenAI. OpenAI Assistants #
# allow for function calling, the use of file search and a #
# code interpreter. Assistant Threads are used to manage the #
# conversation state, similar to a Semantic Kernel Chat History. #
# This sample also demonstrates the Assistants Streaming #
# capability and how to manage an Assistants chat history. #
#####################################################################

HOST_NAME = "Host"
HOST_INSTRUCTIONS = "Answer questions about the menu."

# Note: you may toggle this to switch between AzureOpenAI and OpenAI
use_azure_openai = True


# Define a sample plugin for the sample
class MenuPlugin:
    """A sample Menu Plugin used for the concept sample."""

    @kernel_function(description="Provides a list of specials from the menu.")
    def get_specials(self) -> Annotated[str, "Returns the specials from the menu."]:
        return """
        Special Soup: Clam Chowder
        Special Salad: Cobb Salad
        Special Drink: Chai Tea
        """

    @kernel_function(description="Provides the price of the requested menu item.")
    def get_item_price(
        self, menu_item: Annotated[str, "The name of the menu item."]
    ) -> Annotated[str, "Returns the price of the menu item."]:
        return "$9.99"


# A helper method to invoke the agent with the user input
async def invoke_agent(
    agent: OpenAIAssistantAgent, thread_id: str, input: str, history: list[ChatMessageContent]
) -> None:
    """Invoke the agent with the user input."""
    message = ChatMessageContent(role=AuthorRole.USER, content=input)
    await agent.add_chat_message(thread_id=thread_id, message=message)

    # Add the user message to the history
    history.append(message)

    print(f"# {AuthorRole.USER}: '{input}'")

    first_chunk = True
    async for content in agent.invoke_stream(thread_id=thread_id, messages=history):
        if content.role != AuthorRole.TOOL:
            if first_chunk:
                print(f"# {content.role}: ", end="", flush=True)
                first_chunk = False
            print(content.content, end="", flush=True)
    print()


async def main():
    # Create the instance of the Kernel
    kernel = Kernel()

    # Add the sample plugin to the kernel
    kernel.add_plugin(plugin=MenuPlugin(), plugin_name="menu")

    # Create the OpenAI Assistant Agent
    service_id = "agent"
    if use_azure_openai:
        agent = await AzureAssistantAgent.create(
            kernel=kernel, service_id=service_id, name=HOST_NAME, instructions=HOST_INSTRUCTIONS
        )
    else:
        agent = await OpenAIAssistantAgent.create(
            kernel=kernel, service_id=service_id, name=HOST_NAME, instructions=HOST_INSTRUCTIONS
        )

    thread_id = await agent.create_thread()

    history: list[ChatMessageContent] = []

    try:
        await invoke_agent(agent, thread_id=thread_id, input="Hello", history=history)
        await invoke_agent(agent, thread_id=thread_id, input="What is the special soup?", history=history)
        await invoke_agent(agent, thread_id=thread_id, input="What is the special drink?", history=history)
        await invoke_agent(agent, thread_id=thread_id, input="Thank you", history=history)
    finally:
        await agent.delete_thread(thread_id)
        await agent.delete()

    # You may then view the conversation history
    print("========= Conversation History =========")
    for content in history:
        if content.role != AuthorRole.TOOL:
            print(f"# {content.role}: {content.content}")
    print("========= End of Conversation History =========")


if __name__ == "__main__":
    asyncio.run(main())
95 changes: 95 additions & 0 deletions python/samples/concepts/agents/mixed_chat_streaming.py
@@ -0,0 +1,95 @@
# Copyright (c) Microsoft. All rights reserved.

import asyncio

from semantic_kernel.agents import AgentGroupChat, ChatCompletionAgent
from semantic_kernel.agents.open_ai import OpenAIAssistantAgent
from semantic_kernel.agents.strategies.termination.termination_strategy import TerminationStrategy
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.contents.chat_message_content import ChatMessageContent
from semantic_kernel.contents.utils.author_role import AuthorRole
from semantic_kernel.kernel import Kernel

#####################################################################
# The following sample demonstrates how to create an OpenAI #
# assistant using either Azure OpenAI or OpenAI, a chat completion #
# agent and have them participate in a group chat to work towards #
# the user's requirement. #
#####################################################################


class ApprovalTerminationStrategy(TerminationStrategy):
    """A strategy for determining when an agent should terminate."""

    async def should_agent_terminate(self, agent, history):
        """Check if the agent should terminate."""
        return "approved" in history[-1].content.lower()


REVIEWER_NAME = "ArtDirector"
REVIEWER_INSTRUCTIONS = """
You are an art director who has opinions about copywriting born of a love for David Ogilvy.
The goal is to determine if the given copy is acceptable to print.
If so, state that it is approved. Only include the word "approved" if it is so.
If not, provide insight on how to refine suggested copy without example.
"""

COPYWRITER_NAME = "CopyWriter"
COPYWRITER_INSTRUCTIONS = """
You are a copywriter with ten years of experience and are known for brevity and a dry humor.
The goal is to refine and decide on the single best copy as an expert in the field.
Only provide a single proposal per response.
You're laser focused on the goal at hand.
Don't waste time with chit chat.
Consider suggestions when refining an idea.
"""


def _create_kernel_with_chat_completion(service_id: str) -> Kernel:
    kernel = Kernel()
    kernel.add_service(AzureChatCompletion(service_id=service_id))
    return kernel


async def main():
    try:
        agent_reviewer = ChatCompletionAgent(
            service_id="artdirector",
            kernel=_create_kernel_with_chat_completion("artdirector"),
            name=REVIEWER_NAME,
            instructions=REVIEWER_INSTRUCTIONS,
        )

        agent_writer = await OpenAIAssistantAgent.create(
            service_id="copywriter",
            kernel=Kernel(),
            name=COPYWRITER_NAME,
            instructions=COPYWRITER_INSTRUCTIONS,
        )

        chat = AgentGroupChat(
            agents=[agent_writer, agent_reviewer],
            termination_strategy=ApprovalTerminationStrategy(agents=[agent_reviewer], maximum_iterations=10),
        )

        input = "a slogan for a new line of electric cars."

        await chat.add_chat_message(ChatMessageContent(role=AuthorRole.USER, content=input))
        print(f"# {AuthorRole.USER}: '{input}'")

        last_agent = None
        async for message in chat.invoke_stream():
            if message.content is not None:
                if last_agent != message.name:
                    print(f"\n# {message.name}: ", end="", flush=True)
                    last_agent = message.name
                print(f"{message.content}", end="", flush=True)

        print()
        print(f"# IS COMPLETE: {chat.is_complete}")
    finally:
        await agent_writer.delete()


if __name__ == "__main__":
    asyncio.run(main())
4 changes: 4 additions & 0 deletions python/samples/getting_started/CONFIGURING_THE_KERNEL.md
@@ -61,3 +61,7 @@ chat_completion = AzureChatCompletion(service_id="test", env_file_path=env_file_

- Manually configure the `api_key` or required parameters on either the `OpenAIChatCompletion` or `AzureChatCompletion` constructor with keyword arguments.
- This requires the user to manage their own keys/secrets as they aren't relying on the underlying environment variables or `.env` file.
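As a sketch of this constructor-argument approach (all key, deployment, and endpoint values below are placeholders; manage real secrets securely):

```python
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion, OpenAIChatCompletion

# OpenAI: pass the key and model id directly instead of relying on env vars.
openai_chat = OpenAIChatCompletion(
    service_id="oai_chat",
    api_key="sk-placeholder",  # placeholder
    ai_model_id="gpt-4o",      # placeholder
)

# Azure OpenAI: pass the key, deployment, and endpoint directly.
azure_chat = AzureChatCompletion(
    service_id="az_chat",
    api_key="azure-key-placeholder",                  # placeholder
    deployment_name="my-deployment",                  # placeholder
    endpoint="https://my-resource.openai.azure.com",  # placeholder
)
```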

### 4. Microsoft Entra Authentication

To learn how to use a Microsoft Entra Authentication token to authenticate to your Azure OpenAI resource, please navigate to the following [guide](../concepts/README.md#microsoft-entra-token-authentication).
1 change: 1 addition & 0 deletions python/samples/getting_started_with_agents/README.md
@@ -6,6 +6,7 @@ This project contains a step by step guide to get started with _Semantic Kernel
- For the use of Chat Completion agents, the minimum allowed Semantic Kernel PyPI version is 1.3.0.
- For the use of OpenAI Assistant agents, the minimum allowed Semantic Kernel PyPI version is 1.4.0.
- For the use of Agent Group Chat, the minimum allowed Semantic Kernel PyPI version is 1.6.0.
- For the use of Streaming OpenAI Assistant agents, the minimum allowed Semantic Kernel PyPI version is 1.11.0.

#### Source

@@ -35,16 +35,29 @@ def create_message_with_image_reference(input: str, file_id: str) -> ChatMessage
)


streaming = False


# A helper method to invoke the agent with the user input
async def invoke_agent(agent: OpenAIAssistantAgent, thread_id: str, message: ChatMessageContent) -> None:
    """Invoke the agent with the user input."""
    await agent.add_chat_message(thread_id=thread_id, message=message)

    print(f"# {AuthorRole.USER}: '{message.items[0].text}'")

    async for content in agent.invoke(thread_id=thread_id):
        if content.role != AuthorRole.TOOL:
            print(f"# {content.role}: {content.content}")
    if streaming:
        first_chunk = True
        async for content in agent.invoke_stream(thread_id=thread_id):
            if content.role != AuthorRole.TOOL:
                if first_chunk:
                    print(f"# {content.role}: ", end="", flush=True)
                    first_chunk = False
                print(content.content, end="", flush=True)
        print()
    else:
        async for content in agent.invoke(thread_id=thread_id):
            if content.role != AuthorRole.TOOL:
                print(f"# {content.role}: {content.content}")


async def main():