
OpenAI Assistants Agent #4131

Merged

Conversation

lspinheiro
Collaborator

Why are these changes needed?

Related issue number

Checks

@lspinheiro
Collaborator Author

@ekzhu @jackgerrits, this is a very early draft. I have some questions before proceeding further.

  1. What should we do about the model client? The chat completion client abstraction doesn't seem to fit: it makes assumptions about how messages are handled in the interface, while the Assistants API handles them very differently through threads (I have spent a lot of time trying to adapt it without success). I'm also not sure whether we can define a general interface for agent-like APIs. Should I create a specific one in autogen_ext to abstract away the OpenAI SDK? I'm not sure what the value would be, but it also feels like I'm adding an implementation without a proper standard/abstraction.

  2. How do we want to handle file search, especially ingestion? That also seems like something we don't have a strong abstraction for. I'm not sure whether it should fit into how we will integrate RAG. I also don't know whether file ingestion should be part of the agent interaction API through on_messages, or a separate method used during agent setup before running the chat.

  3. My next step is to map tool calling onto the autogen core framework. It looks like the Azure OpenAI API integrates with Logic Apps to actually call functions as tools. Should this be future functionality in autogen_ext?

@ekzhu
Collaborator

ekzhu commented Nov 11, 2024

Thanks. I think we can follow the design in the Core cookbook for the OpenAI assistant agent: https://microsoft.github.io/autogen/dev/user-guide/core-user-guide/cookbook/openai-assistant-agent.html. The API should be simple, without introducing additional abstractions on our side.

class OpenAIAssistantAgent:
  name: str
  description: str
  client: openai.AsyncClient
  assistant_id: str
  thread_id: str
  tools: List[Tool] | None = None
  code_interpreter: ... | None = None  # configuration class from OpenAI client
  file_search: ... | None = None  # configuration class from OpenAI client

We don't need to introduce additional abstractions because the OpenAI Assistants API is specific to the OpenAI and Azure OpenAI services -- we should stick with the official clients they provide. Furthermore, we shouldn't expect the agent to be the only interface to assistant features such as file search, as the application may also perform other functions such as file upload and thread management.

  1. What should we do about the model client? The chat completion client abstraction doesn't seem to fit: it makes assumptions about how messages are handled in the interface, while the Assistants API handles them very differently through threads (I have spent a lot of time trying to adapt it without success). I'm also not sure whether we can define a general interface for agent-like APIs. Should I create a specific one in autogen_ext to abstract away the OpenAI SDK? I'm not sure what the value would be, but it also feels like I'm adding an implementation without a proper standard/abstraction.

Use the official openai client and do not introduce new abstractions on our side besides the new agent class.
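
For illustration, a rough sketch of what using the official client directly could look like, assuming the openai 1.x async SDK; the model name and message content are placeholders, and create_and_poll is a convenience helper in recent SDK releases:

import asyncio

from openai import AsyncOpenAI  # AsyncAzureOpenAI works the same way


async def main() -> None:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    # The application owns the assistant and thread lifecycle.
    assistant = await client.beta.assistants.create(
        model="gpt-4o-mini",
        name="assistant",
        instructions="Help the user with their task.",
    )
    thread = await client.beta.threads.create()

    await client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content="Hello!"
    )
    run = await client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=assistant.id
    )
    if run.status == "completed":
        messages = await client.beta.threads.messages.list(thread_id=thread.id)
        # The most recent message comes first; print its text block.
        print(messages.data[0].content[0].text.value)


asyncio.run(main())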

  2. How do we want to handle file search, especially ingestion? That also seems like something we don't have a strong abstraction for. I'm not sure whether it should fit into how we will integrate RAG. I also don't know whether file ingestion should be part of the agent interaction API through on_messages, or a separate method used during agent setup before running the chat.

This should mostly be done using the official OpenAI client in the user's application. We can potentially add new assistant tools that use the client.
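
For illustration only, a rough sketch of that ingestion path with the official client, assuming an openai 1.x SDK where vector stores live under client.beta.vector_stores (the store name is a placeholder):

from openai import AsyncOpenAI


async def ingest_for_file_search(client: AsyncOpenAI, path: str) -> str:
    # Upload the raw file so assistants can reference it.
    with open(path, "rb") as f:
        uploaded = await client.files.create(file=f, purpose="assistants")
    # Create a vector store and attach the uploaded file to it.
    store = await client.beta.vector_stores.create(name="docs")
    await client.beta.vector_stores.files.create(
        vector_store_id=store.id, file_id=uploaded.id
    )
    # The store id is then referenced by the assistant's file_search tool, e.g.
    #   tool_resources={"file_search": {"vector_store_ids": [store.id]}}
    return store.id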

  3. My next step is to map tool calling onto the autogen core framework. It looks like the Azure OpenAI API integrates with Logic Apps to actually call functions as tools. Should this be future functionality in autogen_ext?

We should make sure we can use our Tool class tools in this new agent.
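
As a sketch of what that mapping could look like -- assuming the core Tool exposes a schema with name, description, and JSON-schema parameters, as BaseTool does; to_assistant_tool is a hypothetical helper, not part of this PR:

from typing import Any, Dict

from autogen_core.components.tools import Tool


def to_assistant_tool(tool: Tool) -> Dict[str, Any]:
    # Adapt an autogen core Tool to the function-tool format accepted by
    # client.beta.assistants.create(...).
    schema = tool.schema  # name / description / JSON-schema parameters
    return {
        "type": "function",
        "function": {
            "name": schema["name"],
            "description": schema.get("description", ""),
            "parameters": schema.get("parameters", {"type": "object", "properties": {}}),
        },
    }

When a run requires action, the agent would then execute the matching Tool and return the results through client.beta.threads.runs.submit_tool_outputs.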

Overall, the goal is to bring OpenAI assistant agents into our ecosystem, not to build a new wrapper around the Assistants API.

@ekzhu ekzhu linked an issue Nov 16, 2024 that may be closed by this pull request
@lspinheiro lspinheiro marked this pull request as ready for review November 17, 2024 04:19
@lspinheiro
Collaborator Author

@ekzhu, this is ready for review. I tested it with the script below. I haven't added unit tests, as mocking the Assistants API would take a lot of effort and we don't have other use cases.

import asyncio
from enum import Enum
from typing import List, Optional
from autogen_agentchat.agents import OpenAIAssistantAgent
from autogen_core.components.tools._base import BaseTool
from openai import AsyncAzureOpenAI
from autogen_agentchat.messages import TextMessage
from autogen_core.base import CancellationToken
from pydantic import BaseModel


class QuestionType(str, Enum):
    MULTIPLE_CHOICE = "MULTIPLE_CHOICE"
    FREE_RESPONSE = "FREE_RESPONSE"

class Question(BaseModel):
    question_text: str
    question_type: QuestionType
    choices: Optional[List[str]] = None

class DisplayQuizArgs(BaseModel):
    title: str
    questions: List[Question]

# Step 2: Create the Tool class by subclassing BaseTool

class DisplayQuizTool(BaseTool[DisplayQuizArgs, List[str]]):
    def __init__(self):
        super().__init__(
            args_type=DisplayQuizArgs,
            return_type=List[str],
            name="display_quiz",
            description=(
                "Displays a quiz to the student and returns the student's responses. "
                "A single quiz can have multiple questions."
            ),
        )

    # Step 3: Implement the run method

    async def run(self, args: DisplayQuizArgs, cancellation_token: CancellationToken) -> List[str]:
        # Simulate displaying the quiz and collecting responses
        responses = []
        for q in args.questions:
            if q.question_type == QuestionType.MULTIPLE_CHOICE:
                # Simulate a response for multiple-choice questions
                response = q.choices[0] if q.choices else ""
            elif q.question_type == QuestionType.FREE_RESPONSE:
                # Simulate a response for free-response questions
                response = "Sample free response"
            else:
                response = ""
            responses.append(response)
        return responses


def create_agent(client: AsyncAzureOpenAI) -> OpenAIAssistantAgent:
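    # Mix of built-in assistant tools (code_interpreter, file_search) and a
    # custom function tool wrapped from the BaseTool implementation above.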
    tools = [
        {"type": "code_interpreter"},
        {"type": "file_search"},
        {"type": "tool", "tool": DisplayQuizTool()},
    ]

    return OpenAIAssistantAgent(
        name="assistant",
        instructions="Help the user with their task.",
        model="gpt-4o-mini",
        description="OpenAI Assistant Agent",
        client=client,
        tools=tools,
    )



async def test_file_retrieval(agent: OpenAIAssistantAgent, cancellation_token: CancellationToken):
    file_path = r".\data\SampleBooks\jungle_book.txt"
    await agent.on_upload_for_file_search(file_path, cancellation_token)
    
    message = TextMessage(source="user", content="What is the first sentence of the jungle scout book?")
    response = await agent.on_messages([message], cancellation_token)
    print("File Retrieval Test Response:", response.chat_message.content)
    
    await agent.delete_uploaded_files(cancellation_token)
    await agent.delete_vector_store(cancellation_token)


async def test_code_interpreter(agent: OpenAIAssistantAgent, cancellation_token: CancellationToken):
    message = TextMessage(source="user", content="I need to solve the equation `3x + 11 = 14`. Can you help me?")
    response = await agent.on_messages([message], cancellation_token)
    print("Code Interpreter Test Response:", response.chat_message.content)


async def test_quiz_creation(agent: OpenAIAssistantAgent, cancellation_token: CancellationToken):
    message = TextMessage(source="user", content="Create a short quiz about basic math with one multiple choice question and one free response question.")
    response = await agent.on_messages([message], cancellation_token)
    print("Quiz Creation Test Response:", response.chat_message.content)


async def main():
    client = AsyncAzureOpenAI(
        azure_endpoint="https://{your-api-endpoint}.openai.azure.com",
        api_version="2024-08-01-preview",
        api_key="your-api-key"
    )
    
    cancellation_token = CancellationToken()

    # Test file retrieval
    agent = create_agent(client)
    await test_file_retrieval(agent, cancellation_token)
    await agent.delete_assistant(cancellation_token)

    # Wait 1 minute before next test
    await asyncio.sleep(60)

    # Test code interpreter
    agent = create_agent(client)
    await test_code_interpreter(agent, cancellation_token) 
    await agent.delete_assistant(cancellation_token)

    # Wait 1 minute before next test
    await asyncio.sleep(60)

    # Test quiz creation
    agent = create_agent(client)
    await test_quiz_creation(agent, cancellation_token)
    await agent.delete_assistant(cancellation_token)


if __name__ == "__main__":
    asyncio.run(main())

Here is the output of the tool call inspection:

-> run = await cancellation_token.link_future(
(Pdb) tool_outputs
[FunctionExecutionResult(content="['4', 'Sample free response']", call_id='call_mx2T4F0niQvJ0FiGBqsMGZXl'), FunctionExecutionResult(content="['4', 'Sample free response']", call_id='call_WW4D3OtQteig5BwH5FOZhdgL')]
(Pdb) c
Quiz Creation Test Response: I have created a short quiz about basic math. Here it is:

### Basic Math Quiz

1. **What is 15 divided by 3?**
   - A) 4
   - B) 5
   - C) 6
   - D) 7

2. **What is the square root of 64?**
   - (Free Response)

@lspinheiro lspinheiro changed the title WIP - OpenAI Assistants Agent OpenAI Assistants Agent Nov 18, 2024
@lspinheiro lspinheiro merged commit df32d5e into microsoft:main Nov 18, 2024
41 checks passed
@lspinheiro lspinheiro deleted the lpinheiro/feat/add-openai-assistants-agent branch November 18, 2024 23:56