feat: Add GAIA benchmark #1181
Thanks @liuxukun2000! I left some comments and added a commit here: d76f284
d8ce302
Feel free to check the change. I will merge the PR first; let me know if you have any further questions.
examples/benchmarks/gaia.py
Outdated
toolkit = CodeExecutionToolkit(verbose=True)
runtime = DockerRuntime("xukunliu/camel").add(
Use RemoteHttpRuntime instead for easier setup.
camel/benchmarks/gaia.py
Outdated
    """
    return self.run_vector_retriever(query, contents, **kwargs)  # type: ignore[arg-type]

def reset(self, **kwargs: Dict[str, Any]) -> bool:
- def reset(self, **kwargs: Dict[str, Any]) -> bool:
+ def reset(self, **kwargs: Any) -> bool:
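For context on the suggestion above: under PEP 484, the annotation on `**kwargs` describes each keyword value, not the whole mapping, so `**kwargs: Dict[str, Any]` tells a type checker that every value must itself be a dict. The sketch below is only an illustration; `reset_before` and `reset_after` are hypothetical stand-ins, not the PR's methods.

```python
from typing import Any, Dict, get_type_hints


def reset_before(**kwargs: Dict[str, Any]) -> bool:
    # A type checker reads this as: every keyword VALUE is a Dict[str, Any].
    # A call such as reset_before(task_id="abc") would be flagged.
    return True


def reset_after(**kwargs: Any) -> bool:
    # Every keyword value may be anything; kwargs itself is Dict[str, Any].
    return True


# The stored annotation is the per-value type, which shows the difference.
before_hint = get_type_hints(reset_before)["kwargs"]
after_hint = get_type_hints(reset_after)["kwargs"]
print(before_hint)
print(after_hint)
```

Python does not enforce annotations at runtime, so both functions accept the same calls when executed; the difference only shows up in static checking with a tool such as mypy.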
camel/benchmarks/gaia.py
Outdated
        bool: Whether the reset was successful.
    """
    path = Path(self.vector_storage_local_path or os.getcwd())
    task_id = str(kwargs.get("task_id", uuid.uuid1()))
Better to use uuid4.
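A likely reason for this suggestion: `uuid.uuid1()` builds the identifier from the current timestamp and a node ID (usually the host's MAC address), while `uuid.uuid4()` is purely random and reveals nothing about the machine. A minimal sketch of the fallback with `uuid4`; `make_task_id` is a hypothetical helper, not a function from the PR.

```python
import uuid
from typing import Any


def make_task_id(**kwargs: Any) -> str:
    # Use the caller's task_id if given; otherwise fall back to a random UUID.
    # (Hypothetical helper that mirrors the kwargs.get(...) line under review.)
    return str(kwargs.get("task_id", uuid.uuid4()))


generated = make_task_id()
explicit = make_task_id(task_id="abc")
print(generated, explicit)
```

The default passed to `kwargs.get` is evaluated even when `task_id` is supplied, which is harmless here because generating a UUID is cheap.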
def run(  # type: ignore[override]
    self,
    agent: ChatAgent,
    on: Literal["train", "valid", "test"],
    level: Union[int, List[int], Literal["all"]],
This method could be refactored for better readability and maintainability.
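One possible first step in such a refactor, offered only as a sketch, is to pull the handling of the `level` argument out of `run` into a small helper. The name `normalize_levels` is hypothetical, and the set of valid levels (1, 2, 3) is an assumption based on GAIA's three difficulty levels rather than on the PR's code.

```python
from typing import List, Literal, Union

# Assumption: GAIA questions are grouped into difficulty levels 1, 2 and 3.
VALID_LEVELS = (1, 2, 3)


def normalize_levels(level: Union[int, List[int], Literal["all"]]) -> List[int]:
    """Turn any accepted form of `level` into an explicit list of ints."""
    if level == "all":
        return list(VALID_LEVELS)
    levels = [level] if isinstance(level, int) else list(level)
    invalid = [lv for lv in levels if lv not in VALID_LEVELS]
    if invalid:
        raise ValueError(f"Invalid GAIA level(s): {invalid}")
    return levels


print(normalize_levels("all"))
print(normalize_levels(2))
print(normalize_levels([1, 3]))
```

With the argument normalized up front, the body of `run` can loop over a plain list of levels without branching on the argument's type.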
Thanks @liuxukun2000!
Description
task #640
Motivation and Context
Why is this change required? What problem does it solve?
Add GAIA as an agent evaluation benchmark.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply. If you are unsure about any of these, don't hesitate to ask. We are here to help!