feat: Add GAIA benchmark #1181

Merged
merged 64 commits into from
Dec 8, 2024
b03b58c
add GAIA benchmark files
Asher-hss Jul 10, 2024
1626e5a
update gaia.py
Asher-hss Jul 11, 2024
e83b91e
delete dataset
Asher-hss Jul 11, 2024
581b022
do some changes
Asher-hss Jul 11, 2024
95816e5
add type check
Asher-hss Jul 15, 2024
d73f31d
Merge branch 'master' into GAIA
Asher-hss Jul 23, 2024
1a6fc81
add RAG
Asher-hss Jul 24, 2024
f58d8af
fix some issue
Asher-hss Jul 24, 2024
fe3a9e5
update lock file
Asher-hss Jul 24, 2024
ece2925
Merge branch 'master' into GAIA
Wendong-Fan Aug 2, 2024
8b1e7e8
docs: small fixes on docstrings
WHALEEYE Aug 13, 2024
aaa06e3
Add/improve tools, explanation params, cache tool results, conversati…
CaelumF Oct 15, 2024
cc5a6ff
add docker runtime
liuxukun2000 Nov 8, 2024
8ab01cf
add agent example
liuxukun2000 Nov 8, 2024
72e6d4e
add unsafe mode and import whitelist
liuxukun2000 Nov 8, 2024
913f09f
update code_exec
liuxukun2000 Nov 8, 2024
8edd83d
add task to runtime, used to prepare env
liuxukun2000 Nov 8, 2024
adcf779
add comments
liuxukun2000 Nov 8, 2024
145a2bb
format code, add dependence
liuxukun2000 Nov 8, 2024
9d9013c
Fix the mistaken deletion
liuxukun2000 Nov 8, 2024
be62904
Merge branch 'master' into feat/runtime
liuxukun2000 Nov 8, 2024
4f7f06f
update poetry.lok
liuxukun2000 Nov 8, 2024
6bd1362
add remote http runtime
liuxukun2000 Nov 9, 2024
d538411
support get docs url
liuxukun2000 Nov 9, 2024
aaeea0b
add LLM guard runtime
liuxukun2000 Nov 9, 2024
0834942
pass test
liuxukun2000 Nov 10, 2024
fe45f63
pass precheck
liuxukun2000 Nov 10, 2024
8043e70
Merge branch 'master' into feat/runtime
liuxukun2000 Nov 10, 2024
aafc8b6
Merge branch 'master' into feat/runtime
liuxukun2000 Nov 10, 2024
8b0c359
Updated code as the comments
liuxukun2000 Nov 14, 2024
5012dfb
Merge branch 'feat/runtime' of https://github.com/camel-ai/camel into…
liuxukun2000 Nov 14, 2024
7cbaf4d
Merge branch 'master' into feat/runtime
liuxukun2000 Nov 14, 2024
563956f
update poetry.lock
liuxukun2000 Nov 14, 2024
c7d539a
update lock file
liuxukun2000 Nov 14, 2024
1250021
pass pre-commit
liuxukun2000 Nov 15, 2024
a76b9e1
Merge remote-tracking branch 'origin/GAIA' into feat/runtime
liuxukun2000 Nov 15, 2024
f9963e8
fix tools bug
liuxukun2000 Nov 15, 2024
e5c0ee3
add GAIA benchmark
liuxukun2000 Nov 15, 2024
3ee08a8
pass pre commit check
liuxukun2000 Nov 15, 2024
4da0f34
add retrieval and return results
liuxukun2000 Nov 22, 2024
5c16ccf
add doc string
liuxukun2000 Nov 22, 2024
e6a6d5f
Merge branch 'master' into feat/gaia
liuxukun2000 Nov 27, 2024
c649566
fix typo
liuxukun2000 Nov 27, 2024
41b31a7
pass precommit
liuxukun2000 Nov 27, 2024
3f8bcf7
Merge branch 'master' into feat/gaia
Wendong-Fan Nov 28, 2024
fe8c26a
Merge branch 'master' into feat/gaia
liuxukun2000 Nov 29, 2024
9dadc9a
Merge branch 'master' into feat/gaia
liuxukun2000 Nov 29, 2024
991ec47
update license
liuxukun2000 Nov 29, 2024
b65bd4d
refine gaia benchmark
liuxukun2000 Nov 30, 2024
35fbfba
format code
liuxukun2000 Nov 30, 2024
b940b85
Merge branch 'master' into feat/gaia
liuxukun2000 Dec 1, 2024
928c02f
Merge branch 'master' into feat/gaia
liuxukun2000 Dec 3, 2024
11cb8c5
Merge branch 'master' into feat/gaia
liuxukun2000 Dec 4, 2024
baff34c
add docss to benchmark, remove inject
liuxukun2000 Dec 4, 2024
6bafeae
minor format fix
Wendong-Fan Dec 4, 2024
8358e56
remove duplicated search implementation
Wendong-Fan Dec 4, 2024
19e3ff2
add question info
Wendong-Fan Dec 4, 2024
fb25266
add question and level
liuxukun2000 Dec 4, 2024
413b12e
add RetrieverProtocol
liuxukun2000 Dec 5, 2024
557e7d9
add unit test
liuxukun2000 Dec 5, 2024
14222ea
Merge branch 'master' into feat/gaia
Wendong-Fan Dec 7, 2024
d76f284
small refactor and enhancement
Wendong-Fan Dec 7, 2024
d8ce302
small enhance
Wendong-Fan Dec 7, 2024
d5785f8
update test
Wendong-Fan Dec 8, 2024
18 changes: 18 additions & 0 deletions camel/benchmarks/__init__.py
@@ -0,0 +1,18 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========

from .base import BaseBenchmark
from .gaia import DefaultGAIARetriever, GAIABenchmark

__all__ = ["BaseBenchmark", "GAIABenchmark", "DefaultGAIARetriever"]
152 changes: 152 additions & 0 deletions camel/benchmarks/base.py
@@ -0,0 +1,152 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========

import logging
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict, List, Literal, Optional

from camel.agents import ChatAgent

logger = logging.getLogger(__name__)


class BaseBenchmark(ABC):
    r"""Base class for benchmarks.

    Attributes:
        name (str): Name of the benchmark.
        data_dir (str): Path to the data directory.
        save_to (str): Path to save the results.
        processes (int): Number of processes to use for parallel
            processing. (default: :obj:`1`)
    """

    def __init__(
        self, name: str, data_dir: str, save_to: str, processes: int = 1
    ):
        r"""Initialize the benchmark.

        Args:
            name (str): Name of the benchmark.
            data_dir (str): Path to the data directory.
            save_to (str): Path to save the results.
            processes (int): Number of processes to use for parallel
                processing. (default: :obj:`1`)

        """
        self.name = name
        self.data_dir = Path(data_dir)
        self.processes = processes
        self.save_to = save_to
        if not self.data_dir.exists():
            logger.info(
                f"Data directory {data_dir} does not exist. Creating it."
            )
            self.data_dir.mkdir(parents=True, exist_ok=True)
        if not self.data_dir.is_dir():
            raise NotADirectoryError(
                f"Data directory {data_dir} is not a directory"
            )
        self._data: Dict[str, List[Dict[str, Any]]] = dict()
        self._results: List[Dict[str, Any]] = []

    @abstractmethod
    def download(self) -> "BaseBenchmark":
        r"""Download the benchmark data.

        Returns:
            BaseBenchmark: The benchmark instance.
        """
        pass

    @abstractmethod
    def load(self, force_download: bool = False) -> "BaseBenchmark":
        r"""Load the benchmark data.

        Args:
            force_download (bool): Whether to force download the data.

        Returns:
            BaseBenchmark: The benchmark instance.
        """
        pass

    @property
    def train(self) -> List[Dict[str, Any]]:
        r"""Get the training data.

        Returns:
            List[Dict[str, Any]]: The training data.
        """
        if not self._data:
            logger.info("Data not loaded. Loading data.")
            self.load()
        return self._data["train"]

    @property
    def valid(self) -> List[Dict[str, Any]]:
        r"""Get the validation data.

        Returns:
            List[Dict[str, Any]]: The validation data.
        """
        if not self._data:
            logger.info("Data not loaded. Loading data.")
            self.load()
        return self._data["valid"]

    @property
    def test(self) -> List[Dict[str, Any]]:
        r"""Get the test data.

        Returns:
            List[Dict[str, Any]]: The test data.
        """
        if not self._data:
            logger.info("Data not loaded. Loading data.")
            self.load()
        return self._data["test"]

    @abstractmethod
    def run(
        self,
        agent: ChatAgent,
        on: Literal["train", "valid", "test"],
        randomize: bool = False,
        subset: Optional[int] = None,
        *args,
        **kwargs,
    ) -> "BaseBenchmark":
        r"""Run the benchmark.

        Args:
            agent (ChatAgent): The chat agent.
            on (Literal["train", "valid", "test"]): The data split to run
                the benchmark on.
            randomize (bool): Whether to randomize the data.
                (default: :obj:`False`)
            subset (Optional[int]): The subset of the data to run the
                benchmark on. (default: :obj:`None`)

        Returns:
            BaseBenchmark: The benchmark instance.
        """
        pass

    @property
    def results(self) -> List[Dict[str, Any]]:
        r"""Get the results.

        Returns:
            List[Dict[str, Any]]: The results.
        """
        return self._results
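
Reviewer note: the contract in base.py (chainable `load`/`run` that return the instance, split data loaded lazily on first access, results collected in a list) may be easier to follow with a toy example. The sketch below is not part of this PR and does not import camel. `ToyBenchmark`, `_split`, `fake_agent`, and `canned` are hypothetical names, a plain callable stands in for `ChatAgent`, and the way `randomize` and `subset` are applied is an assumption about how a subclass might interpret them.

```python
import random
from typing import Any, Callable, Dict, List, Literal, Optional

Split = Literal["train", "valid", "test"]


class ToyBenchmark:
    """Hypothetical stand-in that mirrors the BaseBenchmark contract."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Dict[str, Any]]] = {}
        self._results: List[Dict[str, Any]] = []

    def load(self, force_download: bool = False) -> "ToyBenchmark":
        # A real subclass would read files under data_dir; the data is
        # inlined here so the sketch runs without any download step.
        self._data = {
            "train": [],
            "valid": [
                {"question": "1+1", "answer": "2"},
                {"question": "2+2", "answer": "4"},
            ],
            "test": [],
        }
        return self  # returning self allows call chaining

    def _split(self, on: Split) -> List[Dict[str, Any]]:
        # Same lazy-loading idea as the train/valid/test properties.
        if not self._data:
            self.load()
        return self._data[on]

    def run(
        self,
        agent: Callable[[str], str],  # assumption: stands in for ChatAgent
        on: Split,
        randomize: bool = False,
        subset: Optional[int] = None,
    ) -> "ToyBenchmark":
        tasks = list(self._split(on))
        if randomize:
            random.shuffle(tasks)
        if subset is not None:
            tasks = tasks[:subset]  # assumption: subset = first N tasks
        self._results = [
            {
                "question": task["question"],
                "correct": agent(task["question"]) == task["answer"],
            }
            for task in tasks
        ]
        return self

    @property
    def results(self) -> List[Dict[str, Any]]:
        return self._results


# Canned replies: one right answer and one wrong answer.
canned = {"1+1": "2", "2+2": "5"}


def fake_agent(question: str) -> str:
    return canned.get(question, "")


bench = ToyBenchmark().run(fake_agent, on="valid")
print(bench.results)
```

Because `run` returns the instance, construction, execution, and reading `results` chain in one expression, which appears to be the intent behind the `-> "BaseBenchmark"` return annotations above.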