CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
Yanlin Feng, Simone Papicchio, Sajjadur Rahman
- [Feb 20, 2025] We updated the graph deployment configuration to reduce RAM usage.
- [Feb 19, 2025] We have released the evaluation scripts and the EX and PSJS implementations!
- [Feb 14, 2025] We have released the text2cypher baseline code! See the instructions below on how to run `gpt-4o-mini` on CypherBench.
- [Feb 13, 2025] The 11 property graphs are now available on 🤗HuggingFace! We also make it super easy to deploy them (see the instructions below).
- [Dec 27, 2024] We have deployed a demo NBA graph (password: `cypherbench`) at Neo4j AuraDB! Check it out! You can run Cypher queries like `MATCH (n:Player {name: 'LeBron James'})-[r]-(m) RETURN *`.
- [Dec 27, 2024] The training and test sets are now available on 🤗HuggingFace!
conda create -n cypherbench python=3.11
conda activate cypherbench
git clone https://github.com/megagonlabs/cypherbench.git
cd cypherbench
pip install -e .
To download the dataset (including both the graphs and text2cypher tasks), simply clone the HuggingFace dataset repository:
# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install
# Clone the dataset repo from HuggingFace and save it as the `benchmark` directory
git clone https://huggingface.co/datasets/megagonlabs/cypherbench benchmark
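After cloning, you can inspect the dataset from Python. Below is a minimal sketch that lists the graph schema files (used later with `--load_schema_from json`) and prints the top-level structure of each one; the glob pattern and JSON layout are assumptions, so adapt it to whatever you find in the `benchmark` directory.

```python
import json
from pathlib import Path

# Path to the cloned HuggingFace dataset repo (see the clone command above).
benchmark_dir = Path("benchmark")

# Graph schemas consumed by the text2cypher baseline when using
# `--load_schema_from json`. The *.json pattern is an assumption.
schema_dir = benchmark_dir / "graphs" / "schemas"
for schema_file in sorted(schema_dir.glob("*.json")):
    with open(schema_file) as f:
        schema = json.load(f)
    # Only print the top-level shape, since the exact schema layout
    # is not described in this README.
    summary = list(schema) if isinstance(schema, dict) else f"{type(schema).__name__}[{len(schema)}]"
    print(schema_file.name, "->", summary)
```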
Now, you can deploy the 7 test graphs with a single Docker Compose command using our custom Neo4j Docker image and our Docker Compose configuration:
cd docker/
bash start_neo4j_test.sh # This script first checks if required files exist, then runs the docker-compose command
cd ..
# Check whether the graphs are fully loaded (loading typically takes at least 10 minutes).
python scripts/print_db_status.py
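If you want to sanity-check a graph by hand in addition to `print_db_status.py`, a quick node/relationship count with the official `neo4j` Python driver works. The URI, credentials, and default database below are placeholders (they are not documented here), so adjust them to whatever your Docker Compose deployment exposes.

```python
from neo4j import GraphDatabase  # pip install neo4j

# Placeholder connection details: adjust the port and credentials to match
# the configuration under docker/ for your deployment.
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "password")

with GraphDatabase.driver(URI, auth=AUTH) as driver:
    with driver.session() as session:
        nodes = session.run("MATCH (n) RETURN count(n) AS n").single()["n"]
        rels = session.run("MATCH ()-[r]->() RETURN count(r) AS n").single()["n"]
        print(f"nodes: {nodes:,}  relationships: {rels:,}")
```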
To stop the Neo4j databases, run `bash stop_neo4j_test.sh`.
Running `gpt-4o-mini` on the CypherBench test set costs around $0.30. First, make sure you have set the `OPENAI_API_KEY` environment variable to use the OpenAI API.
python -m cypherbench.baseline.zero_shot_nl2cypher --llm gpt-4o-mini --result_dir output/gpt-4o-mini/
There are two ways to fetch the graph schemas when running text2cypher:
- (default) `--load_schema_from json` loads the schema from the local JSON files stored in the `benchmark/graphs/schemas` directory. When using this option, the Neo4j databases are not used during text2cypher.
- `--load_schema_from neo4j` fetches the schema from the Neo4j database by executing special Cypher queries*. This option requires the Neo4j databases to be fully loaded.

*We don't use `apoc.meta.data()` by default; see Appendix A.4 in the paper for details.
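For a rough idea of what the zero-shot baseline does, the sketch below loads one schema JSON and asks `gpt-4o-mini` (via the OpenAI Python client) to translate a question into Cypher. The schema file name, example question, and prompt wording are made up for illustration; the actual prompt and parsing live in `cypherbench.baseline.zero_shot_nl2cypher`.

```python
import json
from openai import OpenAI  # uses the OPENAI_API_KEY environment variable

# Hypothetical schema file name; pick any file from benchmark/graphs/schemas.
with open("benchmark/graphs/schemas/nba.json") as f:
    schema = json.load(f)

question = "Which players are teammates of LeBron James?"  # illustrative question

# A simplified prompt, not the exact one used by the baseline.
prompt = (
    "You are given the schema of a property graph:\n"
    f"{json.dumps(schema, indent=2)}\n\n"
    f"Write a Cypher query that answers the question: {question}\n"
    "Return only the Cypher query."
)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```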
python -m cypherbench.evaluate --result_dir output/gpt-4o-mini/ --num_threads 8 # Adjust the number of threads as needed
Metric implementation:
- Execution Accuracy (EX): `execution_accuracy.py`
- Provenance Subgraph Jaccard Similarity (PSJS): `provenance_subgraph_jaccard_similarity.py`
- Executable Percentage: `executable.py`
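As a rough intuition for the two main metrics (the authoritative definitions are in the paper and the files above): EX checks whether the predicted and gold queries return the same records when executed, and PSJS measures the Jaccard overlap between the subgraphs the two queries touch. The sketch below is a simplified illustration, not the repo's implementation.

```python
from collections import Counter

def execution_accuracy(pred_records, gold_records):
    """Simplified EX: 1.0 if both queries return the same multiset of records
    (order-insensitive), else 0.0. The repo's execution_accuracy.py is the
    authoritative implementation."""
    def canon(records):
        return Counter(tuple(sorted(r.items())) for r in records)
    return float(canon(pred_records) == canon(gold_records))

def psjs(pred_elements, gold_elements):
    """Simplified PSJS: Jaccard similarity between the sets of graph elements
    (node/relationship ids) in the provenance subgraphs of the two queries."""
    pred, gold = set(pred_elements), set(gold_elements)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)

# Toy usage with made-up records and element ids:
print(execution_accuracy([{"name": "LeBron James"}], [{"name": "LeBron James"}]))  # 1.0
print(psjs({"n1", "n2", "r1"}, {"n1", "r1"}))  # ~0.67
```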
Reference performance for `gpt-4o-mini`:
{
"overall": {
"execution_accuracy": 0.3143,
"psjs": 0.4591,
"executable": 0.8739
},
"by_graph": {
"flight_accident": 0.4603,
"fictional_character": 0.3273,
...
- text2cypher tasks
- 11 property graphs and graph deployment docker
- text2cypher baseline code
- EX/PSJS implementation and evaluation scripts
- Wikidata RDF-to-property-graph engine
- Text2cypher task generation pipeline
Please feel free to open an issue if you have any questions or suggestions!
@article{feng2024cypherbench,
title={CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era},
author={Feng, Yanlin and Papicchio, Simone and Rahman, Sajjadur},
journal={arXiv preprint arXiv:2412.18702},
year={2024}
}