YRC-Bench is a benchmark for learning coordination strategies for the problem of yielding and requesting control between a novice (weak) policy and an expert (strong) policy. It supports training and evaluating coordination policies across a variety of environments, algorithms, and feature types, and it is designed to be modular and extensible, allowing easy integration of new environment suites, algorithms, and feature types.
For further details, check the paper: https://arxiv.org/abs/2502.09583
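At a high level, at each environment step a coordination policy decides whether the weak agent acts (for free) or the strong agent is queried and acts (incurring the query cost). The minimal sketch below only illustrates this loop; the dummy policies, the random coordinator, and the per-step cost accounting are placeholders, not YRC-Bench's API.

```python
import random

# Illustrative only: weak_act, strong_act, and coordinate are stand-ins,
# not functions from YRC-Bench.
def weak_act(obs):
    return random.choice([0, 1, 2])        # cheap but unreliable novice

def strong_act(obs):
    return 0                               # costly but reliable expert

def coordinate(obs):
    return random.random() < 0.5           # decide whether to query the expert

def rollout(num_steps=10, query_cost=0.2):
    total = 0.0
    obs = 0                                # dummy observation
    for _ in range(num_steps):
        if coordinate(obs):
            action = strong_act(obs)
            total -= query_cost            # querying the expert is not free
        else:
            action = weak_act(obs)
        total += 1.0 if action == 0 else 0.0   # dummy task reward
    return total

print(rollout())
```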
The benchmark currently supports the following environment suites:
- MiniGrid:
  - Forked from Farama-Foundation/Minigrid with minor changes: https://github.com/modanesh/Minigrid/tree/1eae02b3561e0cce827d0509d904d053f437b08c
- Procgen:
  - Contains the "Goal Misgeneralization" environments from the original Procgen: https://github.com/JacobPfau/procgenAISC/tree/7821f2c00be9a4ff753c6d54b20aed26028ca812
- Cliport:
  - Forked from cliport/cliport with minor changes: https://github.com/modanesh/cliport/tree/ff9fe042c0c2bf0d98c6354d3b5ce2a0a8de97c2
  - Useful link for cliport's installation: https://medium.com/@limyoonaxi/common-bugs-you-may-encounter-while-installing-cliport-ef1790e1cc0a
The following coordination algorithms are supported:
- Random: randomly selecting the agent to query
- Always: always querying the same agent
- Threshold-based: querying the agent based on a threshold over one of maximum logit, maximum probability, margin, negative entropy, or negative energy (a toy sketch of these scores follows this list)
- RL-based: training an RL agent to learn the coordination strategy (PPO)
- OOD detection-based: querying the agent based on the out-of-distribution detection score (DeepSVDD)
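As a rough, non-authoritative illustration of the threshold-based scores above (assuming the strong agent is queried when the weak agent's confidence is low), the sketch below computes the five metrics from a vector of logits with NumPy; it is not the benchmark's implementation.

```python
import numpy as np

def confidence_scores(logits):
    """Confidence metrics commonly used for thresholding (illustrative sketch)."""
    logits = np.asarray(logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax
    top2 = np.sort(probs)[-2:]                     # two largest probabilities
    return {
        "max_logit": float(logits.max()),
        "max_prob": float(probs.max()),
        "margin": float(top2[1] - top2[0]),        # top-1 minus top-2 probability
        "neg_entropy": float(np.sum(probs * np.log(probs + 1e-12))),
        "neg_energy": float(np.log(np.sum(np.exp(logits)))),  # logsumexp of logits
    }

scores = confidence_scores([2.0, 0.5, -1.0])
threshold = 0.6                                    # assumed to be tuned on validation data
query_strong = scores["max_prob"] < threshold      # low confidence -> ask the expert
print(scores, query_strong)
```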
For the RL-based and OOD detection-based algorithms, the following feature types are supported as input:
- Raw image observations
- Weak agent's hidden features
- Weak agent's action distributions
- Raw image observations + Weak agent's hidden features
- Raw image observations + Weak agent's action distributions
- Weak agent's hidden features + Weak agent's action distributions
- Raw image observations + Weak agent's hidden features + Weak agent's action distributions
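As a toy illustration of how these inputs could be combined into a single coordination-policy feature (shapes and names here are made up, and the real feature extraction lives inside the benchmark), concatenating the flattened pieces looks like this:

```python
import numpy as np

# Made-up shapes; the real dimensions depend on the environment and the weak agent.
obs = np.random.rand(3, 64, 64)            # raw image observation
hidden = np.random.rand(256)               # weak agent's hidden features
action_dist = np.array([0.7, 0.2, 0.1])    # weak agent's action distribution

def build_feature(use_obs, use_hidden, use_dist):
    """Concatenate the selected inputs into one flat feature vector."""
    parts = []
    if use_obs:
        parts.append(obs.ravel())
    if use_hidden:
        parts.append(hidden)
    if use_dist:
        parts.append(action_dist)
    return np.concatenate(parts)

# e.g. the "raw image observations + weak agent's hidden features" combination
feature = build_feature(use_obs=True, use_hidden=True, use_dist=False)
print(feature.shape)   # (3*64*64 + 256,)
```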
To install the benchmark:
- Clone the repository:
git clone https://github.com/modanesh/YRC-Bench.git
cd YRC-Bench
git submodule update --init --recursive
- Create a virtual environment and source it:
virtualenv -p python3.8 venv
source venv/bin/activate
- Install the requirements:
pip install -r requirements.txt
Note: the requirements for the MiniGrid environments are different from those for the other environments. To install the requirements for MiniGrid, run the following command:
pip install -r requirements_minigrid.txt
- Install the environments:
pip install -e lib/LIBRARY_NAME
where LIBRARY_NAME is the name of the library you want to install from the lib directory.
First, checkpoints for the acting policies should be provided. For the acting policies (sim weak, weak, and strong), the checkpoints should be located in the YRC/checkpoints directory. The checkpoints for each environment suite are obtained by:
- MiniGrid: Training RL policies using https://github.com/lcswillems/rl-starter-files
- Procgen: Training RL policies using https://github.com/jbkjr/train-procgen-pytorch
- Cliport: The checkpoints provided in the cliport repository: https://github.com/cliport/cliport
A zip file containing the checkpoints for the acting policies can be downloaded from the following link: Google Drive Link
Once the acting policies are set, to train an algorithm on an environment with a specific feature type, run the following command:
python train.py -c configs/CONFIG.yaml -n RUN_NAME -en ENV_NAME -sim PATH/TO/SIM_WEAK_AGENT.pt -weak PATH/TO/WEAK_AGENT.pt -strong PATH/TO/STRONG_AGENT.pt -query_cost QUERY_COST -cp_feature FEATURE_TYPE
where CONFIG.yaml is the configuration file, RUN_NAME is the name of the run, ENV_NAME is the name of the environment, PATH/TO/SIM_WEAK_AGENT.pt is the path to the simulated weak agent, PATH/TO/WEAK_AGENT.pt is the path to the weak agent, PATH/TO/STRONG_AGENT.pt is the path to the strong agent, QUERY_COST is the cost of querying the strong agent, and FEATURE_TYPE is the input feature type for the OOD- and RL-based methods. For the other methods, there is no need to specify cp_feature. For a full list of arguments, run python train.py -h or check the flags.py file.
During training, three checkpoints are saved: best_val_sim.ckpt, best_val_true.ckpt, and last.ckpt. best_val_sim.ckpt is the checkpoint with the best validation performance on the simulated case, best_val_true.ckpt is the one with the best validation performance on the true case, and last.ckpt is the final checkpoint of the training.
Following are a few examples of training commands for different settings:
- Training a random algorithm on MiniGrid's DoorKey environment with query cost 0.2:
python train.py -c configs/minigrid_random.yaml -n DoorKey_always_random_qc02 -en MiniGrid-DoorKey -sim YRC/checkpoints/minigrid/DoorKey/sim_weak/status.pt -weak YRC/checkpoints/minigrid/DoorKey/weak/status.pt -strong YRC/checkpoints/minigrid/DoorKey/strong/status.pt -query_cost 0.2 -en_tr_suffix=-5x5-v0 -en_te_suffix=-8x8-v0
- Training a threshold-based algorithm on Procgen's coinrun environment with query cost 0.6 and margin as the coordination policy metric:
python train.py -c configs/procgen_threshold.yaml -n coinrun_threshold_margin_qc06 -en coinrun -sim YRC/checkpoints/procgen/coinrun/sim_weak/model_40009728.pth -weak YRC/checkpoints/procgen/coinrun/weak/model_80019456.pth -strong YRC/checkpoints/procgen/coinrun/strong/model_200015872.pth -cp_metric margin -query_cost 0.6
- Training an OOD detection-based algorithm on cliport's stack-block-pyramid-seq environment with query cost 0.0 and raw image observations as the input feature type:
python train.py -c configs/cliport_ood.yaml -n stack-block-pyramid-seq_ood_obs_qc00 -en stack-block-pyramid-seq -sim YRC/checkpoints/cliport/multi-language-conditioned-cliport-n10-train/checkpoints/steps=400000-val_loss=0.00016243.ckpt -weak YRC/checkpoints/cliport/multi-language-conditioned-cliport-n100-train/checkpoints/steps=300000-val_loss=0.00017400.ckpt -cp_feature obs -cp_method DeepSVDD -query_cost 0
Note: For the cliport's environments, the strong agent is created based on the task (oracle), so there is no need to give it as input.
- Training an RL-based algorithm on Procgen's maze environment with query cost 1.0 and raw image observations + hidden features as the input feature type:
python train.py -c configs/procgen_skyline.yaml -n maze_rl_hidden_obs_qc10 -en maze -sim YRC/checkpoints/procgen/maze/sim_weak/model_40009728.pth -weak YRC/checkpoints/procgen/maze/weak/model_80019456.pth -strong YRC/checkpoints/procgen/maze/strong/model_200015872.pth -wandb -cp_feature hidden_obs -query_cost 1.0
To evaluate the trained model, run the following command:
python eval.py -c configs/CONFIG.yaml -n RUN_NAME -en ENV_NAME -sim PATH/TO/SIM_WEAK_AGENT.pt -weak PATH/TO/WEAK_AGENT.pt -strong PATH/TO/STRONG_AGENT.pt -query_cost QUERY_COST -cp_feature FEATURE_TYPE -f_n CHECKPOINT_NAME -seed SEED
where CONFIG.yaml is the configuration file, RUN_NAME is the name of the run, ENV_NAME is the name of the environment, PATH/TO/SIM_WEAK_AGENT.pt is the path to the simulated weak agent, PATH/TO/WEAK_AGENT.pt is the path to the weak agent, PATH/TO/STRONG_AGENT.pt is the path to the strong agent, QUERY_COST is the cost of querying the strong agent, FEATURE_TYPE is the input feature type for the OOD- and RL-based methods, CHECKPOINT_NAME is the name of the checkpoint to evaluate, and SEED is the random seed. For a full list of arguments, run python eval.py -h or check the flags.py file.
Following are a few examples of evaluation commands for different settings:
- Evaluating a random algorithm on MiniGrid's DoorKey environment with query cost 0.2 and the best_val_sim.ckpt checkpoint:
python eval.py -c configs/minigrid_random.yaml -n DoorKey_always_random_qc02 -en MiniGrid-DoorKey -sim YRC/checkpoints/minigrid/DoorKey/sim_weak/status.pt -weak YRC/checkpoints/minigrid/DoorKey/weak/status.pt -strong YRC/checkpoints/minigrid/DoorKey/strong/status.pt -query_cost 0.2 -en_tr_suffix=-5x5-v0 -en_te_suffix=-8x8-v0 -f_n best_val_sim.ckpt -seed 10
- Evaluating a threshold-based algorithm on Procgen's coinrun environment with query cost 0.6 and the best_val_true.ckpt checkpoint:
python eval.py -c configs/procgen_threshold.yaml -n coinrun_threshold_margin_qc06 -en coinrun -sim YRC/checkpoints/procgen/coinrun/sim_weak/model_40009728.pth -weak YRC/checkpoints/procgen/coinrun/weak/model_80019456.pth -strong YRC/checkpoints/procgen/coinrun/strong/model_200015872.pth -cp_metric margin -f_n best_val_true.ckpt -query_cost 0.6 -seed 12
All the scripts required to analyze the results are located in the analyzing directory; they process the experiment results and generate the plots and tables presented in the paper. Once the experiments are done, edit the constants.py file according to the environments and algorithms used in the experiments. Then, run the following command to extract the raw results from the experiments directory:
python parse.py
This will create a file named raw_results.json in the analyzing directory. Then, run the following command to aggregate the results:
python aggregate.py
This will create a file named aggregated_results.json in the analyzing directory. Finally, to reproduce the plots and tables, run any of the other scripts in the analyzing directory. Each will store its generated plots and tables in the final_plots sub-directory.
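If you want to inspect the aggregated results programmatically, something along these lines should work (the exact schema of the JSON file depends on the environments and algorithms you ran):

```python
import json

# Load the aggregated results produced by aggregate.py and peek at their structure.
with open("analyzing/aggregated_results.json") as f:
    results = json.load(f)

print(type(results).__name__)
if isinstance(results, dict):
    print(list(results.keys()))
```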
To extend the benchmark with a new environment suite, follow the steps discussed in the "Add a new environment suite to the benchmark" wiki page. Also, to extend the benchmark with a new algorithm or feature type, follow the steps discussed in the "Add a new algorithm to the benchmark" wiki page.
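As a purely hypothetical sketch of what a new coordination algorithm ultimately has to provide (the actual class names, required methods, and registration steps are documented in the wiki page), a new method boils down to mapping the coordination-policy input to a binary decision about whether to query the strong agent:

```python
import numpy as np

class NeverQueryCoordinator:
    """Hypothetical example only; see the wiki page for the real interface."""

    def act(self, features):
        # 0 = let the weak agent act, 1 = query the strong agent
        return 0

coordinator = NeverQueryCoordinator()
print(coordinator.act(np.zeros(8)))
```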
If you use this benchmark in your research, please cite the following:
@misc{danesh2025learningcoordinateexperts,
  title={Learning to Coordinate with Experts},
  author={Mohamad H. Danesh and Tu Trinh and Benjamin Plaut and Nguyen X. Khanh},
  year={2025},
  eprint={2502.09583},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2502.09583},
}
@misc{yrc-bench,
  author = {Danesh, Mohamad H. and Nguyen, Khanh Xuan},
  title = {YRC-Bench: A Benchmark for Learning to Coordinate with Experts},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/modanesh/yield_request_control}},
}