Recurrent Replay Relevance Distributed DQN (R3D2) is a generalist multi-agent reinforcement learning (MARL) agent designed to play Hanabi across all game settings while adapting to unfamiliar collaborators. Unlike traditional MARL agents that struggle with transferability and cooperation beyond their training setting, R3D2 utilizes language-based reformulation and a distributed learning approach to handle dynamic observation and action spaces. This allows it to generalize across different game configurations and effectively collaborate with diverse algorithmic agents.
- Generalized MARL agent: plays Hanabi across different player settings (2 to 5 players) without changing the architecture or retraining from scratch.
- Adaptive cooperation: capable of collaborating with unfamiliar partners, overcoming the limitations of traditional MARL systems.
- Language-based task reformulation: utilizes text representations to enhance transfer learning and generalization.
- Distributed learning framework: employs a scalable MARL algorithm to handle dynamic observations and actions effectively.
The code has been tested with PyTorch 2.0.1.

Clone the repo with `--recursive` to include submodules:

```bash
git clone --recursive [email protected]:chandar-lab/R3D2-A-Generalist-Hanabi-Agent.git
```
- Environment Setup
- Dependencies
- GPU Configuration
- Building Tokenizers
- Additional Information
- Training Scripts (R2D2, R3D2)
- Evaluation Scripts
This repository contains the setup instructions for the Hanabi learning environment and related dependencies.
- Create a Conda environment:

  ```bash
  conda create --name r3d3_hanabi python=3.9
  ```

- Activate the environment:

  ```bash
  conda activate r3d3_hanabi
  ```

- Install PyTorch (CUDA 11.8):

  ```bash
  pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
  ```

- Install the Transformers library:

  ```bash
  pip install transformers==4.31.0
  ```
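An optional sanity check (a minimal sketch, not part of the original setup steps) confirms that PyTorch sees the GPU and that the pinned Transformers version is installed:

```bash
# Verify the PyTorch build and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Verify the Transformers version
python -c "import transformers; print(transformers.__version__)"
```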
- Load the Python module:

  ```bash
  module load python/3.9
  ```

- Install additional Python packages:

  ```bash
  pip install cmake tabulate cffi psutil
  pip install tqdm scipy matplotlib wandb
  ```

- Load the CUDA module:

  ```bash
  module load cuda/11.8
  ```

- Install Rust for building the tokenizers:

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```
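After the rustup installer finishes, the Cargo environment typically needs to be loaded into the current shell before building. This is standard rustup behaviour, shown here as a sketch:

```bash
# Make cargo/rustc available in the current shell and confirm the install
source "$HOME/.cargo/env"
rustc --version
cargo --version
```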
- Navigate to the `hanabi_lib` directory:

  ```bash
  cd hanabi-learning-environment/hanabi_lib/
  ```

- Clone the `tokenizers-cpp` repository:

  ```bash
  git clone --recursive [email protected]:mlc-ai/tokenizers-cpp.git
  ```

- Build:

  ```bash
  make
  ```
- Ensure that all required modules are properly loaded in your environment.
- The setup assumes access to a GPU with CUDA 11.8 support.
- Use `wandb` for experiment tracking and logging during model training.
- If issues arise during setup, check for compatibility between package versions and your system configuration.
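A minimal sketch of enabling `wandb` logging before launching training, assuming you have a Weights & Biases account and API key:

```bash
# Authenticate once per machine; the API key comes from your W&B account page
wandb login
```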
Feel free to contribute or report issues!
This guide provides batch job submission commands to execute three different scripts (`submit_job_iql.sh`, `submit_jobs_other_player.sh`, and `launch_r3d2.sh`) for multiple models and players.
```bash
cd pyhanabi
```
The following command submits jobs for each model (`a`, `b`, `c`, `d`, `e`) and each player count (2, 3, 4, 5):
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/submit_job_iql.sh $m $p;
  done;
done;
```
The following command submits jobs for `submit_jobs_other_player.sh` with the same models and player counts:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/submit_jobs_other_player.sh $m $p;
  done;
done;
```
The following command submits jobs for `launch_r3d2.sh` with the same models and player counts:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/launch_r3d2.sh $m $p;
  done;
done;
```
The following command submits jobs for `launch_r3d2.sh` with player setting 6, representing multi-task R3D2, using the same models:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 6; do
    sbatch scripts/launch_r3d2.sh $m $p;
  done;
done;
```
- Seed (`m`): the scripts iterate over the seed labels `a`, `b`, `c`, `d`, and `e`.
- Player setting (`p`): jobs are submitted for player counts of 2, 3, 4, and 5.
- Scripts: the three scripts handle different types of job submissions (IQL, other players, and R3D2).
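Once submitted, the jobs can be monitored with standard SLURM commands (a minimal sketch, not specific to this repository):

```bash
# List your queued and running jobs
squeue -u $USER

# Show accounting information for recent jobs
sacct -u $USER --format=JobID,JobName,State,Elapsed
```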
The following scripts run evaluation across the different game settings and cross-play:

```bash
scripts/launch_2p_eval_diff_setting_all.sh
scripts/launch_3p_eval_diff_setting_all.sh
scripts/launch_4p_eval_diff_setting_all.sh
scripts/launch_5p_eval_diff_setting_all.sh
scripts/launch_cross_play.sh
```
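A minimal sketch of submitting the evaluation scripts, assuming they are SLURM batch scripts launched the same way as the training scripts above (any arguments they expect are not documented here):

```bash
# Submit per-player-count evaluation jobs and the cross-play evaluation
for p in 2 3 4 5; do
  sbatch scripts/launch_${p}p_eval_diff_setting_all.sh
done
sbatch scripts/launch_cross_play.sh
```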
`HanabiState::ToText()` converts the game's current state into a human-readable format, providing details on tokens, fireworks, and player hands. Reference
This codebase is based on *Language Instructed Reinforcement Learning for Human-AI Coordination* (ICML 2023).