Recurrent Replay Relevance Distributed DQN (R3D2) is a generalist multi-agent reinforcement learning (MARL) agent designed to play Hanabi across all game settings while adapting to unfamiliar collaborators. Unlike traditional MARL agents that struggle with transferability and cooperation beyond their training setting, R3D2 utilizes language-based reformulation and a distributed learning approach to handle dynamic observation and action spaces. This allows it to generalize across different game configurations and effectively collaborate with diverse algorithmic agents.
- Generalized MARL agent: plays Hanabi across different player settings (2 to 5 players) without changing the architecture or retraining from scratch.
- Adaptive cooperation: capable of collaborating with unfamiliar partners, overcoming the limitations of traditional MARL systems.
- Language-based task reformulation: utilizes text representations to enhance transfer learning and generalization.
- Distributed learning framework: employs a scalable MARL algorithm to handle dynamic observations and actions effectively.
The code has been tested with PyTorch 2.0.1.

Clone the repo with `--recursive` to include submodules:

```bash
git clone --recursive [email protected]:chandar-lab/R3D2-A-Generalist-Hanabi-Agent.git
```
- Environment Setup
- Dependencies
- GPU Configuration
- Building Tokenizers
- Additional Information
- Training Scripts (R2D2, R3D2)
- Evaluation Scripts
This repository contains the setup instructions for the Hanabi learning environment and related dependencies.
- Create a Conda environment:

  ```bash
  conda create --name r3d3_hanabi python=3.9
  ```

- Activate the environment:

  ```bash
  conda activate r3d3_hanabi
  ```

- Install PyTorch (CUDA 11.8):

  ```bash
  pip install torch==2.0.1 --index-url https://download.pytorch.org/whl/cu118
  ```

- Install the Transformers library:

  ```bash
  pip install transformers==4.31.0
  ```
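An optional sanity check (a minimal sketch, not part of the original setup steps) confirms that PyTorch sees the GPU and that the pinned Transformers version is installed:

```bash
# Verify the PyTorch build and CUDA availability
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

# Verify the Transformers version
python -c "import transformers; print(transformers.__version__)"
```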
- Load the Python module:

  ```bash
  module load python/3.9
  ```

- Install additional Python packages:

  ```bash
  pip install cmake tabulate cffi psutil
  pip install tqdm scipy matplotlib wandb
  ```

- Load the CUDA module:

  ```bash
  module load cuda/11.8
  ```

- Install Rust for building the tokenizers:

  ```bash
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  ```
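After the rustup installer finishes, the Cargo environment typically needs to be loaded into the current shell before building. This is standard rustup behaviour, shown here as a sketch:

```bash
# Make cargo/rustc available in the current shell and confirm the install
source "$HOME/.cargo/env"
rustc --version
cargo --version
```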
- Navigate to the `hanabi_lib` directory:

  ```bash
  cd hanabi-learning-environment/hanabi_lib/
  ```

- Clone the `tokenizers-cpp` repository:

  ```bash
  git clone --recursive [email protected]:mlc-ai/tokenizers-cpp.git
  ```

- Build:

  ```bash
  make
  ```
- Ensure that all required modules are properly loaded in your environment.
- The setup assumes access to a GPU with CUDA 11.8 support.
- Use `wandb` for experiment tracking and logging during model training.
- If issues arise during setup, check for compatibility between package versions and your system configuration.
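A minimal sketch of enabling `wandb` logging before launching training, assuming you have a Weights & Biases account and API key:

```bash
# Authenticate once per machine; the API key comes from your W&B account page
wandb login
```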
Feel free to contribute or report issues!
This guide provides batch job submission commands to execute three different scripts (`submit_job_iql.sh`, `submit_jobs_other_player.sh`, and `launch_r3d2.sh`) for multiple models and players.
```bash
cd pyhanabi
```
The following command submits jobs for each model (`a`, `b`, `c`, `d`, `e`) and each player count (2, 3, 4, 5):
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/submit_job_iql.sh $m $p;
  done;
done;
```
The following command submits jobs for `submit_jobs_other_player.sh` with the same models and player counts:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/submit_jobs_other_player.sh $m $p;
  done;
done;
```
The following command submits jobs for `launch_r3d2.sh` with the same models and player counts:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 2 3 4 5; do
    sbatch scripts/launch_r3d2.sh $m $p;
  done;
done;
```
The following command submits jobs for `launch_r3d2.sh` with player setting 6, representing multi-task R3D2, using the same models:
```bash
for m in "a" "b" "c" "d" "e"; do
  for p in 6; do
    sbatch scripts/launch_r3d2.sh $m $p;
  done;
done;
```
- Seed (`m`): the scripts iterate over the seed labels `a`, `b`, `c`, `d`, and `e`.
- Player setting (`p`): jobs are submitted for player counts of 2, 3, 4, and 5.
- Scripts: the three scripts handle different types of job submissions (IQL, other players, and R3D2).
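Once submitted, the jobs can be monitored with standard SLURM commands (a minimal sketch, not specific to this repository):

```bash
# List your queued and running jobs
squeue -u $USER

# Show accounting information for recent jobs
sacct -u $USER --format=JobID,JobName,State,Elapsed
```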
The following scripts run evaluation across the different game settings and cross-play:

```bash
scripts/launch_2p_eval_diff_setting_all.sh
scripts/launch_3p_eval_diff_setting_all.sh
scripts/launch_4p_eval_diff_setting_all.sh
scripts/launch_5p_eval_diff_setting_all.sh
scripts/launch_cross_play.sh
```
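A minimal sketch of submitting the evaluation scripts, assuming they are SLURM batch scripts launched the same way as the training scripts above (any arguments they expect are not documented here):

```bash
# Submit per-player-count evaluation jobs and the cross-play evaluation
for p in 2 3 4 5; do
  sbatch scripts/launch_${p}p_eval_diff_setting_all.sh
done
sbatch scripts/launch_cross_play.sh
```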
`HanabiState::ToText()` converts the game's current state into a human-readable format, providing details on tokens, fireworks, and player hands. Reference
This codebase is based on *Language Instructed Reinforcement Learning for Human-AI Coordination* (ICML 2023).