Improved Inverse Q-Learning (IQ-Learn)

This is an improved version of IQ-Learn, originally proposed at NeurIPS 2021.
Our modifications include:

  • Added KL divergence and reward-based baselines
  • Extended support for Gym Atari and MuJoCo environments
  • Optimized training pipeline for better stability

Original Paper:

📄 IQ-Learn: Inverse Soft-Q Learning for Imitation
➡️ arXiv Link

Original Codebase:

🖥️ GitHub Repository


Introduction

IQ-Learn is a state-of-the-art imitation learning framework that directly learns soft Q-functions from expert data. Unlike traditional adversarial approaches (e.g., GAIL, AIRL), IQ-Learn provides a simple, stable, and data-efficient alternative for both offline and online imitation learning.
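
For intuition, the core idea can be sketched in a few lines of PyTorch-style Python. This is a minimal illustration of one variant of the inverse soft-Q objective for discrete actions; the names q_net, expert_batch, and policy_batch are hypothetical, and the actual implementation lives in the iq_learn directory.

import torch

def soft_value(q_net, obs, alpha=0.01):
    # Soft value for discrete actions: V(s) = alpha * logsumexp(Q(s, .) / alpha).
    return alpha * torch.logsumexp(q_net(obs) / alpha, dim=-1)

def iq_loss(q_net, expert_batch, policy_batch, gamma=0.99, alpha=0.01):
    # Assumed batch layout: (obs, action, next_obs, done), with done a float mask.
    obs_e, act_e, next_obs_e, done_e = expert_batch

    # Implicit reward of the current Q on expert transitions:
    # r(s, a) = Q(s, a) - gamma * V(s').
    q_e = q_net(obs_e).gather(1, act_e.long().unsqueeze(1)).squeeze(1)
    reward_e = q_e - gamma * (1.0 - done_e) * soft_value(q_net, next_obs_e, alpha)

    # Maximize the implicit reward on expert data (the paper additionally
    # applies a concave regularizer, e.g. a chi^2 penalty, to this term).
    loss = -reward_e.mean()

    # Telescoped value term on policy (or mixed) samples keeps V grounded.
    obs_p, _, next_obs_p, done_p = policy_batch
    value_gap = soft_value(q_net, obs_p, alpha) - gamma * (1.0 - done_p) * soft_value(q_net, next_obs_p, alpha)
    loss = loss + value_gap.mean()
    return loss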

Our Key Modifications

1️⃣ Introduced KL divergence and reward-based baselines to improve performance (an illustrative sketch follows this list).
2️⃣ Adapted the method to Gym Atari and MuJoCo environments.
3️⃣ Optimized the training pipeline for better efficiency and generalization.
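
As a purely illustrative sketch of how a KL-divergence baseline could enter the objective (this is not the repository's actual code; kl_regularizer, bc_logits_fn, and the weight beta are hypothetical), one can penalize the divergence between the soft policy induced by Q and a fixed reference policy, e.g. a behavior-cloned baseline:

import torch
import torch.nn.functional as F

def kl_regularizer(q_net, bc_logits_fn, obs, alpha=0.01):
    # Soft policy induced by Q: pi(a|s) = softmax(Q(s, .) / alpha).
    log_pi = F.log_softmax(q_net(obs) / alpha, dim=-1)
    # Hypothetical reference policy, e.g. a behavior-cloned baseline.
    log_ref = F.log_softmax(bc_logits_fn(obs), dim=-1)
    # KL(pi || ref), averaged over the batch.
    return (log_pi.exp() * (log_pi - log_ref)).sum(dim=-1).mean()

# Usage sketch: total_loss = iq_loss(...) + beta * kl_regularizer(q_net, bc_logits_fn, obs)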


Key Advantages

✔️ Drop-in replacement for Behavior Cloning
✔️ Non-adversarial online imitation learning (successor to GAIL & AIRL)
✔️ Performs well with very sparse expert data
✔️ Scales to complex environments (Atari, MuJoCo)
✔️ Can recover reward functions from expert data


Installation & Usage

Please refer to the iq_learn directory for installation and usage instructions.


If you cannot use WANDB online, you can switch it to offline mode. In PowerShell:

$env:WANDB_MODE = "offline"
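
On Linux/macOS shells, the equivalent is:

export WANDB_MODE=offline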

Trajectory Conversion Tool

We provide a utility script convert_transitions.py to convert expert trajectories into the format required by IQ-Learn.

This is useful when you have custom environments or datasets and want to apply IQ-Learn directly.

Usage Example:

python convert_transitions.py --env_name <ENV_NAME>

Replace <ENV_NAME> with the ID of your environment.

Make sure your expert data includes state, action, next_state, reward, and done fields.
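
As a minimal sketch of what such a conversion can look like (the key names are assumed from the field list above; check convert_transitions.py for the exact keys and file format it expects):

import pickle

def convert(trajectories, out_path="expert_data.pkl"):
    # Hypothetical input: a list of per-trajectory dicts holding per-step
    # arrays under the keys listed above.
    data = {"states": [], "actions": [], "next_states": [], "rewards": [], "dones": []}
    for traj in trajectories:
        data["states"].append(traj["state"])
        data["actions"].append(traj["action"])
        data["next_states"].append(traj["next_state"])
        data["rewards"].append(traj["reward"])
        data["dones"].append(traj["done"])
    with open(out_path, "wb") as f:
        pickle.dump(data, f)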


Demonstrations

Imitation Learning on Atari

IQ-Learn achieves human-level imitation in various Atari games.

Citing This Work

If you use this code, please cite the original IQ-Learn paper:

@inproceedings{garg2021iqlearn,
  title={IQ-Learn: Inverse soft-Q Learning for Imitation},
  author={Divyansh Garg and Shuvam Chakraborty and Chris Cundy and Jiaming Song and Stefano Ermon},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=Aeo-xqtb5p}
}

Contact

For any questions or discussions, feel free to open an issue or reach out! 🚀

