
Inverse Q-Learning (IQ-Learn)

SOTA framework for non-adversarial Imitation Learning

IQ-Learn enables very fast, scalable, and stable imitation learning. The core IQ-Learn algorithm is implemented in iq.py, which can be used standalone to add IQ to your IL & RL projects.

IQ-Learn can be implemented on top of most existing RL methods (off-policy & on-policy) by changing the critic update loss to our proposed iq_loss.
(IQ has been successfully tested with Q-Learning, SAC, PPO, DDPG, and Decision Transformer agents.)
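
As a rough illustration of what iq_loss computes, here is a minimal PyTorch sketch of the chi^2-regularized "value_expert" variant, reconstructed from the paper's equations. All names (Q, V, alpha) are illustrative assumptions; iq.py remains the authoritative implementation.

import torch

def iq_loss_sketch(Q, V, expert_batch, gamma=0.99, alpha=0.5):
    # expert_batch: (obs, action, next_obs, done) tensors from expert demos.
    obs, action, next_obs, done = expert_batch

    # Inverse soft-Bellman operator: the implied reward is r(s, a) = Q(s, a) - gamma * V(s').
    y = (1.0 - done) * gamma * V(next_obs)
    reward = Q(obs, action) - y

    # Maximize the implied reward on expert transitions,
    loss = -reward.mean()
    # anchor the value function with the telescoping "value_expert" term,
    loss = loss + (V(obs) - y).mean()
    # and add a chi^2 penalty that keeps the implied rewards bounded.
    loss = loss + (1.0 / (4.0 * alpha)) * (reward ** 2).mean()
    return loss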

Update:

  • Added IQ-Learn results on Humanoid-v2
  • Added support for DM Control environments
  • Released expert_generation script to generate your own experts from trained RL agents for new environments.

Requirements

  • pytorch (>= 1.4)
  • gym
  • wandb
  • tensorboardX
  • hydra-core=1.0 (versions >= 1.1 are currently incompatible)

Installation

  • Make a conda environment and install dependencies: pip install -r requirements.txt
  • Set up a wandb project to log and visualize metrics (see the example setup after this list)
  • (Optional) Download expert datasets for Atari environments from GDrive
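
For example, a typical first-time setup might look like this (the environment name and Python version below are illustrative choices, not requirements):

conda create -n iq_learn python=3.8
conda activate iq_learn
pip install -r requirements.txt
wandb login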

Examples

We show some examples that push the boundaries of imitation learning using IQ-Learn:

To generate your own expert data for any environment, use iq_learn/experts/new_expert.py:

python new_expert.py --env_name Ant-v2
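
Conceptually, generating experts amounts to rolling out a trained RL agent and saving its trajectories. The sketch below is only a guess at the shape of that process (the policy interface and trajectory format are assumptions); the real logic lives in iq_learn/experts/new_expert.py.

import pickle
import gym

def collect_demos(env_name, policy, num_episodes=64, out_path="expert_demos.pkl"):
    # Roll out a trained agent and record its trajectories as expert data.
    env = gym.make(env_name)
    trajectories = []
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        states, actions, rewards = [], [], []
        while not done:
            action = policy(obs)  # action from the trained RL agent
            next_obs, reward, done, _ = env.step(action)
            states.append(obs)
            actions.append(action)
            rewards.append(reward)
            obs = next_obs
        trajectories.append({"states": states, "actions": actions, "rewards": rewards})
    with open(out_path, "wb") as f:
        pickle.dump(trajectories, f)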

1. CartPole-v1 with fully offline imitation, using 64 expert demonstrations subsampled by a factor of 20

python train_iq.py agent=softq method=iq env=cartpole expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert

IQ-Learn is the only method that reaches the expert env reward of 500, requiring only 3k training steps and less than 30 seconds!

For other Gym or Box2D environments, use:

python train_iq.py agent=softq method=iq env=lunarlander expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
python train_iq.py agent=softq method=iq env=acrobot expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
python train_iq.py env=bipedalwalker agent=sac expert.demos=64 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1

2. Playing Atari games at human-level performance

python train_iq.py agent=softq env=pong agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=1
python train_iq.py agent=softq env=qbert agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=1

Again, IQ-Learn is the only method that reaches the expert env reward of 21 on Pong
(we find better hyperparameters than those used in the original paper).

3. Controlling a Humanoid and an Ant by imitating a single expert

python train_iq.py env=humanoid agent=sac expert.demos=64 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1
python train_iq.py env=ant agent=sac expert.demos=16 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1

IQ-Learn learns to control a full humanoid at expert performance using a single demonstration, reaching the expert env reward of 5300.

Instructions

We show example code for training Q-Learning and SAC agents with IQ-Learn in train_iq.py. We make minimal modifications to the original RL training code in train_rl.py, simply changing the critic loss function, as sketched below.
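
Schematically, the change looks like the snippet below: the rest of the training loop stays the same, and only the critic loss call differs. We assume here that iq.py exposes a function named iq_loss; the actual signature may differ, so treat this as a sketch rather than the repo's exact API.

import torch
import torch.nn.functional as F
from iq import iq_loss  # assumed export; check iq.py for the real signature

def critic_step(agent, optimizer, policy_batch, expert_batch, imitation=True):
    if imitation:
        # train_iq.py: the critic loss is computed from expert and policy
        # transitions, with no environment reward (signature assumed).
        loss = iq_loss(agent, policy_batch, expert_batch)
    else:
        # train_rl.py: a standard TD loss against a reward-based Bellman target.
        obs, action, env_reward, next_obs, done = policy_batch
        with torch.no_grad():
            target = env_reward + (1.0 - done) * agent.gamma * agent.get_V(next_obs)
        loss = F.mse_loss(agent.critic(obs, action), target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss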

  • To reproduce our Offline IL experiments, see scripts/run_offline.sh
  • To reproduce our Mujoco experiments, see scripts/run_mujoco.sh
  • To reproduce Atari experiments, see scripts/run_atari.sh
  • To visualize our recovered state-only rewards on a toy Point Maze environment, run: python -m vis.maze_vis env=pointmaze_right eval.policy=pointmaze agent.init_temp=1 agent=sac agent.q_net._target_=agent.sac_models.DoubleQCritic
    Reward visualizations are saved in the vis/outputs directory (a sketch of the underlying reward recovery follows this list).
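
The recovered rewards come from the paper's inverse soft-Bellman operator, r(s, a) = Q(s, a) - gamma * V(s'). A minimal sketch of that computation follows; the method names on agent are assumptions for illustration, not the actual code in vis.maze_vis.

import torch

@torch.no_grad()
def recover_reward(agent, obs, action, next_obs, done, gamma=0.99):
    # Implied reward under the learned critic: r(s, a) = Q(s, a) - gamma * V(s').
    return agent.critic(obs, action) - (1.0 - done) * gamma * agent.get_V(next_obs)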

Contributions

Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first.

License

The code is made available for academic, non-commercial usage. Please see the LICENSE for the terms governing commercial use of IQ-Learn, including running it on robots or creating new AI agents.

For any inquiry, contact: Div Garg ([email protected])