
Inverse Q-Learning (IQ-Learn)

SOTA framework for non-adversarial Imitation Learning

IQ-Learn enables very fast, scalable, and stable imitation learning. The core IQ-Learn algorithm is implemented in iq.py, which can be used standalone to add IQ to your IL & RL projects.

IQ-Learn can be implemented on top of most existing RL methods (off-policy & on-policy) by changing the critic update loss to our proposed iq_loss.
(IQ has been successfully tested with Q-Learning, SAC, PPO, DDPG, and Decision Transformer agents.)
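
As a rough illustration of what iq_loss computes, here is a minimal PyTorch sketch of the chi^2-regularized "value_expert" variant, reconstructed from the paper's equations. All names (Q, V, alpha) are illustrative assumptions; iq.py remains the authoritative implementation.

import torch

def iq_loss_sketch(Q, V, expert_batch, gamma=0.99, alpha=0.5):
    # expert_batch: (obs, action, next_obs, done) tensors from expert demos.
    obs, action, next_obs, done = expert_batch

    # Inverse soft-Bellman operator: the implied reward is r(s, a) = Q(s, a) - gamma * V(s').
    y = (1.0 - done) * gamma * V(next_obs)
    reward = Q(obs, action) - y

    # Maximize the implied reward on expert transitions,
    loss = -reward.mean()
    # anchor the value function with the telescoping "value_expert" term,
    loss = loss + (V(obs) - y).mean()
    # and add a chi^2 penalty that keeps the implied rewards bounded.
    loss = loss + (1.0 / (4.0 * alpha)) * (reward ** 2).mean()
    return loss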

Update:

  • Added IQ-Learn results on Humanoid-v2
  • Added support for DM Control environments
  • Released expert_generation script to generate your own experts from trained RL agents for new environments.

Requirements

  • pytorch (>= 1.4)
  • gym
  • wandb
  • tensorboardX
  • hydra-core=1.0 (versions >= 1.1 are currently incompatible)

Installation

  • Make a conda environment and install dependencies: pip install -r requirements.txt
  • Set up a wandb project to log and visualize metrics (see the example setup after this list)
  • (Optional) Download expert datasets for Atari environments from GDrive
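
For example, a typical first-time setup might look like this (the environment name and Python version below are illustrative choices, not requirements):

conda create -n iq_learn python=3.8
conda activate iq_learn
pip install -r requirements.txt
wandb login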

Examples

We show some examples that push the boundaries of imitation learning using IQ-Learn:

To generate your own expert data for any environment, use iq_learn/experts/new_expert.py:

python new_expert.py --env_name Ant-v2
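
Conceptually, generating experts amounts to rolling out a trained RL agent and saving its trajectories. The sketch below is only a guess at the shape of that process (the policy interface and trajectory format are assumptions); the real logic lives in iq_learn/experts/new_expert.py.

import pickle
import gym

def collect_demos(env_name, policy, num_episodes=64, out_path="expert_demos.pkl"):
    # Roll out a trained agent and record its trajectories as expert data.
    env = gym.make(env_name)
    trajectories = []
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        states, actions, rewards = [], [], []
        while not done:
            action = policy(obs)  # action from the trained RL agent
            next_obs, reward, done, _ = env.step(action)
            states.append(obs)
            actions.append(action)
            rewards.append(reward)
            obs = next_obs
        trajectories.append({"states": states, "actions": actions, "rewards": rewards})
    with open(out_path, "wb") as f:
        pickle.dump(trajectories, f)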

1. CartPole-v1 with fully offline imitation, using 64 expert demonstrations subsampled by a factor of 20

python train_iq.py agent=softq method=iq env=cartpole expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert

IQ-Learn is the only method that reaches the expert env reward of 500, requiring only 3k training steps and less than 30 seconds!

For other Gym or Box2D environments, use:

python train_iq.py agent=softq method=iq env=lunarlander expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
python train_iq.py agent=softq method=iq env=acrobot expert.demos=64 expert.subsample_freq=20 agent.init_temp=0.001 method.chi=True method.loss=value_expert
python train_iq.py env=bipedalwalker agent=sac expert.demos=64 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1

2. Playing Atari games at human-level performance

python train_iq.py agent=softq env=pong agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=1
python train_iq.py agent=softq env=qbert agent.init_temp=1e-3 method.loss=value_expert method.chi=True seed=0 expert.demos=1

Again, IQ-Learn is the only method that reaches the expert env reward of 21 on Pong
(we find better hyperparameters than those used in the original paper).

3. Controlling a Humanoid and an Ant by imitating a single expert

python train_iq.py env=humanoid agent=sac expert.demos=64 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1
python train_iq.py env=ant agent=sac expert.demos=16 method.loss=v0 method.regularize=True agent.actor_lr=3e-05 seed=0 agent.init_temp=1

IQ-Learn learns to control a full humanoid at expert performance using a single demonstration, reaching the expert env reward of 5300.

Instructions

We show example code for training Q-Learning and SAC agents with IQ-Learn in train_iq.py. We make minimal modifications to the original RL training code in train_rl.py, simply changing the critic loss function, as sketched below.
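
Schematically, the change looks like the snippet below: the rest of the training loop stays the same, and only the critic loss call differs. We assume here that iq.py exposes a function named iq_loss; the actual signature may differ, so treat this as a sketch rather than the repo's exact API.

import torch
import torch.nn.functional as F
from iq import iq_loss  # assumed export; check iq.py for the real signature

def critic_step(agent, optimizer, policy_batch, expert_batch, imitation=True):
    if imitation:
        # train_iq.py: the critic loss is computed from expert and policy
        # transitions, with no environment reward (signature assumed).
        loss = iq_loss(agent, policy_batch, expert_batch)
    else:
        # train_rl.py: a standard TD loss against a reward-based Bellman target.
        obs, action, env_reward, next_obs, done = policy_batch
        with torch.no_grad():
            target = env_reward + (1.0 - done) * agent.gamma * agent.get_V(next_obs)
        loss = F.mse_loss(agent.critic(obs, action), target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss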

  • To reproduce our Offline IL experiments, see scripts/run_offline.sh
  • To reproduce our Mujoco experiments, see scripts/run_mujoco.sh
  • To reproduce Atari experiments, see scripts/run_atari.sh
  • To visualize our recovered state-only rewards on a toy Point Maze environment, run: python -m vis.maze_vis env=pointmaze_right eval.policy=pointmaze agent.init_temp=1 agent=sac agent.q_net._target_=agent.sac_models.DoubleQCritic
    Reward visualizations are saved in the vis/outputs directory (a sketch of the underlying reward recovery follows this list).
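
The recovered rewards come from the paper's inverse soft-Bellman operator, r(s, a) = Q(s, a) - gamma * V(s'). A minimal sketch of that computation follows; the method names on agent are assumptions for illustration, not the actual code in vis.maze_vis.

import torch

@torch.no_grad()
def recover_reward(agent, obs, action, next_obs, done, gamma=0.99):
    # Implied reward under the learned critic: r(s, a) = Q(s, a) - gamma * V(s').
    return agent.critic(obs, action) - (1.0 - done) * gamma * agent.get_V(next_obs)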

Contributions

Contributions are very welcome. If you know how to make this code better, please open an issue. If you want to submit a pull request, please open an issue first.

License

The code is made available for academic, non-commercial usage. Please see the LICENSE for the terms governing commercial use of IQ-Learn, including running it on robots or creating new AI agents.

For any inquiry, contact: Div Garg ([email protected])