apex v0.3.0 separated ref_traj env from noref env, made minimal state a flag. new terminology in place of clock_based and phase_based
yeshg committed Aug 21, 2020
1 parent 3ab3b6e commit 837fdb9
Showing 40 changed files with 832 additions and 3,292 deletions.
36 changes: 22 additions & 14 deletions readme.md → README.md
@@ -4,18 +4,27 @@

Apex is a small, modular library that contains implementations of continuous reinforcement learning algorithms. Fully compatible with OpenAI Gym.

<img src="img/output.gif" alt="running1"/>
<img src="img/output2.gif" alt="running2"/>

## Running experiments

### Basics
Any algorithm can be run from the apex.py entry point.

To run DDPG on Walker2d-v2,
To run PPO on a Cassie environment,

```bash
python apex.py ppo --env_name Cassie-v0 --num_procs 12 --run_name experiment01
```

To run TD3 on the Gym environment Walker2d-v2,

```bash
python apex.py ddpg --env_name Walker2d-v2 --batch_size 64
python apex.py td3_async --env_name Walker2d-v2 --num_procs 12 --run_name experiment02
```
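
These algorithm subcommands also accept the environment-configuration flags defined in apex.py (shown in the diff below), such as `--command_profile`, `--input_profile`, and `--reward`. A minimal sketch, assuming this particular flag combination is valid for `Cassie-v0` and using a placeholder for the required reward name:

```bash
# Illustrative only: --command_profile accepts clock/phase/traj, --input_profile accepts full/min,
# and <reward_name> is a placeholder since --reward has no default value.
python apex.py ppo --env_name Cassie-v0 --command_profile phase --input_profile min \
    --reward <reward_name> --num_procs 12 --run_name experiment03
```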

### Logging details / Monitoring live training progress
## Logging details / Monitoring live training progress
Tensorboard logging is enabled by default for all algorithms. The logger expects that you supply an argument named ```logdir```, containing the root directory you want to store your logfiles in, and an argument named ```seed```, which is used to seed the pseudorandom number generators.

A basic command line script illustrating this is:
@@ -27,9 +36,9 @@ python apex.py ars --logdir logs/ars --seed 1337
The resulting directory tree would look something like this:
```
trained_models/ # directory with all of the saved models and tensorboard logs
└── td3_async # algorithm name
└── ars # algorithm name
└── Cassie-v0 # environment name
└── 8b8b12-seed1 # unique run name created with hash of hyperparameters + seed
└── 8b8b12-seed1 # unique run name created with hash of hyperparameters
├── actor.pt # actor network for algo
├── critic.pt # critic network for algo
├── events.out.tfevents # tensorboard binary file
@@ -41,17 +50,16 @@ Using tensorboard makes it easy to compare experiments and resume training later

To see live training progress

Run ```$ tensorboard --logdir logs/ --port=8097``` then navigate to ```http://localhost:8097/``` in your browser

## Unit tests
You can run the unit tests using pytest.
Run ```$ tensorboard --logdir logs/``` then navigate to ```http://localhost:6006/``` in your browser

### To Do
- [ ] Sphinx documentation and github wiki
- [ ] Support loading pickle of hyperparameters for resuming training
- [ ] Improve/Tune implementations of TD3, add capability to resume training for off policy algorithms
## Cassie Environments:
* `Cassie-v0` : basic unified environment for walking/running policies
* `CassieTraj-v0` : unified environment with reference trajectories
* `CassiePlayground-v0` : environment for executing autonomous missions
* `CassieStanding-v0` : environment for training standing policies
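
Any of these environment names can be passed to apex.py via `--env_name`. A minimal sketch, assuming the defaults in apex.py are acceptable (environment-specific options such as `--traj` for the reference-trajectory environment are listed there):

```bash
# Illustrative only: substitute any environment name from the list above
python apex.py ppo --env_name CassieTraj-v0 --num_procs 12 --run_name traj_experiment01
```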

## Features:
## Algorithms:
#### Currently implemented:
* Parallelism with [Ray](https://github.com/ray-project/ray)
* [GAE](https://arxiv.org/abs/1506.02438)/TD(lambda) estimators
* [PPO](https://arxiv.org/abs/1707.06347), VPG with ratio objective and with log likelihood objective
56 changes: 18 additions & 38 deletions apex.py
@@ -1,6 +1,8 @@
import torch
import sys, pickle, argparse
from util import color, print_logo, env_factory, create_logger, EvalProcessClass, parse_previous, collect_data
from util.logo import print_logo
from util.log import parse_previous
from util.eval import EvalProcessClass

if __name__ == "__main__":

@@ -10,31 +12,31 @@
"""
General arguments for configuring the environment
"""
# command input, state input, env attributes
parser.add_argument("--command_profile", default="clock", type=str.lower, choices=["clock", "phase", "traj"])
parser.add_argument("--input_profile", default="full", type=str.lower, choices=["full", "min"])
parser.add_argument("--simrate", default=50, type=int, help="simrate of environment")
parser.add_argument("--traj", default="walking", type=str, help="reference trajectory to use. options are 'aslip', 'walking', 'stepping'")
parser.add_argument("--phase_based", default=False, action='store_true', dest='phase_based')
parser.add_argument("--not_clock_based", default=True, action='store_false', dest='clock_based')
parser.add_argument("--not_state_est", default=True, action='store_false', dest='state_est')
parser.add_argument("--not_dyn_random", default=True, action='store_false', dest='dyn_random')
parser.add_argument("--not_no_delta", default=True, action='store_false', dest='no_delta')
parser.add_argument("--not_mirror", default=True, action='store_false', dest='mirror') # mirror actions or not
parser.add_argument("--learn_gains", default=False, action='store_true', dest='learn_gains') # learn PD gains or not
# attributes for trajectory based environments
parser.add_argument("--traj", default="walking", type=str, help="reference trajectory to use. options are 'aslip', 'walking', 'stepping'")
parser.add_argument("--not_no_delta", default=True, action='store_false', dest='no_delta')
parser.add_argument("--ik_baseline", default=False, action='store_true', dest='ik_baseline') # use ik as baseline for aslip + delta policies?
# mirror loss and reward
parser.add_argument("--not_mirror", default=True, action='store_false', dest='mirror') # mirror actions or not
parser.add_argument("--reward", default=None, type=str) # reward to use. this is a required argument
# parser.add_argument("--gainsDivide", default=1.0, type=float)

"""
General arguments for configuring the logger
"""
parser.add_argument("--env_name", default="Cassie-v0") # environment name
parser.add_argument("--run_name", default=None) # run name


"""
Arguments generally used for Curriculum Learning
General arguments for Curriculum Learning
"""
parser.add_argument("--exchange_reward", default=None) # Can only be used with previous (below)
parser.add_argument("--previous", type=str, default=None) # path to directory of previous policies for resuming training
parser.add_argument("--fixed_speed", type=float, default=None) # Fixed speed to train/test at

if len(sys.argv) < 2:
print("Usage: python apex.py [option]", sys.argv)
@@ -59,7 +61,6 @@
parser.add_argument("--recurrent", "-r", action='store_true') # whether to use a recurrent policy
parser.add_argument("--logdir", default="./trained_models/ars/", type=str)
parser.add_argument("--seed", "-s", default=0, type=int)
parser.add_argument("--env_name", "-e", default="Hopper-v3")
parser.add_argument("--average_every", default=10, type=int)
parser.add_argument("--save_model", "-m", default=None, type=str) # where to save the trained model to
parser.add_argument("--redis", default=None)
@@ -105,7 +106,6 @@
else:
parser.add_argument("--logdir", default="./trained_models/ddpg/", type=str)
parser.add_argument("--seed", "-s", default=0, type=int)
parser.add_argument("--env_name", "-e", default="Hopper-v3")

args = parser.parse_args()

@@ -126,7 +126,6 @@
# general args
parser.add_argument("--logdir", default="./trained_models/syncTD3/", type=str)
parser.add_argument("--previous", type=str, default=None) # path to directory of previous policies for resuming training
parser.add_argument("--env_name", default="Cassie-v0") # environment name
parser.add_argument("--history", default=0, type=int) # number of previous states to use as input
parser.add_argument("--redis_address", type=str, default=None) # address of redis server (for cluster setups)
parser.add_argument("--seed", default=0, type=int) # Sets Gym, PyTorch and Numpy seeds
@@ -168,7 +167,6 @@
from rl.algos.async_td3 import run_experiment

# args common for actors and learners
parser.add_argument("--env_name", default="Cassie-v0") # environment name
parser.add_argument("--hidden_size", default=256) # neurons in hidden layer
parser.add_argument("--history", default=0, type=int) # number of previous states to use as input

@@ -223,8 +221,6 @@
from rl.algos.ppo import run_experiment

# general args
parser.add_argument("--algo_name", default="ppo") # algo name
parser.add_argument("--env_name", "-e", default="Cassie-v0")
parser.add_argument("--logdir", type=str, default="./trained_models/ppo/") # Where to log diagnostics to
parser.add_argument("--seed", default=0, type=int) # Sets Gym, PyTorch and Numpy seeds
parser.add_argument("--history", default=0, type=int) # number of previous states to use as input
@@ -252,13 +248,8 @@
parser.add_argument("--max_traj_len", type=int, default=400, help="Max episode horizon")
parser.add_argument("--recurrent", action='store_true')
parser.add_argument("--bounded", type=bool, default=False)

args = parser.parse_args()

# Argument setup checks. Ideally all arg settings are compatible with each other, but that's not convenient for fast development
if (args.ik_baseline and not args.traj == "aslip") or (args.learn_gains and args.mirror):
raise Exception("Incompatible environment config settings")

args = parse_previous(args)

run_experiment(args)
@@ -267,34 +258,23 @@

sys.argv.remove(sys.argv[1])

parser.add_argument("--path", type=str, default="./trained_models/ppo/Cassie-v0/7b7e24-seed0/", help="path to folder containing policy and run details")
parser.add_argument("--env_name", default="Cassie-v0", type=str)
parser.add_argument("--path", type=str, default="./trained_models/nodelta_neutral_StateEst_symmetry_speed0-3_freq1-2/", help="path to folder containing policy and run details")
parser.add_argument("--traj_len", default=400, type=str)
parser.add_argument("--history", default=0, type=int) # number of previous states to use as input
parser.add_argument("--mission", default="default", type=str) # only used by playground environment
parser.add_argument("--terrain", default=None, type=str) # hfield file name (terrain to use)
parser.add_argument("--debug", default=False, action='store_true')
parser.add_argument("--no_stats", dest="stats", default=True, action='store_false')
parser.add_argument("--no_viz", default=False, action='store_true')
parser.add_argument("--collect_data", default=False, action='store_true')

args = parser.parse_args()

run_args = pickle.load(open(args.path + "experiment.pkl", "rb"))

if not hasattr(run_args, "simrate"):
run_args.simrate = 50
print("manually choosing simrate as 50 (40 Hz)")
if not hasattr(run_args, "phase_based"):
run_args.phase_based = False

policy = torch.load(args.path + "actor.pt")
policy.eval()

if args.collect_data:
collect_data(policy, args, run_args)
else:
# eval_policy(policy, args, run_args)
# eval_policy_input_viz(policy, args, run_args)
ev = EvalProcessClass(args, run_args)
ev.eval_policy(policy, args, run_args)
# eval_policy(policy, args, run_args)
# eval_policy_input_viz(policy, args, run_args)
ev = EvalProcessClass(args, run_args)
ev.eval_policy(policy, args, run_args)
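
For reference, the evaluation branch above loads `experiment.pkl` and `actor.pt` from `--path` and runs the policy through `EvalProcessClass`. A hypothetical invocation (the `eval` subcommand name is an assumption; the branch that dispatches on `sys.argv[1]` is not visible in this diff):

```bash
# Assumed subcommand name; --path should point at a run directory containing actor.pt and experiment.pkl
python apex.py eval --path ./trained_models/ppo/Cassie-v0/7b7e24-seed0/
```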
12 changes: 7 additions & 5 deletions cassie/__init__.py
@@ -1,16 +1,18 @@
# Unified
from .cassie import CassieEnv
from .cassie_traj import CassieTrajEnv
from .cassie_playground import CassiePlayground
from .cassie import CassieEnv_v2 as CassieEnv
from .cassie_min import CassieEnv_v3 as CassieMinEnv
from .cassie_standing_env import CassieStandingEnv
from .cassie import CassieEnv_v2
from .cassie_standing_env import CassieStandingEnv # sorta old/unused

# Proprietary
from .cassie_noaccel_footdist_omniscient import CassieEnv_noaccel_footdist_omniscient
from .cassie_footdist_env import CassieEnv_footdist
from .cassie_noaccel_footdist_env import CassieEnv_noaccel_footdist
from .cassie_noaccel_footdist_nojoint_env import CassieEnv_noaccel_footdist_nojoint
from .cassie_novel_footdist_env import CassieEnv_novel_footdist
from .cassie_mininput_env import CassieEnv_mininput


# CassieMujocoSim
from .cassiemujoco import *


