Hm1 roboschool #36

Open: wants to merge 2 commits into master
18 changes: 18 additions & 0 deletions hw1/Dockerfile
@@ -0,0 +1,18 @@
FROM tensorflow/tensorflow:latest-py3-jupyter
MAINTAINER Roei Bahumi <[email protected]>


RUN apt-get update \
&& apt-get install -y \
# needed for a few gym environments
swig=3.0.8-0ubuntu3 \
# graphic library needed for roboschool (https://github.com/openai/roboschool#installation)
libgl1-mesa-dev=18.0.5-0ubuntu0~16.04.1 \
# other libraries needed for roboschool and gym full installation
libgl1-mesa-glx=18.0.5-0ubuntu0~16.04.1 \
libgtk2.0-dev=2.24.30-1ubuntu1.16.04.2 \
graphviz=2.38.0-12ubuntu2.1


ADD ./requirements.txt requirements.txt
RUN pip install -r requirements.txt
54 changes: 54 additions & 0 deletions hw1/README.md
@@ -25,3 +25,57 @@ In `experts/`, the provided expert policies are:
* Walker2d-v2.pkl

The name of the pickle file corresponds to the name of the gym environment.

### Added Roboschool environment
Additional Roboschool models were added, with policies from the Roboschool agent zoo (https://github.com/openai/roboschool).
This code was rebased on top of code by Alex Hofer <[email protected]>.

Additional policies in the `experts/` directory:
* RoboschoolAnt-v1.py
* RoboschoolHalfCheetah-v1.py
* RoboschoolHopper-v1.py
* RoboschoolHumanoid-v1.py
* RoboschoolReacher-v1.py
* RoboschoolWalker2d-v1.py
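
Unlike the MuJoCo experts, each Roboschool expert is a Python module rather than a pickle file; `run_expert.py` imports it and calls its `get_env_and_policy()` helper (see the changes to `run_expert.py` below). A minimal sketch of driving one of these modules directly, assuming you run it from the `hw1/` directory so that `experts/` is importable:

```
import importlib

# The module names contain hyphens, so a plain "import" statement
# won't parse; import the module by its name string instead.
policy_module = importlib.import_module("experts.RoboschoolHopper-v1")
env, policy = policy_module.get_env_and_policy()

obs = env.reset()
action = policy.act(obs)  # one expert action for the current observation
```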

# Running Environment
- The required Python packages are listed [here](requirements.txt).
- You can either run the code in a Docker container (see the [Dockerfile](Dockerfile)) or install the packages in a virtual environment.
- Note that when running in a Docker container, you won't be able to render the environments without additional configuration.
### Installing a virtual env
```
# Create a virtual environment
virtualenv -p /usr/local/bin/python3 venv

# activate and install packages
source venv/bin/activate
pip install -r hw1/requirements.txt
```

### Running in a Docker container
The Dockerfile installs the relevant packages (gym, tensorflow, roboschool).
You can either build the image yourself or pull my prebuilt image from Docker Hub: `docker pull rbahumi/cs294_roboschool_image`

#### Docker run command
The following command:
1. Runs a docker instance in the background
2. Exposes port 8888 for the Jupyter notebook
3. Runs as the current user and mounts the current directory into the container's filesystem
```
docker run -d --name CS294_docker -p 8888:8888 -u $(id -u):$(id -g) -v $(pwd):/tf/srv -it rbahumi/cs294_roboschool_image
```
#### Get the jupyter-notebook token
```
docker exec -it CS294_docker jupyter-notebook list
```
#### Log in to the running docker container instance
```
docker exec -it CS294_docker bash
```

#### Building the docker image
If you wish to build the docker image yourself, perhaps starting from a different or GPU-enabled TensorFlow base image, run the following command:
```
# Build the docker
cd hw1
docker build -t cs294_roboschool_image -f Dockerfile .
```
260 changes: 260 additions & 0 deletions hw1/experts/RoboschoolAnt-v1.py

Large diffs are not rendered by default.

261 changes: 261 additions & 0 deletions hw1/experts/RoboschoolHalfCheetah-v1.py

Large diffs are not rendered by default.

250 changes: 250 additions & 0 deletions hw1/experts/RoboschoolHopper-v1.py

Large diffs are not rendered by default.

472 changes: 472 additions & 0 deletions hw1/experts/RoboschoolHumanoid-v1.py

Large diffs are not rendered by default.

244 changes: 244 additions & 0 deletions hw1/experts/RoboschoolReacher-v1.py

Large diffs are not rendered by default.

257 changes: 257 additions & 0 deletions hw1/experts/RoboschoolWalker2d-v1.py

Large diffs are not rendered by default.

11 changes: 8 additions & 3 deletions hw1/requirements.txt
@@ -1,5 +1,10 @@
# On OSX need to run 'brew install swig' before installing the following pip packages
mujoco-py==1.50.1.56
roboschool==1.0.46
gym[all]==0.10.5
seaborn
cmake==3.13.3
sklearn
tensorflow
keras==2.2.4
pydot==1.4.1
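
A quick sanity check that the pinned gym/roboschool pair installed correctly, as a minimal sketch (importing `roboschool` is what registers the `Roboschool*` environments with gym):

```
import gym
import roboschool  # side effect: registers the Roboschool-* environments

env = gym.make('RoboschoolHopper-v1')
print(env.observation_space, env.action_space)
```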
140 changes: 112 additions & 28 deletions hw1/run_expert.py
@@ -3,73 +3,157 @@
"""
Code to load an expert policy and generate roll-out data for behavioral cloning.
Example usage:
# Using MuJoCo
python run_expert.py experts/Humanoid-v1.pkl Humanoid-v1 --render --num_rollouts 20

# Using Roboschool
python run_expert.py experts/RoboschoolHumanoid-v1.py 'RoboschoolHumanoid-v1' --engine Roboschool --render --num_rollouts 20

Author of this script and included expert policies: Jonathan Ho ([email protected])

Additional Roboschool models were added with policies from the Roboschool agent zoo (https://github.com/openai/roboschool).
This code was rebased on top of code by Alex Hofer <[email protected]>.
"""
import argparse
import importlib
import os
import pickle
import tensorflow as tf
import numpy as np
import gym
import tf_util
import load_policy

EXPERT_DIR = "experts"
ROBOSCOOL_EXPERT_DATA_DIR = 'hw1/expert_data'
SUPERVISED_MODELD_DATA_DIR = 'hw1/supervised_modeled_data'
ROBOSCOOL_AVAILABLE_ENVS = ['RoboschoolAnt-v1', 'RoboschoolHumanoid-v1', 'RoboschoolHalfCheetah-v1',
                            'RoboschoolReacher-v1', 'RoboschoolHopper-v1', 'RoboschoolWalker2d-v1']

ROBOSCHOOL_ENGINE = 'Roboschool'
MOJOCO_ENGINE = 'MuJoCo'
ENGINES = [ROBOSCHOOL_ENGINE, MOJOCO_ENGINE]


def run_mojoco_policy(expert_policy_file, num_rollouts, envname, max_timesteps=None, render=False, verbose=True):
    print('loading and building expert policy')
    policy_fn = load_policy.load_policy(expert_policy_file)
    print('loaded and built')

    with tf.Session():
        tf_util.initialize()

        env = gym.make(envname)
        max_steps = max_timesteps or env.spec.timestep_limit

        returns = []
        observations = []
        actions = []
        for i in range(num_rollouts):
            print('iter', i)
            obs = env.reset()
            done = False
            totalr = 0.
            steps = 0
            while not done:
                action = policy_fn(obs[None, :])
                observations.append(obs)
                actions.append(action)
                obs, r, done, _ = env.step(action)
                totalr += r
                steps += 1
                if render:
                    env.render()
                if steps % 100 == 0: print("%i/%i" % (steps, max_steps))
                if steps >= max_steps:
                    break
            returns.append(totalr)

        print('returns', returns)
        print('mean return', np.mean(returns))
        print('std of return', np.std(returns))

        expert_data = {'observations': np.array(observations),
                       'actions': np.array(actions),
                       'returns': np.array(returns)}
        return expert_data


def run_policy(env, policy, num_rollouts, description, max_timesteps=None, render=False, verbose=True):
    max_steps = max_timesteps or env.spec.timestep_limit

    returns = []
    observations = []
    actions = []
    for i in range(num_rollouts):
        if verbose:
            print('iter', i)
        obs = env.reset()
        done = False
        totalr = 0.
        steps = 0
        while not done:
            action = policy.act(obs)
            observations.append(obs)
            actions.append(action)
            obs, r, done, _ = env.step(action)
            totalr += r
            steps += 1
            if render:
                env.render()
            if steps % 100 == 0 and verbose: print("%i/%i" % (steps, max_steps))
            if steps >= max_steps:
                break
        returns.append(totalr)

    print('Env description:', description)
    # print('returns', returns)
    print('mean return', np.mean(returns))
    print('std of return', np.std(returns))

    expert_data = {'observations': np.array(observations),
                   'actions': np.array(actions),
                   'returns': np.array(returns)}
    return expert_data


def run_expert_policy(num_rollouts, envname, max_timesteps=None, render=False, verbose=True):
    assert envname in ROBOSCOOL_AVAILABLE_ENVS
    # Load the policy module
    module_name = "%s.%s" % (EXPERT_DIR, envname)
    policy_module = importlib.import_module(module_name)

    env, policy = policy_module.get_env_and_policy()
    description = "Expert policy for module %s" % envname
    return run_policy(env=env, policy=policy, num_rollouts=num_rollouts, description=description,
                      max_timesteps=max_timesteps, render=render, verbose=verbose)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('expert_policy_file', type=str)
    parser.add_argument('envname', type=str)
    parser.add_argument('--engine', type=str, default=MOJOCO_ENGINE)
    parser.add_argument('--render', action='store_true')
    parser.add_argument("--max_timesteps", type=int)
    parser.add_argument('--num_rollouts', type=int, default=20,
                        help='Number of expert roll outs')
    args = parser.parse_args()

    if args.engine == ROBOSCHOOL_ENGINE:
        print('loading %s expert policy' % ROBOSCHOOL_ENGINE)
        expert_data = run_expert_policy(num_rollouts=args.num_rollouts, envname=args.envname,
                                        max_timesteps=args.max_timesteps, render=args.render, verbose=True)
    else:
        print('loading %s expert policy' % MOJOCO_ENGINE)
        expert_data = run_mojoco_policy(expert_policy_file=args.expert_policy_file, num_rollouts=args.num_rollouts,
                                        envname=args.envname, max_timesteps=args.max_timesteps, render=args.render)

    returns = expert_data['returns']
    print('returns', returns)
    print('mean return', np.mean(returns))
    print('std of return', np.std(returns))

    with open(os.path.join('expert_data', args.envname + '.pkl'), 'wb') as f:
        pickle.dump(expert_data, f, pickle.HIGHEST_PROTOCOL)

if __name__ == '__main__':
    main()
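
For reference, here is a minimal sketch of loading the roll-out data this script saves; the keys match the `expert_data` dict built above, and the file name assumes you generated data for `RoboschoolHopper-v1`:

```
import pickle
import numpy as np

# run_expert.py writes expert_data/<envname>.pkl
with open('expert_data/RoboschoolHopper-v1.pkl', 'rb') as f:
    data = pickle.load(f)

observations = data['observations']    # one row per environment step
actions = np.squeeze(data['actions'])  # drop any singleton batch dimension
print(observations.shape, actions.shape, data['returns'].mean())
```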