
MJX Training Pipeline #14

Closed · wants to merge 18 commits into from

Conversation

@michael-lutz commented:

This PR introduces a new way to massively scale up locomotion training. It builds on Brax and uses the MJX physics engine for all simulation.

Structure

Specifically, this PR includes the following directories:

  • Envs
  • Experiments
  • Utils
  • (example) Weights
  • train.py
  • play.py

Envs includes two Brax environments: DefaultHumanoidEnv and StompyEnv. Each environment has a main class that implements the Brax environment interface and uses MJX for all physics calculations. One important thing to note is that reward functions are modular, allowing for quick experimentation (see the sketch below).
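
As an illustration of that modular design, here is a minimal sketch assuming the signatures visible later in this diff (healthy_reward_fn, get_reward_fn); the registry dict, the mjxState alias, and the combination loop are my guesses, not the PR's actual code:

```python
# Minimal sketch of a modular reward registry. Signatures mirror
# healthy_reward_fn / get_reward_fn from the diff; everything else
# is an illustrative assumption.
from typing import Callable, Dict, Tuple

import jax
import jax.numpy as jp
from mujoco import mjx

mjxState = mjx.Data  # assumed alias used by the PR

RewardFn = Callable[
    [mjxState, jp.ndarray, mjxState, jax.Array, Dict[str, float]],
    Tuple[jp.ndarray, jp.ndarray],
]


def forward_reward_fn(
    state: mjxState, action: jp.ndarray, next_state: mjxState, dt: jax.Array, params: Dict[str, float]
) -> Tuple[jp.ndarray, jp.ndarray]:
    """Reward proportional to the forward velocity of the center of mass."""
    xpos = state.subtree_com[1][0]
    next_xpos = next_state.subtree_com[1][0]
    velocity = (next_xpos - xpos) / dt
    return params["weight"] * velocity, jp.array(1.0)  # (reward, healthy flag)


REWARD_FNS: Dict[str, RewardFn] = {"forward": forward_reward_fn}


def get_reward_fn(reward_params: Dict[str, Dict[str, float]], dt: jax.Array, include_reward_breakdown: bool = False):
    """Compose the configured reward terms into a single callable."""

    def reward_fn(state, action, next_state):
        total = jp.array(0.0)
        is_healthy = jp.array(1.0)
        breakdown = {}
        for name, params in reward_params.items():
            reward, healthy = REWARD_FNS[name](state, action, next_state, dt, params)
            total = total + reward
            is_healthy = is_healthy * healthy  # any 0.0 flag marks the state unhealthy
            breakdown[name] = reward
        if include_reward_breakdown:
            return total, is_healthy, breakdown
        return total, is_healthy

    return reward_fn
```

Adding a new reward term then only requires registering one function, which is what makes experimentation quick.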

Experiments contains two .yaml files with sample configurations for model training.
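
Based on the config keys read in play.py later in this diff (project_name, normalize_observations), the yaml files presumably look roughly like this; every value below is invented:

```python
# Hypothetical experiment config. Only project_name and
# normalize_observations are grounded in this diff; values are made up.
import yaml

config = yaml.safe_load(
    """
    project_name: default_humanoid
    normalize_observations: true
    """
)

print(config["project_name"])  # -> default_humanoid
```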

Utils includes shared helpers: default values, rollout rendering, etc.

Weights currently includes default humanoid weights (for locomotion) that should work out of the box.

train.py and play.py both integrate with wandb. train.py uses the Brax implementation of PPO for now, but this can easily be customized if needed; a sketch of the wiring follows below.
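
As a rough, hedged sketch of how that wiring might look: the hyperparameter values and project name below are invented, but ppo.train and its progress_fn callback are the standard Brax API:

```python
# Sketch of train.py's core loop: Brax PPO with metrics streamed to wandb.
# Hyperparameters and the project name are illustrative, not the PR's values.
import wandb
from brax import envs
from brax.training.agents.ppo import train as ppo

wandb.init(project="default_humanoid")  # hypothetical project name

env = envs.get_environment("default_humanoid")  # assumes the env is registered


def progress(num_steps: int, metrics: dict) -> None:
    # Called periodically by Brax during training; forward everything to wandb.
    wandb.log(metrics, step=num_steps)


make_inference_fn, params, _ = ppo.train(
    environment=env,
    num_timesteps=30_000_000,
    episode_length=1000,
    num_envs=2048,  # MJX makes thousands of parallel envs cheap on GPU
    progress_fn=progress,
)
```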

Performance Samples

Training Curves
[Two training-curve screenshots, captured 2024-05-22]

Example humanoid robot walking in MJX
https://github.com/kscalelabs/sim/assets/43460304/8e12b0e6-48ea-4af0-8283-1dc4880767b4

Humanoid trained in MJX, evaluated in CPU-based MuJoCo
https://github.com/kscalelabs/sim/assets/43460304/7f158aeb-6bc9-4056-bd1d-12882adbd13c

@michael-lutz added the "enhancement" label on May 23, 2024
@michael-lutz self-assigned this on May 23, 2024

@budzianowski (Collaborator) commented:

nit: move weights to mjx_gym/tests

@budzianowski (Collaborator) left a review:

Looks great! A couple of cleaning comments. Was the Stompy env tested at all? If not, let's add it in the next PR since this one is already getting big.

@@ -0,0 +1,13 @@
from brax import envs

from .default_humanoid_env.default_humanoid import DefaultHumanoidEnv

@budzianowski (Collaborator):

avoid relative imports

@@ -0,0 +1,150 @@
import jax

@budzianowski (Collaborator):

Run isort, ruff, et al.; see the Makefile for the formatting setup.

from mujoco import mjx
from etils import epath
from .rewards import get_reward_fn
from utils.default import DEFAULT_REWARD_PARAMS

@budzianowski (Collaborator):

nit: move default reward params to rewards.py

from mujoco import mjx
from etils import epath
import os
from .rewards import get_reward_fn

@budzianowski (Collaborator):

avoid relative imports - sim.mjx_gym.envs.rewards

self.reward_fn = get_reward_fn(self._reward_params, self.dt, include_reward_breakdown=True)

def reset(self, rng: jp.ndarray) -> State:
"""Resets the environment to an initial state.

@budzianowski (Collaborator):

the formatting looks off.

model_path = "weights/" + config.get('project_name', 'model') + ".pkl"
params = model.load_params(model_path)
normalize = lambda x, y: x
if config.get('normalize_observations', False):

@budzianowski (Collaborator):

Add a comment explaining what's going on here.


# rolling out a trajectory
render_every = 2
n_steps = 1000

@budzianowski (Collaborator):

make this a parameter

# rolling out a trajectory
render_every = 2
n_steps = 1000
if args.use_mujoco:

@budzianowski (Collaborator):

Is there actually any difference?

@michael-lutz (Author):

Good question!

On runtime performance: MJX takes a while to initialize but is very quick to run on GPU. MuJoCo has a quicker "cold start," but its actual rollouts are slightly slower.

On model performance: due to slightly different physical dynamics, models trained in MJX are slightly less performant when run in MuJoCo. Compare the second video to the first in the PR description.
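
For context, the two code paths differ roughly as follows. This is a simplified sketch (the asset path and the stepping loop are placeholders, not play.py's actual code):

```python
# Simplified contrast between the MJX (GPU) and classic MuJoCo (CPU) paths.
import jax
import mujoco
from mujoco import mjx

model = mujoco.MjModel.from_xml_path("humanoid.xml")  # placeholder path

# MJX path: expensive one-time jit compilation, then fast GPU steps.
mjx_model = mjx.put_model(model)
mjx_data = mjx.make_data(mjx_model)
jit_step = jax.jit(mjx.step)
mjx_data = jit_step(mjx_model, mjx_data)

# MuJoCo path: no compilation cost, but every step runs on the CPU.
data = mujoco.MjData(model)
mujoco.mj_step(model, data)
```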

images = render_mjx_rollout(env, inference_fn, n_steps, render_every)
print(f'Rolled out {len(images)} steps')

# render the trajectory

@budzianowski (Collaborator):

Make it optional

@michael-lutz (Author):

I was envisioning this script as dedicated primarily to rendering, since I believe we can access all other metrics directly in the training logs. Do you think it makes sense to make it optional?

ydataerr = []
times = [datetime.now()]

max_y, min_y = 13000, 0

@budzianowski (Collaborator):

I don't think this is used anywhere?

@budzianowski self-requested a review on May 23, 2024, 20:47

@budzianowski (Collaborator) left a review:

Small nits, but looks good for a start!

"""Resets the environment to an initial state.

Args:
rng: Random number generator seed.

@budzianowski (Collaborator):

nit - formatting still off here

Suggested change:
-    rng: Random number generator seed.
+    Args:
+        rng: Random number generator seed.

@@ -0,0 +1,111 @@
from typing import Callable, Dict, Tuple

@budzianowski (Collaborator):

Good practice is to add at least a one-line comment saying what the file is about.

}


def get_reward_fn(

@budzianowski (Collaborator):

This could be used across all envs?

Returns:
A float wrapped in a jax array.
"""
xpos = state.subtree_com[1][0] # TODO: include stricter typing than mjxState to avoid this type error

@budzianowski (Collaborator):

Make it more explicit what you are loading

velocity = (next_xpos - xpos) / dt
forward_reward = params["weight"] * velocity

return forward_reward, jp.array(1.0) # TODO: ensure everything is initialized in a size 2 array instead...

@budzianowski (Collaborator):

What's the logic behind 1.0?

@michael-lutz (Author):

jp.array because we want to keep everything in jax. The 1.0 itself operates like a boolean (it doesn't change the "healthiness" until a 0 comes around).
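
In other words (my paraphrase of the pattern, not the PR's literal code):

```python
# The 1.0 flags behave like booleans under multiplication:
# the state stays healthy until any reward term returns 0.0.
import jax.numpy as jp

flags = [jp.array(1.0), jp.array(1.0), jp.array(0.0)]  # hypothetical per-term flags
is_healthy = jp.array(1.0)
for flag in flags:
    is_healthy = is_healthy * flag  # 1.0 * 1.0 * 0.0 -> 0.0
```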

def healthy_reward_fn(
state: mjxState, action: jp.ndarray, next_state: mjxState, dt: jax.Array, params: Dict[str, float]
) -> Tuple[jp.ndarray, jp.ndarray]:
"""Reward function for staying healthy.

@budzianowski (Collaborator):

Rename to upright_reward.

Resets the environment to an initial state.

Args:
rng: Random number generator seed.

@budzianowski (Collaborator):

nit - weird tabs

@@ -0,0 +1,81 @@
import argparse

@budzianowski (Collaborator):

Add an example of how to run it.

@@ -0,0 +1,80 @@
import argparse

@budzianowski (Collaborator):

Add an example of how to run it.
