This release creates a distinction between reference-trajectory and no-reference-trajectory environments (`CassieTrajEnv` and `CassieEnv`, respectively), but leaves other attributes, such as learning PD gains, dynamics randomization, and using a full or minimal input, as arguments to the environment's constructor.
Instead of `clock_based` and `phase_based`, environments now have a `command_profile` attribute, which specifies the type of command input to the policy. This can be `clock` or `phase`, or `traj` in the case of `CassieTrajEnv`. Another new attribute, `input_profile`, specifies the size and composition of the policy's input. This can be `full` or `min`. The set of choices for `command_profile` and `input_profile` can be expanded as research progresses.
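
As a rough illustration of how these options fit together, the sketch below models them as a small, self-contained config object. It is not the repository's code: `EnvConfig`, its field names other than `command_profile` and `input_profile`, and the validation rules are assumptions made for the example, based only on the description above.

```python
from dataclasses import dataclass

# Allowed values as described in these notes; the sets themselves are
# an illustrative construct, not taken from the repository.
COMMAND_PROFILES = {"clock", "phase", "traj"}
INPUT_PROFILES = {"full", "min"}


@dataclass
class EnvConfig:
    """Hypothetical stand-in for the environment constructor arguments."""

    use_reference_traj: bool = False      # assumed flag: True -> CassieTrajEnv
    command_profile: str = "clock"        # type of command input to the policy
    input_profile: str = "full"           # size/composition of the policy input
    learn_gains: bool = False             # assumed name for learning PD gains
    dynamics_randomization: bool = False  # assumed name

    def __post_init__(self):
        if self.command_profile not in COMMAND_PROFILES:
            raise ValueError(f"unknown command_profile: {self.command_profile!r}")
        if self.input_profile not in INPUT_PROFILES:
            raise ValueError(f"unknown input_profile: {self.input_profile!r}")
        # Per the notes, "traj" only applies to the reference-trajectory env.
        if self.command_profile == "traj" and not self.use_reference_traj:
            raise ValueError("command_profile='traj' requires CassieTrajEnv")

    @property
    def env_name(self) -> str:
        """Name of the environment class this configuration would select."""
        return "CassieTrajEnv" if self.use_reference_traj else "CassieEnv"


if __name__ == "__main__":
    cfg = EnvConfig(command_profile="phase", input_profile="min")
    print(cfg.env_name, cfg.command_profile, cfg.input_profile)
```

Keeping the profiles as strings checked against a set means a new command or input profile only needs one more entry in the corresponding set, which matches the intent that the choices can grow over time.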
Other notable features:
- Policy comparison test
- Playground environment for running autonomous missions
- Custom terrain via PNG height maps
- Comprehensive 5K test
- Live plot when evaluating `phase_based` policies
- More insightful PPO logging info (optimize and sample times)