
v0.5.0

ManiSkill2 Release Notes

This update migrates ManiSkill2 to the new Gymnasium package, along with a number of other changes.

Breaking Changes

  • env.render now accepts no arguments. The old render modes are split out into separate functions, and env.render dispatches to the appropriate one based on the env.render_mode attribute (usually set at environment creation).
  • env.step returns observation, reward, terminated, truncated, info. See https://gymnasium.farama.org/content/migration-guide/#environment-step for details, and see the sketch after this list. In ManiSkill2, the old done signal is now called terminated, and truncated is False until the time limit is hit. All environments default to 200 max episode steps, so truncated=True after 200 steps.
  • env.reset returns a tuple observation, info. In ManiSkill2, info is always an empty dictionary. Moreover, env.reset accepts two new keyword arguments, seed: int and options: dict | None. options typically configures the various random settings of an environment. Previously, ManiSkill2 used custom keyword arguments such as reconfigure; these are still usable but must now be passed through an options dict, e.g. env.reset(options=dict(reconfigure=True)).
  • env.seed has been removed in favor of env.reset(seed=val), per the Gymnasium API.
  • The ManiSkill2 VectorEnv has also been modified to adhere to the Gymnasium Vector Env API. Note that this means vec_env.observation_space and vec_env.action_space are batched under the new API, while the per-environment spaces are available as vec_env.single_observation_space and vec_env.single_action_space.
  • All reward functions have been rescaled to the range [0, 1], which generally makes value-learning approaches more stable and avoids gradient explosions. In any environment, a reward of 1 indicates success, which is also indicated by the boolean stored in info["success"]. The scaled dense reward is the new default reward function and is called normalized_dense. To use the old (<0.5.0) ManiSkill2 dense rewards, set reward_mode to dense.
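
The changes above affect the whole core interaction loop. Below is a minimal migration sketch under the new API, using PickCube-v0 as the example environment; the registration import and the exact gym.make keywords follow common ManiSkill2 conventions but should be treated as assumptions:

```python
import gymnasium as gym
import mani_skill2.envs  # assumed registration import for ManiSkill2 environments

# render_mode is chosen at creation; env.render() now takes no arguments.
env = gym.make("PickCube-v0", render_mode="rgb_array", reward_mode="normalized_dense")

# reset returns (observation, info); custom arguments such as reconfigure
# now go through the options dict, and seeding goes through seed=.
obs, info = env.reset(seed=0, options=dict(reconfigure=True))

done = False
while not done:
    action = env.action_space.sample()
    # step returns a 5-tuple under the Gymnasium API.
    obs, reward, terminated, truncated, info = env.step(action)
    # truncated becomes True once the 200-step limit is hit.
    done = terminated or truncated
env.close()
```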

New Additions

Code

  • Environment code now comes with separate render functions corresponding to the old render modes. There is env.render_human for creating an interactive GUI and viewer, env.render_rgb_array for generating RGB images of the current environment from a third-person perspective, and env.render_cameras, which renders all the cameras (including rgb, depth, and segmentation where available) and compacts them into a single RGB image that is returned. Note that human and rgb_array are for visualization only and may include artifacts such as goal indicators; see PickCube-v0 or PandaAvoidObstacles-v0 for examples. The cameras mode reflects the actual visual observations returned by calls to env.reset and env.step.
  • The ManiSkill2 VecEnv creator function make_vec_env now accepts a max_episode_steps argument, which overrides the default max_episode_steps specified when the environment was registered. The default max_episode_steps is 200 for all environments, but a smaller value may be more efficient for RL training and evaluation, as shown in the RL tutorials and in the sketch after this list.
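
A minimal sketch of the VecEnv additions above; the release notes name the creator function make_vec_env, but the exact import path and the num_envs keyword below are assumptions:

```python
from mani_skill2.vector import make as make_vec_env  # assumed import path

# Override the registered default of 200 max episode steps.
vec_env = make_vec_env("PickCube-v0", num_envs=4, max_episode_steps=100)

# Under the Gymnasium Vector API, the top-level spaces are batched;
# per-environment spaces live on the single_* attributes.
print(vec_env.observation_space)         # batched across 4 environments
print(vec_env.single_observation_space)  # space of one environment

obs, info = vec_env.reset(seed=0)
actions = vec_env.action_space.sample()  # batched action
obs, rewards, terminated, truncated, info = vec_env.step(actions)
vec_env.close()
```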

Tutorials

  • All tutorials have been updated to reflect the new Gymnasium API and the new Stable Baselines 3, and they should be more stable on Google Colab.

Not Code

  • A new CONTRIBUTING.md document has been added, with details on how to develop ManiSkill2 locally and test it.

Bug Fixes

  • Closes #124 by using the newest version of SAPIEN, 2.2.2.
  • Closes #119 via #123, where scalar values returned by the state part of a dictionary observation would cause errors.
  • Fixes a compatibility bug with Gymnasium's AsyncVectorEnv, which also cannot handle scalar values, as it expects shape (1,), not shape (). This is done by modifying environments to return numpy array versions of certain scalar observation values instead of Python floats (see the sketch after this list). So far only TurnFaucet-v0 was affected. This also partially closes #125, where TurnFaucet-v0 had non-deterministic rewards due to computing rewards from unseeded points sampled from various meshes.
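
For reference, a sketch of the kind of change the scalar-value fix involves; the function and key names here are hypothetical stand-ins, not the actual TurnFaucet-v0 code:

```python
import numpy as np

def _get_obs_extra(angle: float) -> dict:
    # Before the fix: returning a bare Python float yields an observation of
    # shape (), which Gymnasium's AsyncVectorEnv cannot batch.
    # After the fix: wrap the scalar in a numpy array so it has shape (1,).
    return dict(target_angle=np.array([angle], dtype=np.float32))
```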

Miscellaneous Changes

  • The Dockerfile now accepts a Python version as a build argument.
  • The README and documentation have been updated to reflect the new Gymnasium API.
  • The mani_skill2.examples.demo_vec_env module now accepts a --vecenv-type argument, which can be either ms2 or gym and defaults to ms2, letting users benchmark the speed difference themselves. The module was also cleaned up to print more nicely.
  • Various example scripts with main functions now accept an args argument, allowing those scripts to be used from within Python and not just from the CLI (see the sketch after this list). This is used for testing purposes.
  • Silenced unnecessary output in some example scripts.
  • Trajectory replay accepts a new --count argument that lets you specify how many trajectories to replay. There is no data shuffling, so the replayed trajectories are always the same and in the same order. By default this is None, meaning all trajectories are replayed.
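
As an example of the new args argument, the sketch below drives the demo_vec_env module from within Python; that the module exposes a main function with exactly this signature is an assumption based on the notes above:

```python
# Roughly equivalent to the CLI invocation:
#   python -m mani_skill2.examples.demo_vec_env --vecenv-type ms2
from mani_skill2.examples.demo_vec_env import main  # assumed entry point

main(args=["--vecenv-type", "ms2"])  # assumed to accept CLI-style args
```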

What's Changed

Full Changelog: haosulab/ManiSkill2@v0.4.2...v0.5.0