v0.6
The DPPO update now samples transitions randomly over both environment steps and denoising steps, instead of only over environment steps (the original update used the entire denoising chain of each sampled environment step). We find a minor improvement in training stability and final performance with the new update, and most config files have been updated accordingly (with a larger `train.batch_size` now).
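For illustration, here is a minimal NumPy sketch of the two sampling schemes. All names and shapes (`n_steps`, `k_denoise`, `chains`, the feature dimension) are hypothetical placeholders, not the repo's actual buffer layout:

```python
import numpy as np

# Hypothetical rollout buffer: n_steps environment steps, each with a
# denoising chain of k_denoise steps; last axis is a placeholder feature dim.
n_steps, k_denoise, batch_size = 512, 8, 256
rng = np.random.default_rng(0)
chains = rng.normal(size=(n_steps, k_denoise, 4))  # placeholder data

# Original update: sample over environment steps only; each sampled step
# contributes its entire denoising chain to the minibatch.
env_idx = rng.choice(n_steps, size=batch_size // k_denoise, replace=False)
old_batch = chains[env_idx].reshape(-1, chains.shape[-1])

# New update: flatten over both axes and sample (env step, denoising step)
# pairs independently, decorrelating transitions within a minibatch.
flat = chains.reshape(n_steps * k_denoise, -1)
pair_idx = rng.choice(len(flat), size=batch_size, replace=False)
new_batch = flat[pair_idx]

print(old_batch.shape, new_batch.shape)  # both (256, 4)
```

Because the new scheme no longer pulls whole chains at once, minibatches can be larger without growing the number of environment steps sampled, which is why the configs now use a bigger `train.batch_size`.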
We also add configs for experiments with the Franka Kitchen environments from D4RL and the Robomimic MH datasets.
In progress: finishing updates to all configs and updating the arXiv paper with the new experiment results.