Dataset v2.0 #461

Merged: 130 commits merged into main on Nov 29, 2024

Conversation

@aliberts (Collaborator) commented Oct 3, 2024

What this does

This PR introduces a new format for LeRobotDataset, which is accompanied by a new file structure. As these changes are not backward compatible, we increase CODEBASE_VERSION from v1.6 to v2.0.

What do I need to do?

If you already pushed a dataset using v1.6 of our codebase, you can use the conversion script lerobot/common/datasets/v2/convert_dataset_v1_to_v2.py to convert it to the new format.
You will be asked to enter a prompt describing the task performed in the dataset.

Example for a single-task dataset:

python lerobot/common/datasets/v2/convert_dataset_v1_to_v2.py \
    --repo-id lerobot/aloha_sim_insertion_human_image \
    --task "Insert the peg into the socket."

If you recorded your dataset with one of the manipulator robots currently supported in LeRobot (or your own implementation), you can provide its configuration path to add the motor names and robot type to the dataset info using the --robot-config option:

python lerobot/common/datasets/v2/convert_dataset_v1_to_v2.py \
    --repo-id aliberts/koch_tutorial \
    --task "Pick the Lego block and drop it in the box on the right." \
    --robot-config lerobot/configs/robot/koch.yaml

For the more complicated cases of one task per episode or multiple tasks per episode, please refer to the documentation in that script.

Motivation

The current implementation of our LeRobotDataset suffers from a few shortcomings that make it harder to use in some respects. Specifically:

  • The structure of the files does not accurately reflect the data structure. Our datasets are structured by episodes, which contrasts with typical ML scenarios with train/val/test splits (although these concepts can still be relevant here). This makes it hard to select a subset of episodes from a dataset, since the whole dataset has to be downloaded/loaded. Related: #440
  • Due to the current hub's limitations, one cannot push a dataset with more than 10k episodes at most (fewer if there are multiple cameras).
  • The format is not transparent to the user: to get information about the content of a dataset, current options are limited to downloading the entire dataset and inspecting it with a custom script, or trying to visualize it using our visualization tool. Related: #383
  • The default file-caching system used by datasets and huggingface_hub makes it inconvenient to create datasets locally (e.g. when recording). To use newly created files on disk, these libraries check whether those files are present in the cache (which they won't be) and, if not, download them even though they may already be on disk.
  • Some of the file formats used are too framework-specific for the dataset format to be universal (e.g. .safetensors).
  • The dataset viewer on the hub is not compatible with our datasets due to VideoFrame not yet being integrated into datasets.
  • The current implementation lacks support for future features that we may want to add, such as:
    • Task-tokens-conditioned training
    • Multirobot policies
    • Depth images (Related: #435)

Changes

Some of the biggest changes come from the new file structure and its content:

  .
  ├── data
- │   ├── train-00000-of-0001.parquet
+ │   ├── chunk-000
+ │   │   ├── episode_000000.parquet
+ │   │   ├── episode_000001.parquet
+ │   │   ├── episode_000002.parquet
+ │   │   └── ...
+ │   ├── chunk-001
+ │   │   ├── episode_001000.parquet
+ │   │   ├── episode_001001.parquet
+ │   │   ├── episode_001002.parquet
+ │   │   └── ...
+ │   └── ...
- ├── meta_data
+ ├── meta
- │   ├── episode_data_index.safetensors
+ │   ├── episodes.jsonl
  │   ├── info.json
+ │   ├── stats.json
- │   ├── stats.safetensors
+ │   └── tasks.jsonl
  └── videos
+     ├── chunk-000
+     │   ├── observation.images.laptop
      │   │   ├── episode_000000.mp4
      │   │   ├── episode_000001.mp4
      │   │   ├── episode_000002.mp4
      │   │   └── ...
+     │   ├── observation.images.phone
      │   │   ├── episode_000000.mp4
      │   │   ├── episode_000001.mp4
      │   │   ├── episode_000002.mp4
      │   │   └── ...
+     ├── chunk-001
      └── ...

Note that this file-based structure is designed to be as versatile as possible. The parquet files are split by episode (as was already the case for videos), which allows much more granular control over which episodes to use and download. The structure of the dataset is entirely described in the info.json file, which can easily be downloaded or viewed directly on the hub before downloading any actual data. The file types used are simple and do not require complex tools to be read: only .parquet, .json, .jsonl and .mp4 files are used (.md for the README).
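
Since info.json fully describes the dataset, it can be fetched on its own before downloading any episode data. A minimal sketch of doing that from Python (using huggingface_hub directly; the repo_id is just an example):

import json

from huggingface_hub import hf_hub_download

# Download only meta/info.json (a few KB), not the episode data or videos.
info_path = hf_hub_download(repo_id="lerobot/pusht", filename="meta/info.json", repo_type="dataset")
with open(info_path) as f:
    info = json.load(f)

print(info)  # keys, shapes, number of episodes, fps, paths, etc.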

Added

  • A LeRobotDataset can now be instantiated with an episodes argument (e.g. episodes=[1, 10, 12, 40]) to select a specific subset of episodes by their episode_index. In that case, only the files corresponding to these episodes are downloaded (if they're not already on disk), and the hf_dataset attribute, as well as the episode_data_index, will only contain data from these episodes.
  • Dataset metadata logic is now handled by the LeRobotDatasetMetadata class. This makes it possible to get info about a dataset before loading any data. For example:
import math

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset, LeRobotDatasetMetadata

# Fetch metadata from the hub
metadata = LeRobotDatasetMetadata("lerobot/pusht")

# Calculate train and val episodes
total_episodes = metadata.total_episodes
episodes = list(range(metadata.total_episodes))
num_train_episodes = math.floor(total_episodes * 90 / 100)
train_episodes = episodes[:num_train_episodes]
val_episodes = episodes[num_train_episodes:]

# Load train and val datasets
train_dataset = LeRobotDataset("lerobot/pusht", episodes=train_episodes)
val_dataset = LeRobotDataset("lerobot/pusht", episodes=val_episodes)
  • Tasks as natural language prompts are now part of every dataset and are required to create one. Every task of a dataset is listed in tasks.jsonl, mapped to its task_index, which is what's actually stored in the parquet files. Using the API, tasks can be accessed either with dataset.meta.tasks to get that mapping, or through dataset.episode_dict[episode_index]["tasks"] if you're only interested in a particular episode (see the sketch after this list).
  • Various information about the structure of the dataset has been added and is now centralized in info.json (keys, shapes, number of episodes, etc.). It serves as the source of truth for what's inside the dataset.
  • episodes.jsonl contains per-episode information (episode_index, tasks in natural language and episode length). It is accessed through the episode_dict attribute in the API.
  • LeRobotDataset.create() lets you create a new dataset from scratch, either for recording data or for porting an existing dataset to the LeRobotDataset format. To that end, new methods have been added:
    • start_image_writter(): Instantiates an ImageWriter in the image_writer attribute to write images asynchronously during data recording. This is automatically called by LeRobotDataset.create() if specified in the arguments.
    • stop_image_writter(): Properly stops and removes the ImageWriter from the dataset's attributes. Importantly, if image_writer has been set to a multiprocess ImageWriter, this must be called before passing the dataset into a parallelized DataLoader, since the ImageWriter class is not picklable (which is required for objects to be transferred between processes). This is not needed when instantiating a dataset with __init__, as the image_writer is not created in that case.
    • add_frame(): Adds a single frame of data (one timestamp) to the episode_buffer, which keeps data in memory temporarily. Note: this will be merged with the DataBuffer from #445 in a subsequent PR.
    • add_episode(): Saves the content of the episode_buffer to disk and updates the metadata so it stays in sync with the contents of the files. This method expects a task argument: a natural language prompt describing the task performed in the episode. Videos from that episode can optionally be encoded during this phase, but this is not mandatory and can be done later for more flexibility.
    • consolidate(): Encodes videos that have not yet been encoded, cleans up the temporary image files, computes dataset statistics, checks that timestamps are in sync with the fps, and performs additional sanity checks on the dataset. It must be called before uploading the dataset to the hub with push_to_hub().
    • clear_episode_buffer(): This can be used to reset the episode_buffer (e.g. to discard data from a current recording).
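
As a small illustration of the task and per-episode metadata accessors mentioned above (attribute names as described in this PR; the repo_id is just an example):

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

dataset = LeRobotDataset("lerobot/pusht")

# Mapping of task_index -> natural language prompt, loaded from meta/tasks.jsonl
print(dataset.meta.tasks)

# Per-episode info from meta/episodes.jsonl (episode_index, tasks, episode length)
print(dataset.episode_dict[0]["tasks"])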

Changed

  • The logic for checking timestamps and delta_timestamps sync has been moved out of __getitem__() and is now done during __init__() or consolidate(). This both saves computation in __getitem__() and surfaces timestamp sync issues immediately.
  • The paths to the parquet and video files are now embedded in info.json, which allows flexibility and makes it easy to split chunks of files between directories to stay under the hub's limit of 10k files per folder (see the sketch after this list).
  • We now store every dataset (created or downloaded) in ~/.cache/huggingface/lerobot by default. Changing root or setting the LEROBOT_HOME env variable changes that location. Every call to the huggingface_hub download functions (snapshot_download, hf_hub_download) uses the local_dir argument pointed at that location, so that files are not duplicated in the cache; this also solves the issue of re-downloading files already present on disk.
  • Refactored the image writing code from populate_dataset.py into an ImageWriter class.
  • stats.safetensors is now stats.json (the content remains the same but it's unflattened).
  • episode_data_index.safetensors is removed, but episode_data_index is still available in the API to map episode_index to indices.
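
To illustrate the path embedding mentioned in the "Changed" list above, here is a sketch assuming info.json stores a format-string template for the parquet paths (the template below is illustrative; the actual string is whatever info.json contains):

# Hypothetical template, consistent with the file tree shown earlier.
data_path_template = "data/chunk-{episode_chunk:03d}/episode_{episode_index:06d}.parquet"

chunk_size = 1000  # assumed number of episodes per chunk
episode_index = 1002
episode_chunk = episode_index // chunk_size

print(data_path_template.format(episode_chunk=episode_chunk, episode_index=episode_index))
# -> data/chunk-001/episode_001002.parquet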

Performance

In the nominal case (no delta_timestamps), LeRobotDataset.__getitem__() is on par with the previous version, sometimes slightly faster, but generally in the same ballpark.

__getitem__() call time in seconds (average on 10k iterations):

repo_id                                 | v1.6   | v2.0  
--------------------------------------- | ------ | ------
lerobot/aloha_sim_insertion_human_image | 0.0036 | 0.0037
lerobot/aloha_sim_insertion_human       | 0.0029 | 0.0027
lerobot/pusht_image                     | 0.0003 | 0.0003
lerobot/pusht                           | 0.0011 | 0.0009
aliberts/koch_tutorial                  | 0.0111 | 0.0106
lerobot/aloha_mobile_cabinet            | 0.0104 | 0.0101
Benchmarking code
from pathlib import Path
import time
import torch
from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset

repo_ids = [
    "lerobot/aloha_sim_insertion_human_image",
    "lerobot/aloha_sim_insertion_human",
    "lerobot/pusht_image",
    "lerobot/pusht",
    "aliberts/koch_tutorial",
    "lerobot/aloha_mobile_cabinet",
]
num_iterations = 10000
logfile = Path(f"perf_log_{CODEBASE_VERSION}_{num_iterations}.txt")
with open(logfile, "a") as file:
    file.write(f"__getitem__() call time in seconds (average on {num_iterations} iterations)\n\n")
    file.write(f"repo_id                                 | {CODEBASE_VERSION}  \n")
    file.write("--------------------------------------- | ------\n")

for repo_id in repo_ids:
    dataset = LeRobotDataset(repo_id=repo_id)
    durations = []
    for i in range(num_iterations):
        start = time.perf_counter()
        item = dataset[i]
        duration = time.perf_counter() - start
        durations.append(duration)

    avg_duration = torch.Tensor(durations).mean()
    results = f"{repo_id} | {avg_duration:.4f}s"
    print(results)
    with open(logfile, "a") as file:
        file.write(results + "\n")

Using delta_timestamps, results vary more depending on the dataset but still remain in the same ballpark.
__getitem__() call time in seconds (average on 10k iterations), delta_timestamps=[-1/fps, 0, 1/fps]:

repo_id                                 | v1.6   | v2.0  
--------------------------------------- | ------ | ------
lerobot/aloha_sim_insertion_human_image | 0.0176 | 0.0160
lerobot/aloha_sim_insertion_human       | 0.0073 | 0.0068
lerobot/pusht_image                     | 0.0024 | 0.0032
lerobot/pusht                           | 0.0028 | 0.0043
aliberts/koch_tutorial                  | 0.0200 | 0.0184
lerobot/aloha_mobile_cabinet            | 0.0224 | 0.0181
Benchmarking code (delta_timestamps)
from pathlib import Path
import time
import torch
from lerobot.common.datasets.lerobot_dataset import CODEBASE_VERSION, LeRobotDataset

repo_ids = [
    "lerobot/aloha_sim_insertion_human_image",
    "lerobot/aloha_sim_insertion_human",
    "lerobot/pusht_image",
    "lerobot/pusht",
    "aliberts/koch_tutorial",
    "lerobot/aloha_mobile_cabinet",
]
num_iterations = 10000
logfile = Path(f"perf_log_{CODEBASE_VERSION}_{num_iterations}.txt")
with open(logfile, "a") as file:
    file.write(f"__getitem__() call time in seconds (average on {num_iterations} iterations)\n\n")
    file.write(f"repo_id                                 | {CODEBASE_VERSION}  \n")
    file.write("--------------------------------------- | ------\n")

for repo_id in repo_ids:
    dataset = LeRobotDataset(repo_id=repo_id)
    fps = dataset.fps
    keys = ["observation.state", *dataset.camera_keys]
    delta_timestamps = {key: [-1/fps, 0, 1/fps] for key in keys}
    dataset = LeRobotDataset(repo_id=repo_id, delta_timestamps=delta_timestamps)
    durations = []
    for i in range(num_iterations):
        start = time.perf_counter()
        item = dataset[i]
        duration = time.perf_counter() - start
        durations.append(duration)

    del dataset
    avg_duration = torch.Tensor(durations).mean()
    results = f"{repo_id} | {avg_duration:.4f}s"
    print(results)
    with open(logfile, "a") as file:
        file.write(results + "\n")

Fixes

  • Fix a bug in load_previous_and_future_frames which didn't actually raise an error when the requested timestamps from delta_timestamps did not correspond to actual timestamps in the dataset.
  • Various fixes have been made to the datasets themselves:
    • Some tasks already present in some datasets contained strings which were not part of the task (e.g. "tf.Tensor(b'Do something', shape=(), dtype=string)")
    • Some video files were not properly tracked by git lfs
    • Some datasets have a mismatch between the number of episodes in their parquet files and the number of video files. This is being investigated [TODO]
      • lerobot/aloha_mobile_shrimp
      • lerobot/aloha_static_battery
      • lerobot/aloha_static_fork_pick_up
      • lerobot/aloha_static_thread_velcro
      • lerobot/uiuc_d3field
    • lerobot/viola is missing video keys [TODO]

How it was tested

  • Adds tests/fixtures/, in which fixtures and fixture factories have been added to simplify writing tests. These factories make it possible to create partially mocked objects on the fly for use in tests, without relying on components of the codebase that are not meant to be exercised by a particular test (e.g. initializing a dataset using hydra).
  • Adds tests/test_image_writer.py
  • Adds tests/test_delta_timestamps.py
  • Deactivates a bunch of tests which will need to be redesigned and simplified in further PRs.

How to check out & try? (for the reviewer)

Use an existing dataset:

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

REPO_ID = "lerobot/aloha_sim_insertion_human"  # try with '_image' as well

delta_timestamps = {
    "observation.images.top": [-1, -1/50, 0, 25/50],
    "observation.state": [-1, -1/50, 0, 25/50],
}
dataset = LeRobotDataset(repo_id=REPO_ID, delta_timestamps=delta_timestamps)

Try out the new feature to select / download specific episodes:

dataset = LeRobotDataset(repo_id=REPO_ID, episodes=[1, 10, 12, 40])
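
Either of these datasets can then be used with a regular PyTorch DataLoader, e.g. (batch size and worker count below are arbitrary):

import torch

dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
batch = next(iter(dataloader))
print(batch["observation.state"].shape)  # key used in the delta_timestamps example above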

You can also create a new dataset:

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

REPO_ID = "your_hf_username/test_v2"

new_dataset = LeRobotDataset.create(
    repo_id=REPO_ID,
    fps=30,
    robot=robot,  # a previously instantiated robot object (not shown here)
    image_writer_threads_per_camera=1,
)

# TODO
frame = {
    ...
}
new_dataset.add_frame(frame)
new_dataset.add_episode(task="Do something")
new_dataset.consolidate()
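
For context, a purely hypothetical frame for the TODO above could look like this, reusing the camera and state feature names that appear in the file tree earlier in this description (shapes, dtypes and the exact set of keys depend on your robot configuration):

import numpy as np

# All keys and shapes below are illustrative, not a specification of the API.
frame = {
    "observation.images.laptop": np.zeros((480, 640, 3), dtype=np.uint8),
    "observation.images.phone": np.zeros((480, 640, 3), dtype=np.uint8),
    "observation.state": np.zeros(6, dtype=np.float32),
    "action": np.zeros(6, dtype=np.float32),
}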

@aliberts aliberts added ✨ Enhancement New feature or request 🗃️ Dataset Something dataset-related labels Oct 3, 2024
@aliberts aliberts self-assigned this Oct 3, 2024
@Cadene Cadene self-requested a review October 11, 2024 15:10
@Cadene (Collaborator) commented Nov 22, 2024

TODO after merging: #485

@Cadene (Collaborator) left a review:

Beautiful work thanks. Left some comments. Hope it helps :)

@pytest.mark.skip("TODO after v2 migration / removing hydra")

This test, test_backward_compatibility(repo_id), makes me think we should probably train diffusion policy on pusht before and after this PR to compare dataset v1 vs v2.

@astroyat

I tried training using the new dataset and see some errors in compute_stats.py, should d.stats be changed to d.meta.stats?

@aliberts aliberts merged commit 32eb0ce into main Nov 29, 2024
6 of 7 checks passed
@aliberts aliberts deleted the user/aliberts/2024_09_25_reshape_dataset branch November 29, 2024 18:04