
Add transformer model to diffusion policy #481

Open · wants to merge 2 commits into main

Conversation

@amandip7 (Contributor) commented Oct 21, 2024

What this does

Implements the transformer model for diffusion policy: https://github.com/orgs/huggingface/projects/46?pane=issue&itemId=64857791
This adds a transformer model class to modeling_diffusion, adds the configurations required for this model, and sets the default model in diffusion to be the transformer instead of the U-Net.
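
For orientation, a minimal sketch of how transformer options could sit alongside the existing fields in DiffusionConfig; the use_transformer switch and the transformer hyperparameter names below are illustrative assumptions, not the exact names added by this PR:

from dataclasses import dataclass

@dataclass
class DiffusionConfig:
    # Existing inputs / output structure (defaults taken from the current config).
    n_obs_steps: int = 2
    horizon: int = 16

    # Hypothetical switch between the U-Net and transformer denoisers.
    use_transformer: bool = True

    # Hypothetical transformer hyperparameters introduced alongside the new model class.
    n_layers: int = 8   # number of transformer layers
    n_heads: int = 4    # attention heads per layer
    n_emb: int = 256    # token embedding dimension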

How it was tested

  • Trained a transformer diffusion policy with the default configurations provided.
  • Ported a pre-trained transformer diffusion policy from the original DP repo to lerobot and ran evaluation here.

How to check out & try? (for the reviewer)

Run the training script:
python lerobot/scripts/train.py
This trains the transformer-based diffusion policy with the default configuration on the PushT task.

@alexander-soare (Collaborator) left a comment

Thanks @amandip7 for this PR! People have been asking about the transformer version of diffusion policy for a while.

A couple of high-level comments to go with my first review:

  • What do you say we add a diffusion_transformer.yaml instead of updating the current diffusion.yaml to use the transformer variant?
  • Can you push the weights to the hub so I can test it out with an eval run (see the loading sketch after this list)? Can you please update the PR notes to include instructions on how to run the eval, and what results I should expect? And can you confirm that the success rate is the same as that reported in the paper and/or the same as you get when you run eval on the original DP repo?
  • Have you confirmed that you can train a model on LeRobot and achieve the same success rate as the ported weights? If so, can you please push a model to the hub and add instructions on how to evaluate that one?
  • Most of my in-line comments are around tidying up the code a little. I understand you've probably copied the code directly from the DP repo, but we'd like to make the code on LeRobot a little more accessible. This usually means tidying it up a bit by removing redundant code, adding comments, and using more self-explanatory variable names.
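
For the hub/eval request, a rough sketch of what loading the ported weights could look like, assuming the policy class lives at lerobot's usual lerobot.common.policies.diffusion.modeling_diffusion path and exposes from_pretrained via the hub mixin; the repo id amandip7/diffusion_transformer_pusht is only a placeholder:

from lerobot.common.policies.diffusion.modeling_diffusion import DiffusionPolicy

# Placeholder repo id; substitute whatever checkpoint ends up on the hub.
policy = DiffusionPolicy.from_pretrained("amandip7/diffusion_transformer_pusht")
policy.eval()  # switch to evaluation mode before rolling out on PushT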

Thanks again!

@@ -98,7 +98,7 @@ class DiffusionConfig:

# Inputs / output structure.
n_obs_steps: int = 2
horizon: int = 16
horizon: int = 10
Collaborator

I think this change should be reverted, right? To keep in line with the default PushT policy.

@@ -134,7 +134,7 @@ class DiffusionConfig:
down_dims: tuple[int, ...] = (512, 1024, 2048)
kernel_size: int = 5
n_groups: int = 8
diffusion_step_embed_dim: int = 128
diffusion_step_embed_dim: int = 256
Collaborator

I think this change should be reverted, right? To keep in line with the default PushT policy.

self.config = config

# compute number of tokens for main trunk and condition encoder
if config.n_obs_steps is None:
Collaborator

config.n_obs_steps is None does not seem to be allowed according to the type hinting and documentation. So perhaps it doesn't make sense to handle it here, right?

if config.n_obs_steps is None:
config.n_obs_steps = config.horizon

t = config.horizon
Collaborator

Hmm, is it okay if we just leave this as config.horizon rather than binding it to another much less descriptive variable name?

config.n_obs_steps = config.horizon

t = config.horizon
t_cond = 1
Collaborator

So what is t_cond? Could you please use a more descriptive variable name or, if that's not appropriate, just leave a comment here?

output: (B,T,input_dim)
"""
# 1. time
timesteps = timestep
Collaborator

Any reason you assign another variable to the same object, and with such a similar name?

# 1. time
timesteps = timestep
batch_size = sample.shape[0]
# broadcast to batch dimension in a way that's compatible with ONNX/Core ML
Collaborator

We don't need this at the moment.

input_emb = self.input_emb(sample)

# encoder
cond_embeddings = time_emb
Collaborator

nit: mind dropping this line and just putting time_emb into torch.cat instead?
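
A possible shape of that change, assuming the observation embedding is concatenated a couple of lines below (cond_obs_emb is the name used in the original DP repo and may differ here):

# Build the conditioning tokens directly, instead of aliasing time_emb first.
cond_embeddings = torch.cat([time_emb, cond_obs_emb], dim=1)  # (B, 1 + n_obs_steps, n_emb)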

x = self.drop(cond_embeddings + position_embeddings)
x = self.encoder(x)
memory = x
# (B,T_cond,n_emb)
Collaborator

Can we make comments like this either in-line, or on the line preceding the line of code of concern? I think putting code then comment on the next line is rather unconventional.

position_embeddings = self.cond_pos_emb[:, :tc, :] # each position maps to a (learnable) vector
x = self.drop(cond_embeddings + position_embeddings)
x = self.encoder(x)
memory = x
Collaborator

Do we really have to add another variable into the namespace here?
