Question #57
@nicolasdonati there is a way to do infilling (what you are describing with your first bullet point) with autoregressive transformers, devised by openai themselves. however, i don't expect it to work as well as denoising diffusion, masked denoising, or other non-autoregressive (NAR) methods. as for your second point, maybe you can consult Marcus. how much experience do you have training transformers?
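For reference, a minimal sketch of that OpenAI-style fill-in-the-middle (FIM) trick in plain Python; the sentinel token ids here are hypothetical extra vocab entries, not anything from this repo:

```python
# fill-in-the-middle (FIM): rearrange each training sequence so an autoregressive
# model learns to generate a missing span conditioned on both its prefix and suffix.
# PRE / SUF / MID are assumed sentinel ids placed past the normal mesh-token vocab.
PRE, SUF, MID = 10000, 10001, 10002

def to_fim_example(tokens, span_start, span_end):
    prefix = tokens[:span_start]
    middle = tokens[span_start:span_end]
    suffix = tokens[span_end:]
    # train on: [PRE] prefix [SUF] suffix [MID] middle
    # at inference, stop after [MID] and let the model infill the missing span
    return [PRE, *prefix, SUF, *suffix, MID, *middle]
```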
Correct, MeshGPT did this because it's how PolyGen (created by DeepMind) did it, so there is probably some reasoning behind it.
One thing to keep in mind is that the MeshGPT paper fine-tuned on e.g. chairs and then used that chair variant to generate chairs, but it couldn't generate tables. My advice is to take a look at the actual decoded result that the autoencoder generates, with the code below; the transformer can't perform well if its vocab doesn't make sense. Also, make the transformer a bit bigger using 512/768 dim and set attn_depth to 24.
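The original snippet didn't survive extraction here, so this is a minimal reconstruction-check sketch based on the repo's README API (tokenize / decode_from_codes_to_faces; exact return shapes may differ between versions):

```python
import torch

# vertices: (num_vertices, 3) float tensor, faces: (num_faces, 3) long tensor
autoencoder.eval()
with torch.no_grad():
    # encode the mesh into codebook tokens, then decode them straight back
    codes = autoencoder.tokenize(vertices = vertices, faces = faces)
    recon = autoencoder.decode_from_codes_to_faces(codes)

# export / visualize `recon` next to the input mesh; if the reconstruction is
# already bad, the transformer has no chance of producing sensible meshes
```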
Hi guys! Many thanks for all the help and tips :)
You can take a look at my demo notebook if you are having some trouble. You can see how to pre-process the mesh in the first code block (with the function get_mesh). But I guess that step 1 is to check the autoencoder output; you can use the code below if you are not using my fork of this repo.
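Since the referenced block is missing here, this is a rough sketch of what a get_mesh-style pre-processing step might look like (assuming trimesh; the actual notebook function may differ):

```python
import torch
import trimesh

def get_mesh(path):
    # load the file as a single mesh
    mesh = trimesh.load(path, force = 'mesh')
    vertices = torch.tensor(mesh.vertices, dtype = torch.float32)
    faces = torch.tensor(mesh.faces, dtype = torch.long)
    # center and scale into a unit cube so the discretized coordinates
    # use the full quantization range
    vertices = vertices - vertices.mean(dim = 0)
    vertices = vertices / vertices.abs().max()
    return vertices, faces
```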
@MarcusLoppe Hi, so I had a problem. I had 260 models (less than 1000 faces each) and augmented them 100 times; my encoder loss reaches 1.57 after 100 epochs, while my transformer loss reaches around 3 after 25 epochs. Can you tell me a little about the loss? Is it necessary for it to get below 1, or does that depend on the dataset? Also, does the batch size have any effect on this? Thanks
Hi again! So I tried some things:
Have you tried feeding it a prompt of tokens when generating? Usually it helps with just a few tokens, and it will kick-start the mesh generation in the correct direction. You could then try to train a transformer on just the first 60 tokens of each mesh and let that kick-start the generation. Also, what are your model specifications for the transformer?
Feeding tokens:
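(The original snippet was lost in extraction; below is a minimal sketch of the idea, assuming generate() accepts a `prompt` tensor of codes, which may vary between this repo and forks of it.)

```python
import torch

with torch.no_grad():
    # tokenize a known mesh and keep only the first ~60 tokens as a prompt
    codes = autoencoder.tokenize(vertices = vertices, faces = faces)
    prompt = codes.flatten()[:60].unsqueeze(0)  # shape: (1, 60)

# the prompt biases generation toward the shape those tokens came from
faces_coords = transformer.generate(prompt = prompt)
```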
@adeerBB Increasing the encoder & decoder sizes and the codebook dim allows it to describe the tokens a bit better. Using just 64 dims for a vector might be too low; I increased it so it uses a 128x3 dim size per codebook entry. When training the autoencoder I use a batch size of 64 (with grad_accum_every at 2-4 until the loss reaches 1.5), which promotes generalization across shapes.
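As a rough illustration of that recipe in plain PyTorch (the repo also ships trainer classes that wrap this; the enlarged-dim kwargs are version-dependent, so treat them as placeholders):

```python
import torch
from torch.utils.data import DataLoader
from meshgpt_pytorch import MeshAutoencoder

# exact kwargs for enlarging the codebook / encoder dims vary by version --
# check MeshAutoencoder's signature in the repo you're using
autoencoder = MeshAutoencoder(num_discrete_coors = 128)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr = 1e-4)

# `dataset` is assumed to yield dicts with 'vertices' and 'faces' tensors
loader = DataLoader(dataset, batch_size = 64, shuffle = True)
grad_accum_every = 4  # drop toward 2 as the loss approaches ~1.5

for step, batch in enumerate(loader):
    loss = autoencoder(vertices = batch['vertices'], faces = batch['faces'])
    (loss / grad_accum_every).backward()  # accumulate for a larger effective batch
    if (step + 1) % grad_accum_every == 0:
        optimizer.step()
        optimizer.zero_grad()
```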
@MarcusLoppe Sorry for the late response, and thanks, I'll give it a go.
@MarcusLoppe Hi, so I had a question: have you tried transfer learning with this? For example, if I train a model on Objaverse, can I then fine-tune it on my specific set of data? That way we could get the shape generation learned from a large dataset.
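For what it's worth, the usual recipe would look roughly like this (a hypothetical sketch, not something from this thread; the checkpoint path and loader are placeholders):

```python
import torch

# load transformer weights pretrained on the large dataset (placeholder path)
transformer.load_state_dict(torch.load('objaverse_transformer.pt'))

# fine-tune on the smaller, specific dataset with a reduced learning rate
# so the pretrained shape knowledge isn't wiped out
optimizer = torch.optim.Adam(transformer.parameters(), lr = 1e-5)

for batch in finetune_loader:  # loader over your specific meshes
    loss = transformer(vertices = batch['vertices'], faces = batch['faces'])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```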
Hi everyone,
I had some questions about this method:
I hope people can help :)
Have a great day,