Data augmentation strategies #11
yup sounds good! just put all the functions into one file, say
@fire scale and rotation will go a long way
I have to go for now. See `def augment_mesh(self, base_mesh, augment_count, augment_idx):`. Edited: removed seed
@lucidrains Can you post something for me to extract the resulting mesh from the autoencoder?
You mentioned the topic of overfitting as a first step. I added the Blender monkey as a validation of mesh input through an autoencoder as an initial step. I want to send another monkey to the autoencoder and get the same monkey out again. How do I do that?
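One possible way to check that roundtrip, sketched under the assumption that the autoencoder exposes `tokenize` and `decode_from_codes_to_faces` (method names and signatures may differ between versions of meshgpt-pytorch, so treat this as a starting point):

```python
import torch
from meshgpt_pytorch import MeshAutoencoder

autoencoder = MeshAutoencoder(num_discrete_coors = 128)

# vertices: (batch, num_vertices, 3) floats in roughly [-1, 1]
# faces:    (batch, num_faces, 3) vertex indices
vertices = torch.randn(1, 121, 3)
faces    = torch.randint(0, 121, (1, 64, 3))

codes = autoencoder.tokenize(vertices = vertices, faces = faces)
recon_face_coords, face_mask = autoencoder.decode_from_codes_to_faces(codes)

# recon_face_coords holds per-face vertex coordinates; write them out as an
# .obj/.glb and compare against the input monkey to check the reconstruction.
```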
I was able to train for 1 step that outputs a garbage glb 🎉
I have been using the notebook file Marcus provided to try that, and I am also getting bad obj results. I am going to try the latest @lucidrains changes tomorrow in this notebook; maybe you can try it and give it a look, or maybe you are ahead of what I am using. 😆 Thanks!
Just for testing purposes, give it a go without the data augmentation. When I have been successful, the encoder loss was less than 0.200-0.250 and the loss for the transformer was around 0.00007. Here are some details from the paper; they only use scaling and jitter-shift.
I am currently at:
So maybe I can dream about 0.200 - 0.250 loss.
How many steps is that at? I require about 2000 steps since 200 x 10 epochs = 2000. Try only doing scaling and see; it will probably go better. You can give it a go with my forked version @ https://github.com/MarcusLoppe/meshgpt-pytorch/tree/main The data MeshDataset expects is an array of:
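A hypothetical illustration of that entry structure (the keys and dtypes here are assumptions, so check the fork's MeshDataset for the real schema):

```python
import torch

# one dict per (augmented) mesh example
dataset_entries = [
    {
        "vertices": torch.tensor([[0.0, 0.0, 0.0],
                                  [1.0, 0.0, 0.0],
                                  [0.0, 1.0, 0.0]], dtype = torch.float32),  # (num_vertices, 3)
        "faces":    torch.tensor([[0, 1, 2]], dtype = torch.long),           # (num_faces, 3) vertex indices
        "texts":    "office chair",                                          # optional label/description
    },
]
```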
These are my current settings, which amount to 200 steps. The outlined one is the output mesh. You can see my code in the pull request.
You are right that I should ensure that we're in unit square distance and do fewer augmentations though.
I think that generating two objects is causing some issues; try using a single box. I tried your s_bed_full.glb file and the result was pretty good, though it's not so smooth. Probably a better result with data augmentation. The right side is the generated one.
https://imgsli.com/ is very good for image comparisons.
Writing down an idea. It should be possible to go over the 10 million 3D item set and find a small set of items in a small set of classes, similar to the paper, and label them manually (like via path name).
Training on 10 million might be overkill, and going over 28,000 shapes might cost a bit too much $$$. Renting an A100 at $0.79 per hour: However, the H100 promises good performance but at like $2-3 an hour.
Seems pretty good, but probably not for 3D models
I can't use ShapeNet, but I'm sure we can find 10 classes of 100 models, like ShapeNet's, in that 10 million dataset.
I think it's fine, there are many free sources; the trouble might be finding a dataset with descriptions. But after the model is trained, inference will be a big issue for users: if it's going to generate complex 3D models, it might not work on consumer hardware. Still, the recent performance boost is a good sign that performance and efficiency are on the right track. https://github.com/timzhang642/3D-Machine-Learning#3d_models
How many examples/steps of the same 3D mesh did you train it on? I trained for 10-20 epochs @ 2000 examples and got 0.19 loss. I was able to generate a pretty good 3D mesh; it's not as smooth, but a very good result for such a small amount of training data. 3D mesh:
I was using the wrong strategy. You were using many identical copies of the mesh and then some augments. I was doing the opposite.
I might have worded that badly but no, I'm using the same model without any augmentations.
Here's how I interpreted it.
You were doing 2000 (same) * 1 * 1. I was trying 1 * 2000 (augmented) * 1. Thanks for telling me! I'm trying your suggestion.
No problem. I posted this in another issue but I think this might help you; according to the paper they sort the vertices in z-y-x order. Also, I'm currently training on about 6 3D chair meshes. Each chair has 1500 examples, but it has 3 augmentation versions. The total is 12,000 examples. To give you some type of idea of why you need to train for 2 days on two A100s, watch how slow the progress is (33 minutes running):
#11 (comment) was the verification of z-y-x order and sorting the faces as per their lowest vertex index. Note that I am using the convention that gives me that result, like Y-Z-X, but it followed their requirement of being sorted vertically.
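For reference, a minimal sketch of that ordering; which axis is primary (and which axis is "up") is an assumption here and may need swapping to match your convention:

```python
import numpy as np

def sort_mesh(vertices, faces):
    # vertices: (V, 3) float array, faces: (F, 3) int array of vertex indices

    # sort vertices by z, then y, then x (np.lexsort uses the last key as primary)
    order = np.lexsort((vertices[:, 0], vertices[:, 1], vertices[:, 2]))
    vertices = vertices[order]

    # remap face indices to the new vertex numbering
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))
    faces = remap[faces]

    # rotate each face so its lowest vertex index comes first (preserves winding),
    # then sort the faces by that lowest index
    roll = faces.argmin(axis = 1)
    faces = np.stack([np.roll(f, -r) for f, r in zip(faces, roll)])
    faces = faces[np.argsort(faces[:, 0])]
    return vertices, faces
```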
@MarcusLoppe on your branch, can you add a feature so that on the first quit it saves, and on the second quit it quits? Then we can restart from a checkpoint.
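A rough sketch of how that could behave in a plain Python training script; `save_checkpoint` is a placeholder for whatever checkpoint call the trainer already has:

```python
import signal, sys

def save_checkpoint():
    # placeholder: call the trainer's existing save here, e.g. trainer.save('checkpoint.pt')
    print("checkpoint saved")

interrupts = {"count": 0}

def on_sigint(signum, frame):
    interrupts["count"] += 1
    if interrupts["count"] == 1:
        print("First quit: saving checkpoint; press Ctrl+C again to exit.")
        save_checkpoint()
    else:
        print("Second quit: exiting.")
        sys.exit(0)

signal.signal(signal.SIGINT, on_sigint)
```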
Oh, great :) One other tip might be to normalize the size and set everything on the ground. I'm limiting the size since I'm currently training on a few different chairs, and some of the chairs were huge like a building while others were "normal" size.
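A minimal sketch of that normalization tip, assuming y-up meshes; `target_extent` and the ground level are illustrative choices, not values from the repo:

```python
import numpy as np

def normalize_mesh(vertices, target_extent = 0.95, up_axis = 1, ground = 0.0):
    # vertices: (V, 3) array; up_axis = 1 assumes y-up, use 2 for z-up meshes
    vertices = np.asarray(vertices, dtype = np.float64)

    # center the mesh, then scale so its largest half-extent equals target_extent
    center = (vertices.max(axis = 0) + vertices.min(axis = 0)) / 2.0
    vertices = vertices - center
    vertices = vertices * (target_extent / np.abs(vertices).max())

    # set it on the ground: lowest vertex along the up axis at the ground level
    # (ground = -target_extent keeps the whole mesh inside the unit box)
    vertices[:, up_axis] += ground - vertices[:, up_axis].min()
    return vertices
```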
I don't understand, can you clarify?
This is my current result. I'll retype the last message in a bit. output.log See also https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/2fkwahjc/overview
I see that the dataset size is 10. For training effectiveness I just duplicate the one model x2000 times, since I think it can train faster when dealing with bigger loads. The learning rate seems a bit high; for the encoder I used 1e-3 (0.001) and for the transformer I used 1e-2 (0.01).
https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/9b8k9mfc/overview?workspace=user-ernest-lee I have some bugs, but this is really promising. I had to recode my face-index-ascending regularization strategy. The one with clipped ears is the meshgpt output.
Instead of duplicating the model, I multiply the epochs by n, but according to the graph the training flattens, so I stop early.
That seems very good. I see that you increased num_discrete_coors to 256, did that help? It seems like that would smooth out the errors / give it a higher error margin, so even if it's wrong it looks smoother. What kind of augmentation are you doing? Are you applying all the augmentations, including the rotation? Is there any reason why you are adding 2 extra tokens as padding?
The generated token length needs to be a multiple of 3.
To be honest I think this only affects the quantization loss on the discretization of the mesh vertex positions. I don't think it matters, but I haven't tested it.
It should be 6 since 1 face = 6 tokens.
It should make it smoother, since if it guesses the wrong class, with 128 vs 256 classes the step values might be 0.20 vs 0.10, so the 0.1 error will be less visible.
Training a single mesh seems to be going pretty well/solved. Have you tried using the texts & multiple meshes? I'm guessing that you resolved the issue with the mesh getting cut off? I just scale it to fit -0.95 to +0.95; it seems like there are some issues when the mesh gets above 1.0. Also, I was granted access to the ShapeNet v2 dataset on Hugging Face; you can probably get access as well.
I was able to train the transformer to use 1172 faces. mesh_transforms_humanoid_avatar.zip I respect the MIT, Apache-2 and CC-BY licenses and so have a reason not to use ShapeNet. https://wandb.ai/ernest-lee/meshgpt-pytorch/runs/rp8nbw7w?workspace=user-ernest-lee Some logs. Duration: 1h 13m 26s upbeat-waterfall-437-618dbfb6d54f78d191f293a55a0c9e7a41147541.json
I want to do more after a break. Any suggestions? I was thinking of having one human in multiple poses, but different objects are doable too.
I think it's fine to train while testing since it's not for any commercial purpose, just pure testing that won't be touched by anyone else. One benefit of using ShapeNet is they have nice labels and not just categories like "chair", examples:
Yes, use very low face-count meshes, since using text to encode makes the training much harder. Using a dataset of 2 chairs with 5000 examples (2 meshes, 5 augmentations x 500), roughly as sketched below.
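A rough sketch of how such a dataset could be put together; `build_examples` and `augment_fn` are illustrative names, not the fork's actual API:

```python
def build_examples(meshes, num_augments = 5, num_duplicates = 500, augment_fn = None):
    examples = []
    for mesh in meshes:                              # e.g. 2 chair meshes
        for aug_idx in range(num_augments):          # 5 augmented variants per mesh
            example = augment_fn(mesh, aug_idx) if augment_fn else mesh
            examples.extend([example] * num_duplicates)
    return examples                                  # 2 * 5 * 500 = 5000 examples
```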
@MarcusLoppe I'm pretty sure you can use BLIP to categorize photos of the mesh, so that's not a blocker. https://replicate.com/gfodor/instructblip
Someone wanted me to try https://www.kenney.nl/assets/castle-kit. So I'll need to generate labels for them, but it should work.
Well, the downside is that you'll use Blender to take a screenshot with a default camera, and since models vary in orientation/vertical axis you might take a snapshot from the back/below of the object.
Try walking before running :) I've been trying to tell you that you need a massive amount of data and training time to actually create a good enough model for that. I've been successful at overfitting it using text + 1 single model for around 40 epochs at 2000 examples per epoch. If you want to give it a go, use only the models with fewer than 500-600 faces, create 10-20 augmentations per model, then duplicate each variation 200 times. Then train on this for a day or two and then try to generate using the texts. In the PolyGen and MeshGPT papers they stress that they didn't have enough training data and used only 28,000 mesh models. In the paper they used 28,000 3D models; let's say they generated 10 augmentations per model and then used 10 duplicates, since it's more effective to train a model with a big batch size of 64, and when you are using a small number of models per dataset it will not train effectively and you will waste the parallelism of the GPUs. I want to stress this:
From @MarcusLoppe
@MarcusLoppe what sort of limits are you getting on your triangle count? I think mine is around 1349 triangles per mesh.
I haven't bothered with such large meshes due to hardware constraints.
What happens when you go above it? Is it the VRAM, or does the transformer get stuck at a loss? If so, have you tried raising the dim to 768 or 1024?
I don't think jitter is related to that; it talks about jitter but then switches the topic to planar decimation, e.g. simplifying the training mesh while having it look the same as before.
I had an avatar https://booth.pm/en/items/4861008 and I wanted to use it, so I was trying to optimize it. (mirror https://github.com/V-Sekai-fire/SK_faavrs_breadbread) It was around 15,023 triangles, and I don't think it's reasonable for people to pay for 48 GB GPUs.
@fire Did you load the models without training them and try to generate a model to see what the inference VRAM requirement was? @lucidrains
In #6
For each mesh I generate augments_per_item (like 200), then I use it to index into the dataset.
Using a seed I augment using this strategy.
What do you think?
The goal is for a chair item to be rotated, moved or scaled, but upright.
Edited:
The idea is to have a chair be displaced but under gravity so it keeps its lowest vertex position.
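A hedged sketch of that strategy (the actual `augment_mesh` in the pull request may differ): each augment index seeds its own RNG, the mesh is scaled, rotated only about the vertical axis so it stays upright, and shifted, then dropped back so its lowest vertex sits on the ground:

```python
import numpy as np

def augment_mesh(vertices, augment_idx, up_axis = 1):
    # vertices: (V, 3) array; up_axis = 1 assumes y-up
    rng = np.random.default_rng(augment_idx)      # seed per augment index -> reproducible

    scale = rng.uniform(0.75, 1.25)               # uniform scale
    angle = rng.uniform(0.0, 2.0 * np.pi)         # yaw only, so the chair stays upright
    shift = rng.uniform(-0.1, 0.1, size = 3)      # small displacement

    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[ c, 0.0,  s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0,  c]])               # rotation about the y (up) axis

    out = (vertices @ rot.T) * scale + shift
    out[:, up_axis] -= out[:, up_axis].min()      # "gravity": lowest vertex back on the ground
    return out
```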