training error #2

Open
HackerHuangZY opened this issue May 5, 2024 · 17 comments

Comments

@HackerHuangZY

HackerHuangZY commented May 5, 2024

I get AttributeError: 'HandleControlledSequence' object has no attribute 'L'. How can I fix it? I'm looking forward to your reply.

@PeizhuoLi
Owner

Hi, could you provide more information? For example, which options did you use for training, and in which file and at which line did this error occur?

@HackerHuangZY
Author

I used the command python train_frame_based.py --save_path=./pre-trained/vtotrain --multiple_dataset=./dataset/sequence_lists/vto-training-example.txt to retrain your model.

It returns this error:

[Image: QQ screenshot 20240507095519]

@HackerHuangZY
Author

By the way, using different folders in the VTO dataset returns different errors. It seems that the "if cfg.use_jacobian:" branch in Handle_dataset.py at line 506, in the HandleControlledSequence class, is not executed. How do I activate this cfg parameter? Thanks for your help and your selfless open-source contribution.

@PeizhuoLi
Owner

Hi, thanks for the info. I updated option.py to activate use_jacobian by default. Alternatively, you may also add an option --use_jacobian=1 when starting the training script.
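
The failure mode discussed above (an attribute that is only created inside a config-guarded branch) can be sketched roughly as follows. This is a toy illustration, not the project's code: only the flag name use_jacobian and the attribute name L come from this thread, while parse_options, ToySequence and has_L are hypothetical names, and the assumption that L is set only inside the guarded branch is inferred from the discussion rather than confirmed.

```python
import argparse


def parse_options(argv):
    # Hypothetical stand-in for option.py; only the flag name comes from the thread.
    parser = argparse.ArgumentParser()
    parser.add_argument("--use_jacobian", type=int, default=0)
    return parser.parse_args(argv)


class ToySequence:
    """Hypothetical stand-in for a dataset class, not the project's implementation."""

    def __init__(self, cfg):
        if cfg.use_jacobian:
            # The attribute only exists when the guarded branch runs.
            self.L = "placeholder for a precomputed quantity"


def has_L(argv):
    # True if the object ends up with the attribute L for these CLI arguments.
    return hasattr(ToySequence(parse_options(argv)), "L")


print(has_L([]))                    # flag off: the attribute is never created
print(has_L(["--use_jacobian=1"]))  # flag on: the attribute is set
```

With the flag off, any later access to the attribute would raise an AttributeError of the kind reported above, which is why enabling the option by default (or passing it explicitly) resolves it in this sketch.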

@HackerHuangZY
Author

Thank you. I had already changed the parameter in option.py before your reply, but now there is a new problem:

[Image: 39385c35adcfd9a7ef3d423f0888a21d]

Where is the pose?

@PeizhuoLi
Owner

Hi, body_pos was only used in an early iteration and is no longer used in the final version.

@HackerHuangZY
Author

Thank you.

@HackerHuangZY
Author

Also, where can I find the default data /data/batch1/moving1_topo1_Cotton?

@PeizhuoLi
Owner

This data was used only in the early stage of the project, with a different representation and setting. We don't plan to release it.

@NPC1079

NPC1079 commented Oct 21, 2024

Thanks for these impressive results and this work!

During training, I set "ddp=8". The server's memory always fills up, and then training stops with SIGABRT. If I set ddp=0, the program runs fine. I think the problem is related to what you said: "Although the dataset is preprocessed, additional calculation is carried out beforehand the training starts." Can I skip this "additional calculation"? I'm looking forward to your reply.
image
image

@NPC1079

NPC1079 commented Oct 21, 2024

Addendum: the memory fills up gradually.

@PeizhuoLi
Owner

Hi, thanks for the question. DDP will only work properly if all the meshes contain the same number of triangles, and this is generally not the case when multiple meshes are included in the training. In general, DDP should not be used for this project. It is only there for very special cases, as a proof of concept we used during development to explore the gain from DDP, which turned out not to be worth the effort.
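
A minimal sketch of a pre-flight check for the constraint described above, assuming the triangle count of each mesh is already available as a list of integers. The function name ddp_compatible and the example counts are made up for illustration and are not part of the project.

```python
def ddp_compatible(face_counts):
    """Return True if every mesh has the same number of triangles.

    face_counts: iterable of per-mesh triangle counts (hypothetical input).
    An empty or single-mesh list trivially satisfies the constraint.
    """
    return len(set(face_counts)) <= 1


# Made-up counts, for illustration only.
print(ddp_compatible([4000, 4000, 4000]))  # all meshes share one triangle count
print(ddp_compatible([4000, 5200]))        # mixed garments with different counts
```

Running a check like this before enabling DDP would flag a mixed dataset early instead of failing during training.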

@NPC1079

NPC1079 commented Nov 18, 2024

Thanks, DDP is running now, but the training results don't resemble what you show; my results are very poor. After training, the cloth doesn't behave like real cloth, and not even the GT data looks good. I trained only on the VTO dress data, not the t-shirt data, because DDP can't be used with two types of data. Training took about one week. Could you suggest a way to get good results?
image

@PeizhuoLi
Owner

PeizhuoLi commented Nov 18, 2024

Hi, I would recommend running the training without DDP, on a single GPU. DDP is only there as an early-stage proof of concept, and we didn't test whether it is compatible with our latest network implementation. Without DDP, a more memory- and compute-efficient implementation is used. Training without DDP on an RTX 3090 for 48 hours should give you the same result as in the demo video. Hope this helps.

@NPC1079

NPC1079 commented Nov 19, 2024

Thanks. I would like to confirm that gt.pkl is the data before model processing, while prediction.pkl is the final data generated by the model.

@PeizhuoLi
Owner

Hi, gt.pkl is the result of the ground-truth simulation algorithm, and prediction.pkl is the prediction of our neural network.
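
To see what the two files contain, something like the following sketch may help. The internal layout of gt.pkl and prediction.pkl is not described in this thread, so the helper (describe_pickle, a hypothetical name) only reports top-level types and makes no assumption about keys or shapes. If the files store objects from other libraries, those libraries need to be importable for pickle.load to succeed.

```python
import pickle


def describe_pickle(path):
    """Summarize the top-level structure of a pickle file.

    Returns a dict mapping keys to value type names if the file holds a dict,
    otherwise the type name of the top-level object.
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if isinstance(obj, dict):
        return {str(k): type(v).__name__ for k, v in obj.items()}
    return type(obj).__name__


# Usage (paths as named in this thread; uncomment once the files exist):
# print(describe_pickle("gt.pkl"))
# print(describe_pickle("prediction.pkl"))
```

Only load pickle files from sources you trust, since unpickling can execute arbitrary code.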

@NPC1079

NPC1079 commented Dec 11, 2024

Hi, could you provide a complete project roadmap? I think something is wrong with my roadmap.


3 participants