Finetuning Vista #4

Closed
hungdche opened this issue Jan 19, 2025 · 5 comments

Comments

@hungdche

Thank you for the excellent work! I am wondering if you are finetuning from the first phase of Vista, or from their final configuration.

@yunzhiy
Contributor

yunzhiy commented Jan 20, 2025

We finetune from the final checkpoint. The cross-attention layer used for action control can simply be ignored.

@hungdche
Author

Thanks! By ignoring, do you mean removing it from their implementation, or just passing in a zero tensor?

@yunzhiy
Contributor

yunzhiy commented Jan 20, 2025

I just remove these parameters.
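
Removing parameters from a checkpoint before finetuning can be sketched as a filter over the checkpoint's name-to-tensor dictionary. The example below is only an illustration under assumptions: the key names and the `action_attn` substring are hypothetical and not taken from the Vista code, and plain strings stand in for tensors so the snippet runs without any dependency.

```python
from typing import Any, Dict, Iterable, List, Tuple


def drop_parameters(
    state_dict: Dict[str, Any], patterns: Iterable[str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of state_dict without keys that contain any pattern,
    together with the list of key names that were dropped."""
    patterns = tuple(patterns)
    kept: Dict[str, Any] = {}
    dropped: List[str] = []
    for name, value in state_dict.items():
        if any(p in name for p in patterns):
            dropped.append(name)
        else:
            kept[name] = value
    return kept, dropped


# Hypothetical key names, for illustration only. Inspect the real checkpoint
# to find the substrings that identify the action-control layers.
checkpoint = {
    "model.block0.attn1.to_q.weight": "tensor-a",
    "model.block0.action_attn.to_k.weight": "tensor-b",
    "model.block0.action_attn.to_v.weight": "tensor-c",
    "model.block0.ff.weight": "tensor-d",
}

filtered, removed = drop_parameters(checkpoint, patterns=["action_attn"])
print(sorted(filtered))  # remaining parameter names
print(removed)           # names that were filtered out
```

With PyTorch, a dictionary filtered this way could then be passed to `load_state_dict` on a model whose definition no longer contains those layers; passing `strict=False` would additionally tolerate any remaining key mismatch.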

@hungdche
Author

Thank you! Final question: at the beginning of training, did you experience an issue similar to this one, where the sampled images look like pure noise?

OpenDriveLab/Vista#13 (comment)

@yunzhiy
Contributor

yunzhiy commented Jan 21, 2025

I haven't run into issues like this. You could try setting use_ema to True in the config file.
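
For context, a use_ema option typically means keeping an exponential moving average (EMA) of the model weights alongside the trained ones and using the averaged copy for sampling. The sketch below shows only the general update rule, assuming plain floats in place of tensors and an arbitrary decay of 0.5 chosen for readability; it is not the Vista implementation.

```python
from typing import Dict


def ema_update(shadow: Dict[str, float], params: Dict[str, float], decay: float) -> None:
    """Update shadow in place: shadow = decay * shadow + (1 - decay) * param."""
    for name, value in params.items():
        shadow[name] = decay * shadow[name] + (1.0 - decay) * value


params = {"w": 0.0}
shadow = dict(params)  # the EMA copy starts from the initial weights

for step in range(1, 4):
    params["w"] = float(step)  # stand-in for an optimizer step
    ema_update(shadow, params, decay=0.5)

print(params["w"], shadow["w"])  # the EMA copy lags behind the raw weight
```

The averaged weights change more smoothly than the raw ones, which is why samples drawn from the EMA copy can look different from samples drawn from the raw weights during training.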

yunzhiy closed this as completed on Jan 21, 2025