
Unseen characters #6

Open
luisaocl opened this issue Jan 7, 2025 · 3 comments

Comments

@luisaocl

luisaocl commented Jan 7, 2025

Hello! Do I need to train the model on all possible combinations of subjects that I may want to represent? Does this also apply to already independently trained LoRAs? Thank you!

@petergerten

The biggest disadvantage of MuDI / DreamBooth compared to methods that do not require fine-tuning seems to be that many example images are required. Methods like OneActor (https://johnneywang.github.io/OneActor-webpage/) get away with a single image.

If someone figures out a way to get high-quality results, including adaptation to unseen combinations, from a single image using MuDI, I would be very interested.

@agwmon
Owner

agwmon commented Jan 8, 2025

Thank you for your interest in our work!

@luisaocl As you mentioned, test-time training methods like DreamBooth have the major drawback of requiring training for each subject individually, and our method also requires training for combinations of subjects. However, as shown in Figure 38 of our paper, training multiple subjects into a single LoRA (though time-consuming) eliminates the need to consider every combination separately.

Additionally, in Section 5.4 of the paper, we propose a method to utilize independently trained LoRAs, so please refer to that as well.

@petergerten Regarding the issue of MuDI (DreamBooth) requiring multiple images, we have observed that FLUX achieves very high generalization performance even with a single image. We also believe that this issue can be addressed indirectly by using synthetic references, as explored in various other studies. (https://arxiv.org/abs/2403.14987v2, https://arxiv.org/abs/2306.00983)
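(For readers unfamiliar with how composing independently trained LoRAs works mechanically: each LoRA is a low-rank update delta = B @ A to a frozen base weight, and combining LoRAs amounts to summing the scaled deltas. The numpy sketch below illustrates only this generic mechanism, not the specific procedure from Section 5.4 of the paper; all names, shapes, and scales are hypothetical.)

```python
import numpy as np

def merge_loras(base_w, loras, scales):
    """Compose several LoRA updates into one effective weight.

    Each LoRA is a pair (B, A) with B: (d_out, r) and A: (r, d_in);
    its contribution is the scaled low-rank delta s * (B @ A).
    Merging sums the deltas onto a copy of the frozen base weight.
    """
    w = base_w.copy()
    for (B, A), s in zip(loras, scales):
        w += s * (B @ A)
    return w

# Hypothetical toy dimensions and randomly initialized adapters.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2
base = rng.standard_normal((d_out, d_in))
lora_dog = (rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in)))
lora_cat = (rng.standard_normal((d_out, r)), rng.standard_normal((r, d_in)))

# Compose both subject adapters; per-adapter scales trade off their influence.
merged = merge_loras(base, [lora_dog, lora_cat], scales=[1.0, 0.8])
```

In practice (e.g. with diffusers-style pipelines) this scaling/summing happens per target layer, but the arithmetic per weight matrix is the same.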

@petergerten

@agwmon thanks!

I tried several training runs with the example parameters and see strong overfitting when using one image with FLUX. While prompt adherence remains, the characters become "locked" in the posture of the single reference image. I will look into creating synthetic references.
