Prior model preservation #505
base: master
Conversation
@dxqbYD can you add examples? Your examples are great. Even though I couldn't make it work, maybe once it's properly implemented it will work :D So: examples of comparisons, and how you set up your concepts.
Samples can be found in these release notes of SimpleTuner: https://www.reddit.com/r/StableDiffusion/comments/1g2i13s/simpletuner_v112_now_with_masked_loss_training/
kohya implementation: kohya-ss/sd-scripts#1710
This sounds like a really good idea to add as an option, but it definitely needs a more generic implementation. There are two issues to solve:
- **Dataset**: How do we select the regularization samples during training? This also needs to work with a batch size higher than 1. Ideally it would mix regularization samples and normal training samples within the same batch.
- **Unhooking the LoRA**: Each model has different sub-modules, so we need a generic method of disabling the LoRA for the prior result. A function in the model class to enable/disable all LoRAs could work well (see the sketch below).
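A minimal sketch of what such a toggle could look like, assuming the model class keeps a list of its LoRA wrapper modules and each wrapper exposes an `enabled` flag (both names are hypothetical, not OneTrainer's actual API):

```python
import contextlib
import torch

class LoRAModuleWrapper(torch.nn.Module):
    """Hypothetical LoRA wrapper: adds a low-rank delta to the base layer's output."""
    def __init__(self, base_layer: torch.nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base_layer = base_layer
        self.lora_down = torch.nn.Linear(base_layer.in_features, rank, bias=False)
        self.lora_up = torch.nn.Linear(rank, base_layer.out_features, bias=False)
        torch.nn.init.zeros_(self.lora_up.weight)
        self.scale = alpha / rank
        self.enabled = True  # toggled by the model-level helper below

    def forward(self, x):
        out = self.base_layer(x)
        if self.enabled:
            out = out + self.lora_up(self.lora_down(x)) * self.scale
        return out

class BaseModel:
    """Hypothetical model class that tracks all LoRA wrappers across its sub-modules."""
    def __init__(self, lora_modules: list[LoRAModuleWrapper]):
        self.lora_modules = lora_modules

    def set_lora_enabled(self, enabled: bool):
        for module in self.lora_modules:
            module.enabled = enabled

    @contextlib.contextmanager
    def lora_disabled(self):
        # temporarily compute with the frozen prior model only
        self.set_lora_enabled(False)
        try:
            yield
        finally:
            self.set_lora_enabled(True)
```

A training loop could then wrap the prior forward pass in `with model.lora_disabled():` regardless of which sub-modules actually carry LoRA weights.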
How do you intend to mix regularisation and training samples in a single batch, @Nerogar? That seems non-trivial; the actual target is changed.
The only difference between prior preservation and normal training is the prediction target. So what I would do is basically this: mix regularization samples into the same batches as normal training samples, and for the regularization samples use the prediction of the LoRA-disabled model as the target instead of the usual training target.
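A minimal sketch of such a mixed-batch training step, assuming the `lora_disabled()` helper from the sketch above and a boolean `is_reg` mask delivered by the data loader (all names and the transformer call signature are hypothetical):

```python
import torch
import torch.nn.functional as F

def training_step(model, transformer, batch):
    """One step with regularization samples mixed into the batch.

    batch["latents"]: noisy latents, batch["target"]: normal training target,
    batch["is_reg"]: bool mask marking regularization samples (hypothetical keys).
    """
    latents = batch["latents"]
    timesteps = batch["timesteps"]
    embeddings = batch["text_embeddings"]
    target = batch["target"].clone()
    is_reg = batch["is_reg"]  # shape: (batch_size,)

    if is_reg.any():
        # Prior prediction: same inputs, but with every LoRA module disabled.
        with torch.no_grad(), model.lora_disabled():
            prior_pred = transformer(latents, timesteps, embeddings)
        # Regularization samples are pulled back toward the prior model's output.
        target[is_reg] = prior_pred[is_reg]

    prediction = transformer(latents, timesteps, embeddings)
    loss = F.mse_loss(prediction, target)
    return loss
```

For efficiency, the extra prior forward pass could be restricted to the regularization sub-batch only.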
Yes, unfortunately it just doesn't have the same regularisation effect to do it that way. Having an entire batch pull back toward the model works.
What are you basing this on? What Nerogar describes above is what kohya has implemented. So if true, that would mean kohya's implementation doesn't work (as well).
Basing it on numerous tests we've run on a cluster of H100s over the last week.
It isn't obvious that this would work without captions, but it does. You can see samples in the reddit link above. The right-most column is without captions.
Yes, agreed. There are more use cases than captions in favor of having it as a separate concept, for example balancing the regularisation using the number of repeats. In some of my tests, 1:1 was too much. @bghira has also found, using his implementation in SimpleTuner, that even though it works with no external data, it works better against high-quality external data.
Okay, thanks. Any theory on why that would be? I don't see a theoretical reason for your finding that it works better as a separate batch.
Could you please provide some evidence of this? I.e., a significant enough number of samples that you aren't falling victim to seed RNG. It's important to get this right.
If this turns out to be right, I'd recommend implementing a feature in the OT concepts that influences how the batches are built; the first option would be how ST builds batches. This could be a useful feature on its own. For example, if you train 2 concepts, it can be beneficial to have 1 image of each concept in a batch, instead of the same concept twice, especially if the images within a concept are very similar (see the sketch below).
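A minimal sketch of such a batch-building option, assuming each sample carries a `concept_id` and batches are built by cycling through concepts (a round-robin strategy; all names are hypothetical, not an existing OT feature):

```python
import random
from collections import defaultdict
from typing import Iterator

def interleaved_concept_batches(samples: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield batches that draw from different concepts in turn,
    instead of filling a batch with images from a single concept."""
    by_concept = defaultdict(list)
    for sample in samples:
        by_concept[sample["concept_id"]].append(sample)
    for bucket in by_concept.values():
        random.shuffle(bucket)

    concept_ids = list(by_concept.keys())
    batch, i = [], 0
    while any(by_concept.values()):
        concept = concept_ids[i % len(concept_ids)]
        i += 1
        if by_concept[concept]:
            batch.append(by_concept[concept].pop())
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

The same hook could also decide whether regularization samples are interleaved with training samples or kept in separate batches.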
I don't have time, sorry; do it however works best for your codebase.
Any update on this, @dxqbYD?
Nothing usable for OneTrainer users yet. I should mention that there was apparently a paper published proposing this technique in April of this year, I just didn't know about it: https://arxiv.org/pdf/2404.07554
@dxqbYD so we have it in kohya atm? I couldn't find it.
This code can be used to preserve the prior model on prompts other than the trained captions. After several more tests, I think this is worth implementing as a quite generic feature.
Let me know if I should provide more details here, which you can currently find on the OT discord.
There is a feature request for SimpleTuner here: bghira/SimpleTuner#1031
This is a draft PR, only to determine interest in a full PR. It only works with batch size one, only for Flux, only for LoRA, and only for the transformer.
It could be implemented generically for all LoRAs. With major effort, it could also be implemented for full finetune, but to avoid having the full model in VRAM twice, pre-generation of the reg-step predictions would be necessary (see the sketch below).
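A minimal sketch of how pre-generating those reg-step predictions could look for the full-finetune case, assuming a frozen copy of the base model is loaded once, its predictions for the regularization samples are cached to disk, and the copy is then freed before training starts (all names, keys, and the call signature are hypothetical):

```python
import os
import torch

@torch.no_grad()
def pregenerate_prior_targets(base_model, reg_dataloader, cache_dir: str, device="cuda"):
    """Run the frozen base model over all regularization samples once and cache
    its predictions, so the second model copy can be unloaded before fine-tuning.

    Assumes the noisy latents and timesteps for each reg sample are fixed
    (pre-sampled), so the cached prediction matches what training will see."""
    os.makedirs(cache_dir, exist_ok=True)
    base_model.to(device).eval()
    for batch in reg_dataloader:
        prediction = base_model(
            batch["latents"].to(device),
            batch["timesteps"].to(device),
            batch["text_embeddings"].to(device),
        )
        for sample_id, pred in zip(batch["sample_ids"], prediction):
            torch.save(pred.cpu(), os.path.join(cache_dir, f"{sample_id}.pt"))
    base_model.to("cpu")  # free VRAM before the fine-tuned model is trained

def load_prior_target(cache_dir: str, sample_id: str) -> torch.Tensor:
    """During training, regularization samples read their cached target instead
    of running a second copy of the full model."""
    return torch.load(os.path.join(cache_dir, f"{sample_id}.pt"))
```

The trade-off is that the noise/timestep pairing for every regularization sample has to be fixed up front, whereas the LoRA-toggle approach can re-sample it each step.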