  • Where can I put my training photo dataset?

In the concepts tab. First add a config, then add a concept to this config. Clicking on the concept opens a window where you specify the path to your dataset. You can define several concepts for a config.

  • Where can I put the trigger word?

The trigger word for embeddings is simply <embedding>. For LoRAs, trigger words don't exist; the LoRA learns from your captions. For your samples, use a prompt similar to the training captions, and for embeddings use <embedding> as a placeholder. For example:

LoRA: photo of tom cruise
Embedding: photo of a <embedding>

  • [Embedding] How should I use <embedding> when training embeddings?

For embeddings, the trigger word is the embedding name. Say it is TomCruise:

Do you add <TomCruise> or <embedding> to captions? Answer: <embedding>

Do you add <TomCruise> or TomCruise to embedding tab -> Initial embedding text? Answer: neither; use a brief description of your subject to help your embedding train faster, so just "*" or "man" or "short man" ...

Do you add <TomCruise>, TomCruise or <embedding> to the sampling tab? Answer: <embedding>

Note: TomCruise is only set as the output model name: TomCruise.safetensors.
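
To put it all together, here is a small illustrative summary for an embedding named TomCruise. This is not an actual OneTrainer config file; the field names paraphrase the UI labels, and the caption/sample texts are made-up examples:

```python
# Illustrative only: not a real OneTrainer config format.
settings = {
    "output_model_name": "TomCruise",                    # the only place the literal name appears
    "initial_embedding_text": "man",                     # short description, e.g. "*", "man", "short man"
    "caption_example": "photo of <embedding> smiling",   # captions use the <embedding> placeholder
    "sample_prompt": "photo of <embedding> on a beach",  # the sampling prompt uses it too
}
```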

  • [Embedding] How do I set the initial embedding text?

In short, the initial embedding text should describe your subject, but it should not exceed the embedding's token count.

Here is an explanation. Let's take your example prompt of "photograph of <embedding> with", and (for simplicity) assume every word is encoded into a single token. This could result in the following token IDs: [1, 2, ..., 3], where "..." is the place your embedding is inserted. If you have a 3-token embedding, it would be for example [1, 2, 100, 101, 102, 3].

Now let's say you set an init text of "blond woman wearing a pink hat". That's 6 tokens, but the embedding only supports 3 tokens, so only the first 3 (of the 6) tokens are actually used; the rest is truncated.

This also goes the other way around. If you only supply a shorter text (like "blond woman"), it doesn't know what to use as the third token. In OneTrainer, tokens are padded with the "*" token, so "blond woman" becomes "blond woman *".
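
As a rough illustration of that truncate-or-pad behaviour, here is a minimal Python sketch. The token IDs and the helper function are made up for this example; real tokenizers (such as CLIP's) work on sub-word pieces rather than whole words:

```python
# Hypothetical helper: fit an init text to a fixed embedding token count.
def fit_init_text(init_tokens: list[int], embedding_token_count: int, pad_token: int) -> list[int]:
    fitted = init_tokens[:embedding_token_count]                    # too long  -> truncate
    fitted += [pad_token] * (embedding_token_count - len(fitted))   # too short -> pad with "*"
    return fitted

PAD = 9  # pretend ID for the "*" token
print(fit_init_text([10, 11, 12, 13, 14, 15], 3, PAD))  # [10, 11, 12]  ~ "blond woman wearing"
print(fit_init_text([10, 11], 3, PAD))                  # [10, 11, 9]   ~ "blond woman *"
```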

  • Are embeddings comparable to LoRAs in terms of quality/flexibility?

An embedding (Textual Inversion) works backward to find the tokens that match your images, so it can only learn things that the base model already knows. For that reason embeddings can be very flexible. They are good for people, because the model has already seen a lot of different people. But they can't learn something completely new. They don't require many images: 15-30 images are plenty, and even a minimalistic dataset (~6 images) can work. Also note that they can be used together with LoRAs.

A LoRA adds a set of weights to the model. LoRAs are usually used for people, styles, poses and image composition. Depending on what you're training, you can make them more flexible by adding images with different poses/compositions ... They can be trained with a small dataset of around 20 images, but for flexibility you'll need more.
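
For readers who think in code, here is a very rough PyTorch sketch of the difference (dimensions and rank are arbitrary placeholders): an embedding only adds a few trainable token vectors fed through the otherwise frozen model, while a LoRA adds a trainable low-rank update on top of existing weights, which is why it can learn genuinely new things:

```python
import torch

# Textual Inversion: the model stays frozen; only a few new token vectors are trained.
embedding = torch.nn.Parameter(torch.randn(3, 768))   # e.g. a 3-token embedding

# LoRA: a frozen weight W gets a trainable low-rank update B @ A added on top.
W = torch.randn(768, 768)                      # frozen base weight
A = torch.nn.Parameter(torch.randn(16, 768))   # rank-16 factors
B = torch.nn.Parameter(torch.zeros(768, 16))   # starts at zero, so training begins from W
W_effective = W + B @ A                        # what the forward pass actually uses
```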

  • How much VRAM do I need for training?

It depends on what you're training and on the parameters. An SD1.5 LoRA or embedding at 512 resolution can be trained with just 6 GB, but for most things you'll need 10-12 GB. For SDXL fine-tuning, even 24 GB is not really enough.

  • I really like to use class pics for my trained LoRAs, like in Kohya.

Class images are kind of a made-up concept. They are just regular images that are trained without complicated captions, and usually at a lower weight. You can easily do this in OneTrainer by adding them as a new concept, setting the repeats lower, and maybe reducing the loss weight a bit. For the captions, select "from single file" and put your class prompt in that file.
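
If it helps to see the effect of those settings, here is a rough back-of-the-envelope sketch (the numbers are placeholders, not recommendations) of how repeats and loss weight control the relative influence of the class-image concept versus the main concept:

```python
# Placeholder numbers only; adjust to your own dataset.
concepts = {
    "subject":      {"images": 30,  "repeats": 1.0, "loss_weight": 1.0},
    "class_images": {"images": 200, "repeats": 0.1, "loss_weight": 0.5},
}

for name, c in concepts.items():
    # Rough relative influence per epoch: how many images are seen, times their loss weight.
    influence = c["images"] * c["repeats"] * c["loss_weight"]
    print(f"{name}: ~{influence:.0f} weighted samples per epoch")
```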