  • Where can I put my training photo dataset?

In the Concepts tab. First add a config, then add a concept to that config. Clicking on a concept opens a window where you specify the path to your dataset. You can define several concepts per config.
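
As an illustration, a concept's dataset folder could look like this (the folder and file names are hypothetical; one .txt caption file per image is a common layout these trainers read):

```
dataset/tom_cruise/
├── img001.jpg
├── img001.txt   <- caption for img001.jpg
├── img002.jpg
└── img002.txt   <- caption for img002.jpg
```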

  • Where can I put the trigger word?

The trigger word for embeddings is simply <embedding>. LoRAs have no trigger word: the LoRA learns from your captions, so for your samples use a prompt similar to the training captions. For embeddings, use <embedding> as a placeholder in captions and sample prompts. For example:

LoRA: photo of tom cruise
Embedding: photo of a <embedding>
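
To make this concrete, here is how matching captions and sample prompts could look side by side (the caption and prompt texts are purely illustrative):

```
LoRA caption (img001.txt):       photo of tom cruise wearing a suit, studio lighting
LoRA sample prompt:              photo of tom cruise wearing a suit

Embedding caption (img001.txt):  photo of a <embedding> wearing a suit, studio lighting
Embedding sample prompt:         photo of a <embedding> wearing a suit
```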

  • [Embedding] How should I use <embedding> when training embeddings?

For embeddings, the trigger word is the embedding name. Suppose it is TomCruise:

Do you add <TomCruise> or <embedding> to captions? Answer: <embedding>.

Do you add <TomCruise> or TomCruise to embedding-tab -> Initial embedding text? Answer: neither; use a brief description of your subject to help the embedding train faster, e.g. just "*", "man", or "short man".

Do you add <TomCruise>, TomCruise, or <embedding> to the sampling tab? Answer: <embedding>.

Note: TomCruise is only used as the output model name: TomCruise.safetensors.
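
Putting the answers together for an embedding named TomCruise (the caption and prompt texts are just illustrations):

```
Output model name:       TomCruise  ->  saved as TomCruise.safetensors
Caption files:           photo of <embedding> smiling
Initial embedding text:  man
Sample prompt:           photo of <embedding> on a beach
```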

  • [Embedding] How do I set the initial embedding text?

In short, the initial embedding text should describe your subject, but its token count should not exceed the embedding's token count.

Here is an explanation. Take the example prompt "photograph of <embedding> with" and, for simplicity, assume every word is encoded into a single token. This could result in the token IDs [1, 2, ..., 3], where "..." is the place your embedding is inserted. If you have a 3-token embedding, this becomes, for example, [1, 2, 100, 101, 102, 3].

Now let's say you set an init text of "blond woman wearing a pink hat". That's 6 tokens, but the embedding only supports 3, so only the first 3 of the 6 tokens are actually used; the rest are truncated.

This also goes the other way around: if you supply a shorter text (like "blond woman"), there is nothing to use as the third token. In OneTrainer, short init texts are padded with the "*" token, so "blond woman" becomes "blond woman *".
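
The same logic as a minimal Python sketch (the one-word-per-token split and the function name fit_init_text are illustrative assumptions, not OneTrainer's actual API):

```python
# Minimal sketch of the truncation/padding behaviour described above.
# Assumes every word is exactly one token, purely for illustration;
# real token counts come from the model's tokenizer.

PAD_TOKEN = "*"  # OneTrainer pads short init texts with "*"

def fit_init_text(init_text: str, token_count: int) -> list[str]:
    """Truncate or pad an initial embedding text to the embedding's token count."""
    tokens = init_text.split()  # pretend one word == one token
    if len(tokens) >= token_count:
        return tokens[:token_count]  # too long: extra tokens are dropped
    return tokens + [PAD_TOKEN] * (token_count - len(tokens))  # too short: pad

print(fit_init_text("blond woman wearing a pink hat", 3))
# -> ['blond', 'woman', 'wearing']   (truncated to 3 tokens)
print(fit_init_text("blond woman", 3))
# -> ['blond', 'woman', '*']         (padded with "*")
```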

  • Are embeddings comparable to LoRAs in terms of quality/flexibility?

Embeddings (Textual Inversion) work backward to find the tokens that match your images, so they can only learn things the base model already knows. For that reason they can be very flexible. They are good for people because the model has already seen many different faces, but they can't learn something completely new. Their size is just a few KB and they don't require many images: 15-30 images are plenty, and even a minimal dataset (~6 images) can work. Also note that they can be used together with LoRAs.

A LoRA adds a set of weights to the model. Their size ranges from 50 MB to ~200 MB+, and they are usually used for people, styles, poses, and image composition. Depending on what you're training, you can make them more flexible by adding images with different poses and compositions. They can be trained with a small dataset (around 20 images), but for flexibility you'll need more.

  • How much VRAM do I need for training?

It depends on what you're training and on the parameters. An SD1.5 LoRA or embedding at 512 resolution can be trained with just 6 GB, but for most things you'll need 10-12 GB. For SDXL fine-tuning, even 24 GB is not really enough.