Token dropout/limits #250
-
This is in fact a model limit. CLIP's text encoder has 77 positions, of which 75 are usable text tokens (the other two are the start and end tokens). Any solution you see using the exact same model is a hack. Only changing the text encoder truly solves this. The only model that isn't limited to 75 is PixArt Sigma (it's 120).
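For context, the usual workaround splits the prompt into 75-token chunks, encodes each chunk separately, and concatenates the resulting embeddings; that is the "hack" referred to above. Here is a minimal sketch of the chunking step using the Hugging Face `CLIPTokenizer` — the chunking scheme is illustrative of what UIs like AUTOMATIC1111 do, not something this tool is confirmed to implement:

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a very long, highly detailed caption ..."  # imagine 1000+ tokens here

# Tokenize without special tokens or truncation to see the real length.
ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
print(len(ids))  # anything above 75 exceeds CLIP's usable context

# The workaround: split into 75-token chunks, wrap each with BOS/EOS
# (filling CLIP's 77 positions), encode each chunk through the text
# encoder separately, then concatenate the outputs along the sequence axis.
chunks = [ids[i : i + 75] for i in range(0, len(ids), 75)]
wrapped = [[tokenizer.bos_token_id] + c + [tokenizer.eos_token_id] for c in chunks]
```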
-
I have some captions with over 1,000 tokens so that I can be extremely specific in my prompts and get highly detailed results (the same reason I am training at 1536-2048). I was wondering about this, though: I know that Stable Diffusion is always trained on ~75 tokens, and I understood the lack of quality gains from longer prompts to be a training issue rather than a hard model limit.
There is the option "keep tag count", which just doesn't shuffle the first n tags when shuffling is enabled (based on its description), but does the tool do anything else to the caption automatically? Does it drop the end of the prompt? Could the prompt length be randomized, i.e., sometimes 75 tokens, sometimes 150, 225, 300, etc., for a similar reason to the resolution override?
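For illustration, here is a minimal sketch of what keep-tag shuffling combined with a randomized token budget could look like; `randomized_caption`, `keep_count`, and `budgets` are hypothetical names for this sketch, not actual options of the tool:

```python
import random

def randomized_caption(tags, tokenizer, keep_count=1,
                       budgets=(75, 150, 225, 300)):
    """Shuffle tags after the first `keep_count`, then truncate the
    caption to a randomly chosen token budget."""
    head, tail = list(tags[:keep_count]), list(tags[keep_count:])
    random.shuffle(tail)                     # keep-tag-count style shuffle
    caption = ", ".join(head + tail)
    budget = random.choice(budgets)          # randomized prompt length
    ids = tokenizer(caption, add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids[:budget])
```

Applied per sample during caption loading, something like this would expose the model to a spread of prompt lengths, in the same spirit as the resolution override.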