
Best Settings, anyone? #7

Open
scofano opened this issue Nov 22, 2024 · 5 comments

Comments

@scofano

scofano commented Nov 22, 2024

Did anyone try different size vs steps vs length combinations to find the best one?

@DmytroSokhach

DmytroSokhach commented Nov 23, 2024

I don't think there are "best settings" for something as complex as video, but rather "strategies for better results".
I noticed that an extensive prompt is very important; it should be very detailed ("pan camera" alone will just not work).

What I feel would be needed:

  1. Min/max recommendations (scheduler, steps, sigmas explained)
  2. A prompting guide
  3. (optional) A prompting-assistant pre-setup for LLMs such as OpenAI's models, so it would expand shorter prompts into more extensive detail.

@botslop

botslop commented Nov 25, 2024

While we're on the topic of caption generation, you could also hook up something like comfyui-ollama to help create descriptions for img2vid. What I would like to do is set up folder batching to handle multiple images at once while saving metadata (like the noise seed), in case I want to go back and tweak a usable candidate. The noise seed seems to play a massive role in the quality of the generation and in whether or not it focuses on the correct subject.
Right now, my main struggle is understanding how much the sampler CFG and scheduler help or hinder the final output, and which values we should focus on. It still feels a bit random to me, even with a fixed seed.

@scofano This is a good find, thank you. I'm going to pipe this into llama vision and see how that fares.

@scofano
Author

scofano commented Nov 25, 2024

@botslop For descriptions, I use Florence-2. Then you can use the image description as context and feed it to Ollama. In case you are not using it already: https://github.com/nkchocoai/ComfyUI-SaveImageWithMetaData
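The caption-to-Ollama handoff could look something like the sketch below, using Ollama's HTTP `/api/generate` endpoint. The instruction wording in `build_prompt` and the model name are illustrative assumptions, not part of either tool.

```python
import json
import urllib.request

def build_prompt(caption):
    """Turn a Florence-2 style caption into an expansion request for the LLM.
    The instruction text here is just an example; tune it to taste."""
    return ("Expand the following image description into a detailed, "
            "cinematic video prompt (camera motion, lighting, subject "
            "action):\n" + caption)

def expand_with_ollama(caption, model="llama3.2-vision",
                       url="http://localhost:11434/api/generate"):
    """POST to Ollama's generate endpoint (non-streaming) and return the
    expanded prompt text. Assumes a local Ollama server is running."""
    payload = json.dumps({"model": model,
                          "prompt": build_prompt(caption),
                          "stream": False}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```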

> Right now, my main struggle is understanding how much the sampler CFG and scheduler help or hinder the final output, and which values we should focus on. It still feels a bit random to me, even with a fixed seed.

Yes, I have not figured out the settings yet. Most of my generations are either still images or just noise. I have tried different settings, but so far no luck (that's why I created this thread).

@botslop

botslop commented Nov 26, 2024

Just trying to put the pieces together here. It looks like someone on YouTube made a setup video. While the video itself doesn't go into much detail on how to tweak the scheduler values, a comment from andro-meta provided some good insight.

A little explanation on base shift and max shift for anyone looking to play with that:

Base shift is a small, consistent adjustment that stabilizes the image generation process, while max shift is the maximum allowable change to the latent vectors, preventing extreme deviations in the output. Together, these parameters balance stability and flexibility in image generation.

Using a bird as an example:

Increasing Base Shift: Raising the base shift results in a more consistent and stable depiction of the bird. For instance, the image might consistently show a bird with clear, well-defined features such as a distinct beak, feathers, and wings. However, this increased stability could lead to less variation, making the bird’s appearance feel repetitive or overly uniform.

Decreasing Base Shift: Lowering the base shift allows for more subtle variations and intricate details, like nuanced patterns in the bird’s feathers or unique postures. However, this added variability might make the bird’s image less stable, with occasional irregularities or minor distortions.

Increasing Max Shift: A higher max shift enables the model to explore the latent space more freely, leading to creative or exaggerated interpretations of the bird. For example, the bird might develop surreal colors, elongated wings, or fantastical plumage, but it risks straying far from a realistic bird representation.

Decreasing Max Shift: Reducing the max shift tightly constrains the model, resulting in a more controlled and realistic depiction of the bird. The image is likely to stay close to a conventional bird appearance, but it might lack creative or distinctive elements that make the bird unique or captivating.
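The four cases above can be illustrated with a toy numerical sketch. To be clear, this is not the model's actual math, just the behaviour the comment describes: every latent value gets a small constant nudge (base shift), and the total change per value is clamped so it never exceeds max shift.

```python
import random

def shift_latent(latent, base_shift, max_shift):
    """Toy illustration only: apply a constant nudge plus random
    exploration to each latent value, clamping the total change
    to +/- max_shift. Not the real sampler implementation."""
    shifted = []
    for v in latent:
        delta = base_shift + random.uniform(-1.0, 1.0)  # stable nudge + variation
        delta = max(-max_shift, min(max_shift, delta))   # cap the deviation
        shifted.append(v + delta)
    return shifted
```

With a high base shift the nudge dominates the random term (consistent but repetitive); with a high max shift the clamp rarely bites (creative but unstable), which mirrors the bird examples above.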

As a general strategy, you could start with generating several candidates, and look for a seed that captures the motion you're looking for (even if it's distorted, it might pick up on important elements you're looking for in the final result, like animating the subject instead of the background).
Then, you could inch the base-shift/max-shift on the fixed seed. If the image is still exploding and unable to maintain the subject form, you might want to increase base shift and decrease max shift. I've also found disabling stretch can sometimes help improve clarity depending on the image.
I have yet to test this myself, but it seems reasonable.
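The fixed-seed strategy above (lock the seed, then inch the shift values) is essentially a small grid sweep. A minimal sketch, where `run_generation` stands in for the actual workflow call:

```python
import itertools

def shift_sweep(fixed_seed, base_values, max_values):
    """Enumerate base/max shift combinations for one fixed seed so the
    only thing changing between runs is the shift pair.
    run_generation is a hypothetical hook into the real pipeline."""
    runs = []
    for base, mx in itertools.product(base_values, max_values):
        runs.append({"seed": fixed_seed,
                     "base_shift": base,
                     "max_shift": mx})
        # run_generation(seed=fixed_seed, base_shift=base, max_shift=mx)
    return runs

# e.g. shift_sweep(42, [0.5, 1.0], [1.5, 2.0]) yields 4 runs on seed 42
```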
