When "Anime" is selected in Image Style, cold blue shadows are emphasized due to the influence of the PAG Guidance Scale. #9

Open
haruharu-1105 opened this issue Oct 25, 2024 · 6 comments

Comments

@haruharu-1105

Your team is doing a great job.

If possible, it would be greatly appreciated if you could mention somewhere in your documentation that when the PAG (Perturbed-Attention Guidance) Guidance Scale value is high, the cold blue shadow color is emphasized when “Anime” is selected in Image Style.

  • Reason: When we tried the demo, we noticed that a higher PAG Guidance Scale emphasizes the cold blue shadow color. Also, because the extended settings are hidden by default and the PAG Guidance Scale starts at 2, some users may not notice the effect of this setting. Documenting it would make this great application easier to use for many more people.

  • As a workaround for the blue shadows, I personally do the following:

    • Set the PAG Guidance Scale value to 1.
    • Add "warm tones" and "sunset hues" to the prompt.
  • Below is the effect of each PAG Guidance Scale value.
    URL: https://sana-gen.mit.edu/
    Prompt:
    1girl, solo, brown long hair, looking at viewer, indoors, braid, blush, white sweater, brown hair, brown eyes, cable knit, blurry, long sleeves sweater, window, breasts, blurry background, turtleneck, aran sweater, upper body, parted lips, turtleneck sweater, depth of field
    Common:
    image
    image
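The workaround described above can be sketched in a few lines of Python. This is only an illustration: `GenerationSettings`, `apply_blue_shadow_workaround`, and the field names are hypothetical and are not taken from the demo's code, which had not been released at the time of this thread.

```python
from dataclasses import dataclass

# Tags suggested in the comment above; everything else here is hypothetical.
WARM_TAGS = ("warm tones", "sunset hues")


@dataclass(frozen=True)
class GenerationSettings:
    prompt: str
    image_style: str
    pag_guidance_scale: float


def apply_blue_shadow_workaround(settings: GenerationSettings) -> GenerationSettings:
    """For the Anime style, set PAG to 1 and append warm-color tags to the prompt."""
    if settings.image_style.lower() != "anime":
        return settings  # other styles are left untouched

    # Split the comma-separated prompt into clean tags, dropping empty pieces.
    tags = [tag.strip() for tag in settings.prompt.split(",") if tag.strip()]
    present = {tag.lower() for tag in tags}
    tags += [tag for tag in WARM_TAGS if tag not in present]

    return GenerationSettings(
        prompt=", ".join(tags),
        image_style=settings.image_style,
        pag_guidance_scale=1.0,
    )
```

Keeping the adjustment in one small function would make it easy to turn into a style-specific default later, if the maintainers decide that is preferable to a documentation note.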

Thank you for reading this far.

@lawrence-cj
Collaborator

That’s a great observation. We set PAG to 1.5 or 2 to improve body structure and text rendering, while inadvertently overlooking its impact on other aspects. For users focused on face or other fidelity-oriented features, a PAG setting of 1 might be preferable for producing more natural-looking images. Based on your tests, do you have any suggestions for an optimal setting?

@haruharu-1105
Author

haruharu-1105 commented Oct 25, 2024

First, I am a systems engineer, but not an expert in machine learning or image processing.

My personal suggestion is that “1” would be an appropriate guidance scale value for the Anime style.
However, regarding my original issue, I believe it can be resolved in the documentation.

The following is just a suggestion.
We feel that the application developed by your teams has excellent features, such as very fast generation speed and high quality.

So, here is one idea that takes advantage of the “very fast generation speed” feature.
We think it would be effective to generate two versions of each image with different guidance scale values and collect user preferences using Gradio's flag function.
Reference link: Gradio flag function

This is because, while the demo currently has an excellent image generation function, it has no mechanism for collecting user feedback on the generated images, which we feel is a gap for a demonstration.

Of course, anonymous respondents bring their own problems, but I think it is possible to obtain statistically significant data.
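As a rough illustration of what "statistically significant" could mean for such a two-image comparison, here is a minimal sketch of a two-sided exact sign test on preference counts. The function name and the choice of test are assumptions made for this example; votes with no preference are assumed to be dropped before counting.

```python
from math import comb


def sign_test_p_value(left_votes: int, right_votes: int) -> float:
    """Two-sided exact sign test under the null of no preference (p = 0.5).

    Assumption for this sketch: votes expressing no preference are excluded
    before the counts are passed in.
    """
    n = left_votes + right_votes
    if n == 0:
        return 1.0  # no data, no evidence of a preference
    k = min(left_votes, right_votes)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Example: 8 votes for one image and 2 for the other.
print(sign_test_p_value(8, 2))  # 0.109375
```

With only ten votes, even an 8 to 2 split is not significant at the usual 0.05 level, so a survey like this needs a reasonable number of responses before conclusions are drawn.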

@lawrence-cj
Collaborator

lawrence-cj commented Oct 25, 2024

Really great advice. What kind of Flag is enough for a demo? Any template website?

@haruharu-1105
Author

What kind of Flag is enough for a demo?

We are considering two patterns.
1. Flags for A/B testing (we would start here)
Following https://imgsys.org/, the flags would be “left”, “tie”, and “right”.

2. Ideal flags (to be introduced for product improvement once the operational cycle is running smoothly)

Briefing materials for stakeholders.
Performance indicators (e.g., impact of DC-AE improvements). Metrics will include speed, aesthetics, prompt accuracy, typography, etc.

The choice of these indicators will have a significant impact on the direction of the product. For example, YouTube and TikTok are both video posting sites, but the indicators set are different, creating differences in product content.
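For the first pattern, the “left” / “tie” / “right” flags could be tallied per PAG Guidance Scale value roughly as follows. This is a sketch under assumptions: the `Vote` record and `tally_by_scale` are hypothetical names, the code does not call Gradio, and it assumes each logged vote stores which scale produced the left and right image so that the display order can be shuffled.

```python
from collections import Counter
from typing import Iterable, NamedTuple

FLAGS = ("left", "tie", "right")  # flag labels from the comment above


class Vote(NamedTuple):
    left_scale: float   # PAG Guidance Scale used for the left image
    right_scale: float  # PAG Guidance Scale used for the right image
    flag: str           # one of FLAGS


def tally_by_scale(votes: Iterable[Vote]) -> Counter:
    """Count wins per guidance scale value, plus the number of ties."""
    wins: Counter = Counter()
    for vote in votes:
        if vote.flag not in FLAGS:
            raise ValueError(f"unknown flag: {vote.flag!r}")
        if vote.flag == "tie":
            wins["tie"] += 1
        else:
            winner = vote.left_scale if vote.flag == "left" else vote.right_scale
            wins[winner] += 1
    return wins


votes = [
    Vote(1.0, 2.0, "left"),
    Vote(2.0, 1.0, "right"),
    Vote(1.0, 2.0, "tie"),
    Vote(1.0, 2.0, "right"),
]
print(tally_by_scale(votes))  # Counter({1.0: 2, 'tie': 1, 2.0: 1})
```

Recording the scale for each side, rather than only the flag, keeps the tally correct even when left and right positions are randomized to avoid position bias.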

@haruharu-1105
Author

To be clear about what I am asking:
My original request is only that this be added to the documentation if possible.

At this time the demo is receiving a lot of traffic, but that may ease in the future with the release of the code and models.

Development resources and time are limited, and the flag idea is only a nice-to-have suggestion.

@lawrence-cj
Collaborator

Appreciate that, and good to know. Let me refine the demo website a bit first and add a document for guidance. Then we will try to improve it once the code is released.
