First, some general notes:
- There are models, refiners, ControlNets, LoRas, extras, ... and all of these need to be compatible with one another! While the SDXL models look really great and 1024x1024 images are significantly sharper, the GPUs on `basegpu1` simply don't have enough memory. I could only use these models on my laptop with CPU offloading, because I have enough RAM; and even then I used something north of 20 GB. So stick to SD1.5 for now; most available add-on models also use this as a base.
- Models and refiners are the base of the image generation. Refiners add more detail. There seem to be actual technical differences, as the remaining models need to be compatible. The base model can influence the output to some extent, so pick "DreamShaper" for artistic and "RealisticVision" for photographic output.
- ControlNets detect features in the original image and add constraints to the output, like depth field, pose and facial expression, or general features via edge detection. Often these work best in this combination. But tone down the canny model a bit to avoid artifacts due to bad photos. A good explanation of each is found in the GitHub repo, this guide and YouTube video.
- LoRas are still new to me and I'm not sure at what stage exactly they come into play ... but they can heavily influence the output of the model! For example, I like the NeoTokyo LoRa a lot. I had rather bad results with the Claymation model, but the images on Reddit look amazing.
- Workflow-wise it's easier to use the `img2img` tab and bump up the noise (denoising strength) to basically remove almost all of the original image. This way, all the ControlNets automatically use the same input image for detection.
- Of course, there's a ready product of this concept already. But we're not interested in using that directly. Their gallery is a nice reference for some template styles though.
- In extreme cases you could use the `rembg` extra to remove the background from an image first. This doesn't always work cleanly either, and I have not quite figured out how to hallucinate a nice background instead of boring white. But it's certainly easier than trying to use the sketch brush.
- Check r/StableDiffusion for lots of inspiration and nice models.
Extensions | URL |
---|---|
ControlNets | https://github.com/Mikubill/sd-webui-controlnet.git |
Refiner (not really needed for SD1.5) | https://github.com/wcde/sd-webui-refiner.git |
Remove Background in Extras | https://github.com/AUTOMATIC1111/stable-diffusion-webui-rembg |
Show JSON Payload | https://github.com/huchenlei/sd-webui-api-payload-display |
You should enable a larger cache for ControlNet models! Otherwise they need to be reloaded for every API call, which takes unnecessarily long.
- As much as I'd like to not dip into the whole gender debate ... even a rather clearly male input image can result in feminine-looking output if you just use a generic term like "person". So maybe give a choice of "male / female / person"?
- Avoid artifacts in the base image! Maybe take off your glasses. Check for bright spots and reflections. Too much "softedge" control may transfer more artifacts, but it also generally produces a closer likeness of the person.
- Put emotions in the prompt! Just putting in "happy male" with the KIDS LoRa had a really drastic effect on facial expression!
- Use negative prompts for things the AI likes to do: "open mouth" and weird "beard".
- Should be DIY; I don't want to sit there all evening.
- Process:
- Take a photo with a short self-timer; don't ever show the true photo? This generates a unique directory for the output.
- Select style, settings and keywords; maybe a pre-made background to insert into
- Click "GENERATE" (multiple times?)
- View the generated pictures via a QR code on a local server. What kind of WiFi network do we have?
- Choices:
- Style: the overall style, i.e. "Anime", "Kids Cartoon", "Watercolor", "Vaporwave", ... this sets a template of settings such as model and LoRa
- Character: choose your output gender, age, facial expression
- Keywords: pick from a selection of pre-made phrases like "on fire", ... (determined by style)
- Controls: toggle ControlNets (with visual explanation) for OpenPose, Depth and SoftEdges
- Background: fix the problem of boring backgrounds by selecting a pre-made one and placing humans on it with the `rembg` extra as a pipeline step
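The first and last process steps could look roughly like this. It is only a minimal sketch: the function names, the flat directory layout and the URL scheme are my assumptions, and rendering the actual QR code is left out because it needs a third-party library.

```python
import uuid
from pathlib import Path


def create_session_dir(output_root: Path) -> Path:
    """Create a new, uniquely named directory for one photo session."""
    # A random hex name avoids collisions and is hard to guess.
    session_dir = output_root / uuid.uuid4().hex
    session_dir.mkdir(parents=True, exist_ok=False)
    return session_dir


def gallery_url(host: str, port: int, session_dir: Path) -> str:
    """Build the local URL that the QR code would point to (hypothetical layout)."""
    return f"http://{host}:{port}/{session_dir.name}/"
```

With this layout, any static file server running on the output root would be enough to serve a session's pictures; the host and port depend on the WiFi setup, which is still an open question.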
Each combination of style and LoRa requires some specific keywords and settings, so I should collect those and build some templates here.
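One possible shape for such templates, as a rough sketch: `StyleTemplate`, `build_prompt` and the `{subject}` placeholder are names I made up for illustration, and the two example entries simply reuse values from the notes in this post.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StyleTemplate:
    model: str                   # checkpoint name
    base_prompt: str             # style keywords; "{subject}" is filled in later
    lora: Optional[str] = None   # name used inside the <lora:...> tag
    lora_weight: float = 1.0
    negative_prompt: str = "open mouth, beard"


def build_prompt(template: StyleTemplate, subject: str,
                 keywords: Sequence[str] = ()) -> str:
    """Fill in the subject, append extra keywords, then add the LoRa tag."""
    parts = [template.base_prompt.format(subject=subject), *keywords]
    prompt = ", ".join(parts)
    if template.lora:
        # ":g" prints 1.0 as "1" and 0.4 as "0.4".
        prompt += f" <lora:{template.lora}:{template.lora_weight:g}>"
    return prompt


NEOTOKYO = StyleTemplate(
    model="dreamshaper8Pruned.hz5Q",
    base_prompt="neotokyo, 90s anime, drawing of a {subject}, portrait",
    lora="NEOTOKIO_V0.01",
)
GOTCHA = StyleTemplate(
    model="dreamshaper8Pruned.hz5Q",
    base_prompt=("stylized cartoon, illustration, portrait of a {subject}, "
                 "looking sideways, forest in the background"),
    lora="gotchaV001.Yu4Z",
    lora_weight=0.4,
)
```

Calling `build_prompt(NEOTOKYO, "young man")` gives a NeoTokyo prompt with the LoRa tag at the end; the character choice (gender, age, expression) would go into `subject` and the selectable phrases into `keywords`.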
Model: use DreamShaper + NEOTOKIO
neotokyo, 90s anime, drawing of a young man, portrait <lora:NEOTOKIO_V0.01:1>
CFG scale: 7, Model: dreamshaper8Pruned.hz5Q, Denoising strength: 0.95
ControlNet 0: "Module: openpose_full, Model: control_v11p_sd15_openpose [cab727d4], Weight: 0.8, Control Mode: Balanced"
ControlNet 1: "Module: depth_zoe, Model: control_v11f1p_sd15_depth [cfd03158], Weight: 1, Control Mode: Balanced"
ControlNet 2: "Module: softedge_pidinet, Model: control_v11p_sd15_softedge [a8575a2a], Weight: 0.5, Control Mode: Balanced"
Lora hashes: "NEOTOKIO_V0.01: c4c6ad6be466"
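For the API side, these settings could be turned into an img2img request body roughly like this. Treat it as a sketch under assumptions: the field names follow what I believe the WebUI's `/sdapi/v1/img2img` endpoint and the ControlNet extension accept, so check them against the JSON payload display extension before relying on them. Selecting the checkpoint and actually sending the request are not covered here.

```python
import base64

# ControlNet units from the NeoTokyo settings above: (module, model, weight).
NEOTOKYO_UNITS = [
    ("openpose_full", "control_v11p_sd15_openpose [cab727d4]", 0.8),
    ("depth_zoe", "control_v11f1p_sd15_depth [cfd03158]", 1.0),
    ("softedge_pidinet", "control_v11p_sd15_softedge [a8575a2a]", 0.5),
]


def controlnet_unit(module, model, weight):
    """One ControlNet unit; the field names are assumptions to verify."""
    return {
        "enabled": True,
        "module": module,
        "model": model,
        "weight": weight,
        "control_mode": "Balanced",
    }


def build_img2img_payload(image_bytes, prompt, negative_prompt=""):
    """Assemble a request body for the img2img endpoint (not sent here)."""
    return {
        # The API expects images as base64-encoded strings.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "cfg_scale": 7,
        "denoising_strength": 0.95,
        # No image is set per unit, so each unit is expected to fall back
        # to the img2img input image.
        "alwayson_scripts": {
            "controlnet": {
                "args": [controlnet_unit(*u) for u in NEOTOKYO_UNITS]
            }
        },
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to the WebUI; leaving the per-unit image unset matches the observation that all ControlNets use the same input image in the `img2img` tab.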
Use DreamShaper + CoolKIDS
kids illustration, children's cartoon, close-up, happy boy, looking sideways, kitchen in the background <lora:coolkidsMERGEV25.Qqci:1>
Use the Western model without any LoRa; the superhero in the prompt could probably be replaced
western comic, portrait, superman, close-up, looking sideways, skyscrapers in the background
Heavily stylized illustrations; use DreamShaper + Gotcha:0.4
Tone down the ControlNets a lot to give the LoRa space to hallucinate! Cross-breeding with completely non-human looking animals doesn't work well anyway, though. So no frogs please.
stylized cartoon, illustration, portrait of a monkey, looking sideways, forest in the background <lora:gotchaV001.Yu4Z:0.4>
Use DreamShaper + Watercolor LoRa; works rather simple, but maybe it's worth it to specify "vibrant colors"; can use ControlNets very well
watercolor painting, hand-drawn illustration, portrait of a young man, looking sideways, clear white paper background <lora:watercolorv1.7lox:1>
Use the AbsoluteReality model, no LoRa; can use high edges control here!
portrait of a male NASA astronaut in spacesuit before rocket launch, space photography in the background, realistic photo, shot on DSLR
AbsoluteReality again, no LoRa; maybe specify -naked etc.; they all look so serious!
marble sculpture in a museum, bust of a male, art gallery in the background, realistic photo
Use specialized AnimePastelDream model without LoRa; tone down the Controls hard to give the model more space to hallucinate!
anime illustration, 1boy, smiling and happy, looking sideways, bright sun, summer, small town in the background
Use DreamShaper again, no LoRa; oops, specify gender!
stylized retro illustration, low palette, pastel colors, band album cover
DreamShaper, no LoRa
rough sketch, pencil drawing of a young male, black-and white, stylized illustration, hand-drawn
DreamShaper + ClayRedmond LoRa; careful, this can be nightmare fuel very quickly!
stopmotion, claymation, small clay figure of a young male, vibrant colors, fantastic plastic <lora:ClayAnimationRedmond15-ClayAnimation-Clay:1>
.. ooooor: Clazy model without LoRa; must use "clazy style" in prompt, otherwise the result is abhorrent. Works a lot better as a 50% refiner! And maybe I should disable edges and depth controls for this.
clazy style, stopmotion, claymation, small clay figure of a young male, vibrant colors, fantastic plastic