SVD error #132
Hello @Tanghonghan! I'm not sure whether you're the same user who reported this in the Discord, but I believe this is caused by selecting the SVD model in the model picker. I wasn't clear enough about this: you do not need to select SVD to use it. Simply enabling animation and setting the engine to "Stable Video Diffusion" tells ENFUGUE to load the SVD motion module when it's needed, and it will be loaded alongside whatever Stable Diffusion checkpoint you've chosen.
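In case it helps other readers, here is a minimal sketch of the two configurations being contrasted above. The dictionary keys and values are hypothetical stand-ins for the front-end controls, not ENFUGUE's actual API or settings format:

```python
# Illustrative only -- these dictionaries mirror the front-end choices described
# above; the key names are hypothetical and are not ENFUGUE's real API.

# What caused the error: picking SVD itself in the model picker.
broken_settings = {
    "model": "stable-video-diffusion",   # SVD is not a text-to-image checkpoint
    "animation": {"enabled": False},
}

# What works: keep a normal SD checkpoint selected and let the animation
# engine pull in the SVD motion module only when it is actually needed.
working_settings = {
    "model": "your-favorite-sd-checkpoint.safetensors",  # placeholder name
    "animation": {
        "enabled": True,
        "engine": "Stable Video Diffusion",
    },
}
```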
Thank you for your response! I followed your advice exactly and that problem is solved, but another one arose: I couldn't generate a normal animation using SVD. Either the image stood still (an image, not an animation) or an error occurred. Could you please make a video on how to generate animation with ENFUGUE and release it on YouTube? Especially how to use DragNUWA in ENFUGUE, because I found that very hard to do. Thanks again for your effort! Much appreciated!
Hey @Tanghonghan, glad the first problem is solved, and sorry you hit another issue! Did you follow the instructions from this video? If you did and still experienced errors, I'd love to know a bit more about what you were trying to do. Were there a lot of motion vectors involved? What resolution was the image, and how many frames did you try to generate?

297960767-ae28ac55-2eba-4315-9362-29dc41cdd8d4.mp4

If you want to share all the settings from the front-end, you can go to
Actually I watched the video you showed me several times, and I successfully generated an animation using the method in it, so thank you very much. Yet I encountered some more problems (they seem to keep popping up no matter what, hahaha).

One of them is that I don't know which image I'm currently working on. I thought I was working on picture A, but when I ran ENFUGUE, it showed that I was actually working on picture B, which is very confusing. I guess it has something to do with layers, which I'm not very familiar with. But even when I deleted all the layers shown in the bottom-right corner (and kept only one), the result still showed I was working on some other picture. I found that a bit frustrating.

Another question: when I drag a picture onto the canvas, only part of the picture shows, and the buttons at the top right of the picture seem to do nothing; I don't know why. I couldn't get the whole picture to show no matter what I tried, and the animation likewise covers only part of the picture.

Those are the two main problems I'm encountering. Thank you for your patience and time!
Oh, almost forgot: I uploaded my JSON for your reference. Thanks!
Hey @Tanghonghan! Very glad you got farther! I'll answer your questions in order:
Controls: General
Motion Vectors
Thank you for your detailed explanation; I think I'm starting to get hold of what you're saying a little. Last time, when I thought I was working on pic A, I was actually working on pic B on the canvas, and "samples" is clearly not the canvas. That might be what confused me.

Something still bothers me: if I have a picture at 1792x1024, how can I generate a DragNUWA SVD animation with it? Or maybe I can't right now? I tried to manipulate the resolution, but as soon as I change it, only part of the original picture shows, never the whole thing (and the animation follows suit). If I want to generate a DragNUWA SVD animation, is the maximum resolution 1024x1024, as you said in your last reply? That surely puts a lot of limitations on the animations we can generate. For example, if I have pictures from Midjourney at a resolution of 1456x816, do I have to lower the resolution before bringing them into ENFUGUE? That's really a bummer....

Wish you a happy day!
best,
Tanghonghan
Images and video are still very different when it comes to AI generation, I'm afraid. Runway's Gen-2 model runs on big server GPUs and only generates at 768x448, for example, relying on upscaling to go higher. Generating coherent animation is still very expensive in terms of memory, even after all of the optimizations I and others have been able to make. There are many features in ENFUGUE to make up for these limitations, though.

I decided to record a video of me generating an image at the Midjourney resolution of 1456x816, using that to generate an animation with DragNUWA at 896x512, then upscaling the result to 1792x1024 using AnimateDiff and interpolating 14 frames to 112 frames, all without leaving ENFUGUE, taking 20 minutes total (including working time and render time) on a 3090 Ti. Here is the result of that:

b24a7bcb70bf4fa681322ba77b3e1e06.mp4

And here is the video:

anim.mp4

After the video was done, I decided to go back and give it another pass just to show you that more can be done with tweaking. For this final animation, I interpolated once to a total of 28 frames prior to upscaling, then, instead of doing the upscale step with AnimateDiff, I did it with HotshotXL and the same OpenDallE model I made the image with. I prefer this to the first:

d9e83c46e55f4003857a127b69c93c55.mp4
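To put concrete numbers on that pipeline, here is a small back-of-the-envelope check in plain Python (not ENFUGUE code); the frame-doubling reading of the interpolation step is inferred from the frame counts quoted above:

```python
# Sanity check of the numbers in the workflow described above.

source = (1456, 816)   # Midjourney image resolution
svd_pass = (896, 512)  # DragNUWA/SVD generation size (memory-friendly)
final = (1792, 1024)   # upscaled output resolution

# The upscale from the SVD pass to the final video is exactly 2x in each axis.
scale = (final[0] / svd_pass[0], final[1] / svd_pass[1])
print(scale)  # (2.0, 2.0)

# 14 generated frames interpolated to 112 is an 8x factor,
# i.e. three successive frame-doubling passes: 14 -> 28 -> 56 -> 112.
frames, target, doublings = 14, 112, 0
while frames < target:
    frames *= 2
    doublings += 1
print(doublings, frames)  # 3 112
```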
@painebenjamin For example, I found it very hard to get the people moving in this image. Is there any way to achieve that?
Absolutely! DragNUWA is great for camera motion, with maybe one or two subjects in frame doing the action you want. The AI will fill in the rest of the movement in the frame, and as you've discovered, it won't always produce as much motion as you'd like. So my first tip is: if you can forgo some control, you give more power back to the AI to fill in the gaps, and it can find more motion in the frame. This is the result of taking that image and running it through SVD-XT without any further processing:

walking.mp4

If you want to interpolate and then add more background motion to the video, doing video-to-video with AnimateDiff and a high Motion Attention Scale (this one uses 2.75) can add a lot of varied movement to the scene.

f8e76c28e59744a5a866b22500edd901.mp4
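Since the Motion Attention Scale comes up again below, here is a conceptual sketch of what such a multiplier is generally understood to do in AnimateDiff-style motion modules: scale the temporal (motion) attention output before it is folded back into the image features, so larger values exaggerate frame-to-frame movement. This is an illustrative assumption about the mechanism, not ENFUGUE's actual implementation:

```python
# Conceptual illustration only (assumption): the motion module's temporal
# attention output is a residual that gets added back to the features, and the
# "Motion Attention Scale" multiplies that residual before the addition.
import numpy as np

def apply_motion_attention_scale(hidden_states, temporal_attn_out, scale=2.75):
    """Blend temporal attention output into the features, amplified by `scale`."""
    return hidden_states + scale * temporal_attn_out

# Toy tensors standing in for (frames, tokens, channels).
h = np.zeros((4, 8, 16))
motion = np.random.randn(4, 8, 16) * 0.01
out = apply_motion_attention_scale(h, motion, scale=2.75)
print(out.shape)  # (4, 8, 16)
```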
@painebenjamin
Certainly, it was very simple! I merely brought in the image, scaled it to the ideal SVD dimensions of 1024x576, and selected the SVD-XT model. All of the other settings were default! Here is a video recording of me setting up a similar run, and here is the result of that run. For the second of the two videos, I followed the same upscale steps I showed in the other screen recording!
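For anyone who wants to reproduce the "scale to the ideal SVD size" step outside the UI, here is a minimal sketch using Pillow that resizes an arbitrary image so it covers 1024x576 and then center-crops it. The file names are placeholders, and this is not how ENFUGUE does it internally:

```python
# Minimal sketch: fit an image to SVD's trained resolution of 1024x576.
from PIL import Image

SVD_SIZE = (1024, 576)  # (width, height) SVD/SVD-XT were trained at

def fit_to_svd(path, out_path):
    img = Image.open(path)
    w, h = img.size
    scale = max(SVD_SIZE[0] / w, SVD_SIZE[1] / h)  # cover the target, don't letterbox
    img = img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)
    left = (img.width - SVD_SIZE[0]) // 2
    top = (img.height - SVD_SIZE[1]) // 2
    img.crop((left, top, left + SVD_SIZE[0], top + SVD_SIZE[1])).save(out_path)

# Placeholder file names for illustration.
fit_to_svd("input.png", "input_1024x576.png")
```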
@painebenjamin

SVD_01098.mp4

But yesterday, when I tried to use DragNUWA in ENFUGUE, I only got something like this (no subject moving, just some debris flying in the air), and that's what confused me.

3776b89fee774095bea79590d04def88.mp4

Not using DragNUWA actually generates a better outcome than using it? I still don't know why. Second question: could you share the workflow for interpolation and video-to-video using AnimateDiff? And what exactly is the Motion Attention Scale (the 2.75 value)? I only know about motion bucket ID.

Last but not least, I promoted ENFUGUE in several of my chat groups, which total about 1000 Chinese ComfyUI users! After a couple of days of use, ENFUGUE strikes me as an awesome UI under your guidance. I hope it gains more and more users through your continued hard work, and that ENFUGUE evolves into an even greater version over time!

Best,
Tanghonghan
@Tanghonghan There are a number of reasons base SVD outperforms DragNUWA at this task.
Overall, NUWA's current power does not lie in its ability to create high-quality video. Instead, its power is in creating highly controllable video, for when you want a specific shot with specific camera language or subject movements. At the moment, our best bet for producing high-quality AI video is a combination of methods with SVD, DragNUWA, AnimateDiff and HotshotXL, using the strengths of each to make up for the shortcomings of the others.

In this video, I take the video I made, upscale it with ESRGAN, and interpolate it with FILM. This does not do video-to-video; it is just upscale/interpolate.

0001-0991.mp4

Here is the result:

moved.mp4

I then wanted to take you through a video-to-video workflow. Using the same video, I produced an anime version:

7eb0558ba73446cd8d02552dd71b9498.mp4

This requires a bit more configuration; it uses many different techniques in concert with one another. Here is how I produced that video:

0001-3389.mp4

I know I moved fairly quickly in setting that up, so I wanted to write out all the configuration; it is below. A quick note about the line in the corner you see in the workflow video: this is an artifact that can be produced by both AnimateDiff and HotshotXL when creating animations that are not at the trained resolution of those models, which is 512x512. It appears in most motion modules, but not all. I typically work around it by planning to cut off an edge of the animation (it is not always the same corner). You can also work around it by enabling tiled diffusion, but that makes the render take longer, so it's a trade-off. In the above video, I cropped it out.

Model Configuration

If you weren't aware, going to
Inference Configuration

Global (Left-Hand Side)
Under 'Tweaks'
Under 'Animation'
Under 'AnimateDiff and HotshotXL'
Prompt
Layer Options (Right-Hand Side)

Under 'Image Visibility'
Under 'Image Roles'
Under 'ControlNet Units'
@painebenjamin
@painebenjamin Layer Options (Right-Hand Side) Two questions:
Regarding your question about the prompt requirement: you need to use the global prompt (located at the bottom of the global options on the left sidebar), not the detail prompt, which is applicable only during the upscaling step. I understand this is confusing - the upscaling interface is slated for a complete redesign to address this issue. Concerning your other two questions:
I know that the last point is complex and is indeed one of ENFUGUE's weaker aspects. It's been long overdue for an overhaul, and I'm now dedicating time to improve it for version 0.4.0. If you know any human interface designers or user experience experts interested in contributing to an open-source project, their expertise would be immensely valuable. 😄 Thank you so much for your support and for spreading the word!
Hi, I read your explanation a few times and found those two issues a little hard to understand; let me rephrase them. I have also spread the word about your need for human interface designers and user experience experts in my chat groups; hopefully someone will want to contribute to ENFUGUE in the days to come.

best,
Tanghonghan
I can generate images using ENFUGUE, but I can't generate animations with it.
Whenever I try to use SVD, this error appears. I don't know if my settings are wrong or if it's something else.
Thanks!