Excellent work! I am very interested in Multi-Modal Collaborative Editing. I have a question: why do the results of Mask_edit and Text_edit differ significantly in skin tone from the input image, while the result of Collaborative_edit has a skin tone very similar to the input image? I look forward to your response. Thank you!
Hi, I have another question: the dynamic diffuser and the LDM are both based on U-Net, so why is the dynamic diffuser much smaller than a uni-modal conditional diffusion model? Is it because the model structure of the dynamic diffuser is greatly simplified? Could you please explain this in a bit more detail? Thank you!