Gayoon Choi ·
Taejin Jeong ·
Sujung Hong ·
Jaehoon Joo ·
Seong Jae Hwang
Yonsei University
- News and Update
- TODO
- Requirements
- Run DragText with Gradio User Interface
- Run DragText for Drag Bench Evaluation
- Citation
🚨 DragDiffusion and its follow-up works, including DragText, were developed on top of runwayml/stable-diffusion-v1-5. However, this model has been removed from HuggingFace and is no longer accessible. As a temporary solution, we are exploring alternatives such as CompVis/stable-diffusion-v1-4, Lykon/dreamshaper-8, and benjamin-paine/stable-diffusion-v1-5, following the guidelines of the diffusers community. Please note that these alternatives may not reproduce the results presented in the original papers.
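A minimal sketch of trying the alternative base models in order and keeping the first one that loads. The helper names `first_loadable` and `load_with_diffusers` are hypothetical and not part of this repository; the sketch assumes the `diffusers` package is installed and does not show how the DragText scripts themselves receive the model path.

```python
from typing import Callable, Iterable, Tuple

# Candidate base models named in the notice above, in the order listed there.
CANDIDATE_MODELS = [
    "CompVis/stable-diffusion-v1-4",
    "Lykon/dreamshaper-8",
    "benjamin-paine/stable-diffusion-v1-5",
]


def load_with_diffusers(model_id: str) -> object:
    """Load a Stable Diffusion pipeline from the Hub.

    diffusers is imported lazily so the fallback helper below can be
    exercised without it being installed.
    """
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(model_id)


def first_loadable(
    model_ids: Iterable[str],
    loader: Callable[[str], object] = load_with_diffusers,
) -> Tuple[str, object]:
    """Return (model_id, loaded_object) for the first model that loads."""
    errors = {}
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # e.g. removed repo, gated model, no network
            errors[model_id] = exc
    raise RuntimeError(f"no candidate model could be loaded: {errors}")
```

Calling `first_loadable(CANDIDATE_MODELS)` downloads weights on first use, so it needs network access and enough disk space.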
- [Sep 4th] v0.0.0 Release.
- Implement basic function of DragText
- We do not support the "Editing Generated Image" tab from DragDiffusion. However, you may use the "Editing Real Image" tab to edit diffusion-generated images instead.
- We do not support the StyleGAN2 version of FreeDrag. DragText is developed for text-conditioned diffusion models.
- Release inference code and model
- Release Gradio demo
- Open in Colab
- Enable embedding control in User Interface
Currently, we support four drag editing methods: DragDiffusion, FreeDrag, DragNoise, and GoodDrag. These methods are all based on DragDiffusion but differ in the required library versions and supported User Interfaces. Therefore, we recommend setting up separate virtual environments and running the code independently for each method, rather than managing all four methods simultaneously. If you encounter any issues when building the virtual environments, please refer to each method's repository for guidance.
Additionally, we recommend running our code on an NVIDIA GPU with a Linux system. We have not tested it on other configurations. On average, DragText requires around 14GB of GPU memory.
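A quick way to check the memory requirement before building the environments is to ask `nvidia-smi` for each GPU's total memory. The sketch below is an illustration, not part of the repository. It treats the 14GB figure as 14 GiB, and it checks total rather than currently free memory, so it is only a rough filter.

```python
import subprocess
from typing import List

REQUIRED_GIB = 14  # approximate figure quoted above, treated as GiB (assumption)


def parse_total_memory_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def query_total_memory_mib() -> List[int]:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_total_memory_mib(result.stdout)


def gpus_with_enough_memory(totals_mib: List[int], required_gib: float = REQUIRED_GIB) -> List[int]:
    """Return the indices of GPUs whose total memory is at least required_gib."""
    return [i for i, mib in enumerate(totals_mib) if mib >= required_gib * 1024]
```

On a machine with an NVIDIA driver installed, `gpus_with_enough_memory(query_total_memory_mib())` lists the GPU indices that meet the threshold.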
To install the required libraries, clone our repository and simply run the following commands:
git clone https://github.com/MICV-yonsei/DragText.git
cd ./DragDiffusion
conda env create -f environment.yaml
conda activate dragdiff
cd ./FreeDrag--diffusion--version
conda env create -f environment.yaml
conda activate freedragdif
This environment is the same as DragDiffusion's, so you can reuse the dragdiff environment instead of creating a new one.
cd ./DragNoise
conda env create -f environment.yaml
conda activate dragnoise
GoodDrag requires Python 3.9 or higher.
cd ./GoodDrag
conda create -n gooddrag python==3.9
conda activate gooddrag
pip install -r requirements.txt
To replace runwayml/stable-diffusion-v1-5, we are currently exploring alternative options. One such option, benjamin-paine/stable-diffusion-v1-5, requires accepting its terms and conditions. To proceed, log in to HuggingFace, agree to the terms, and then run the following command in your terminal:
huggingface-cli login
For more information, see benjamin-paine/stable-diffusion-v1-5 and HuggingFace Gated models.
To start drag editing interactively, launch the Gradio interface for the method you want to use:
cd ./DragText/DragDiffusion
conda activate dragdiff
python drag_ui.py
cd ./DragText/FreeDrag/FreeDrag--diffusion--version
conda activate freedragdif
python drag_ui.py
cd ./DragText/DragNoise
conda activate dragnoise
python drag_ui.py
cd ./DragText/GoodDrag
conda activate gooddrag
python gooddrag_ui.py
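The per-method launch commands can also be collected in one lookup table. The sketch below is a convenience illustration, not a script shipped with the repository. Directories and UI script names are copied from the commands above, environment names follow the `conda create` and `conda env create` commands in the Requirements section, and `launch_command` is a hypothetical helper. It assumes a conda version whose `conda run` supports `--no-capture-output`.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# method -> (directory, conda environment, UI script)
METHODS: Dict[str, Tuple[str, str, str]] = {
    "DragDiffusion": ("./DragText/DragDiffusion", "dragdiff", "drag_ui.py"),
    "FreeDrag": ("./DragText/FreeDrag/FreeDrag--diffusion--version", "freedragdif", "drag_ui.py"),
    "DragNoise": ("./DragText/DragNoise", "dragnoise", "drag_ui.py"),
    "GoodDrag": ("./DragText/GoodDrag", "gooddrag", "gooddrag_ui.py"),
}


def launch_command(method: str) -> Tuple[Path, List[str]]:
    """Return the working directory and argv that start the UI for `method`."""
    directory, env, script = METHODS[method]
    # `conda run -n ENV` executes a command inside ENV without activating it
    # in the current shell; --no-capture-output streams the UI logs live.
    argv = ["conda", "run", "--no-capture-output", "-n", env, "python", script]
    return Path(directory), argv
```

For example, `cwd, argv = launch_command("GoodDrag")` followed by `subprocess.run(argv, cwd=cwd, check=True)` starts the GoodDrag interface from the directory that contains the cloned repository.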
Although the interfaces differ slightly between methods, the basic usage is the same:
- Place your input image in the left-most box.
- Enter a prompt describing the image in the "prompt" field.
- Click the "Train LoRA" button to begin training the LoRA based on the input image.
- Use the left-most box to draw a mask over the areas you want to edit.
- In the middle box, select handle and target points. You can also reset all points by clicking the "Undo point" button.
- Click the "Run" button to apply the algorithm, and the edited results will appear in the right-most box.
If you're interested in the details of each individual interface, please refer to each method's repository for guidance.
DragText is evaluated on the DragBench dataset. We provide evaluation code for DragDiffusion, FreeDrag, DragNoise, and GoodDrag (both with and without DragText).
To evaluate using DragBench, follow the steps below:
Download DragBench, place it in the folder ./(METHOD-FOLDER)/drag_bench_evaluation/drag_bench_data/, and then unzip the files. The resulting directory structure should look like the following:
drag_bench_data
--- animals
------ JH_2023-09-14-1820-16
------ JH_2023-09-14-1821-23
------ JH_2023-09-14-1821-58
------ ...
--- art_work
--- building_city_view
--- ...
--- other_objects
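To confirm that the unzipped data matches this layout, a small check like the following can help. It is a sketch that assumes only what the tree above shows: category folders directly under `drag_bench_data`, each holding one folder per sample. The function name `summarize_drag_bench` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def summarize_drag_bench(root: str) -> Dict[str, int]:
    """Map each category folder under `root` to its number of sample folders."""
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"{root_path} not found; download and unzip DragBench first")
    return {
        category.name: sum(1 for sample in category.iterdir() if sample.is_dir())
        for category in sorted(root_path.iterdir())
        if category.is_dir()  # skip stray files such as leftover zip archives
    }
```

Running `summarize_drag_bench("drag_bench_evaluation/drag_bench_data")` from inside a method folder should list categories such as `animals` and `art_work` with non-zero counts. An empty result usually means the archive was unzipped one level too deep or not at all.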
Train one LoRA on each image in the drag_bench_data folder. We follow the hyperparameters provided by each method (e.g., fine-tuning steps, learning rate, and rank).
python run_lora_training.py
You can control whether to apply DragText with the --optimize_text argument. For example:
# w/o DragText (Original method)
python run_drag_diffusion.py
# w/ DragText (+ Optimize text embedding)
python run_drag_diffusion.py --optimize_text --text_lr 0.004 --text_mask --text_lam 0.1 # default settings
Please note that running the evaluation code for Mean Distance requires around 40GB of GPU memory. We recommend running the evaluation code in the dragdiff environment.
# LPIPS and CLIP similarity
python run_eval_similarity.py
# Mean Distance
python run_eval_point_matching.py
For more information, check Drag Bench Evaluation.
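As a rough illustration of what the Mean Distance number summarizes, the sketch below averages the Euclidean distance between where each handle point ended up and its target point. It assumes the final positions are already known; how run_eval_point_matching.py locates them in the edited image is not covered here, and `mean_distance` is a hypothetical helper, not the repository's function.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_distance(final_points: Sequence[Point], target_points: Sequence[Point]) -> float:
    """Average Euclidean distance between matched point pairs, in pixels."""
    if len(final_points) != len(target_points) or not final_points:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(p, q) for p, q in zip(final_points, target_points))
    return total / len(final_points)
```

A lower value means the dragged content landed closer to the requested targets.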
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025
Gayoon Choi, Taejin Jeong, Sujung Hong, Jaehoon Joo, Seong Jae Hwang
Yonsei University
Point-based image editing enables accurate and flexible control through content dragging. However, the role of text embedding in the editing process has not been thoroughly investigated. A significant aspect that remains unexplored is the interaction between text and image embeddings. In this study, we show that during the progressive editing of an input image in a diffusion model, the text embedding remains constant. As the image embedding increasingly diverges from its initial state, the discrepancy between the image and text embeddings presents a significant challenge. Moreover, we found that the text prompt significantly influences the dragging process, particularly in maintaining content integrity and achieving the desired manipulation. To utilize these insights, we propose DragText, which optimizes text embedding in conjunction with the dragging process to pair with the modified image embedding. Simultaneously, we regularize the text optimization process to preserve the integrity of the original text prompt. Our approach can be seamlessly integrated with existing diffusion-based drag methods with only a few lines of code.
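The idea of optimizing the text embedding while regularizing it toward the original prompt can be illustrated with a toy problem. This is not the paper's loss or the repository's code. Here a fixed vector `pull` stands in for whatever the drag objective asks of the embedding, the regularizer is a squared L2 distance chosen for simplicity, and all names and values are assumptions made for the example.

```python
from typing import List


def optimize_embedding(
    original: List[float],
    pull: List[float],
    lam: float = 0.1,
    lr: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Gradient descent on ||e - pull||^2 + lam * ||e - original||^2."""
    e = list(original)  # start from the original text embedding
    for _ in range(steps):
        # Gradient of the toy objective: the first term follows the edit,
        # the second term pulls the embedding back toward the original.
        grad = [
            2 * (ei - pi) + 2 * lam * (ei - oi)
            for ei, pi, oi in zip(e, pull, original)
        ]
        e = [ei - lr * gi for ei, gi in zip(e, grad)]
    return e
```

With this objective the minimizer is `(pull + lam * original) / (1 + lam)`, so a larger `lam` keeps the result closer to the original embedding and a smaller one lets it follow the edit more freely.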
If you find this code useful, please cite the following paper:
@article{dragtext2024,
title={DragText: Rethinking Text Embedding in Point-based Image Editing},
author={Choi, Gayoon and Jeong, Taejin and Hong, Sujung and Joo, Jaehoon and Hwang, Seong Jae},
journal={arXiv preprint arXiv:2407.17843},
year={2024}
}
This work is inspired by the amazing DragGAN and DragDiffusion. Our code is built upon DragDiffusion, FreeDrag, Drag Your Noise, and GoodDrag. We would like to express our gratitude to the authors of these works and to the community for their valuable contributions.
Code related to the DragText algorithm is released under the Apache 2.0 license.