This repository contains code, examples, and resources for the accompanying blog post. The post explores the possibility of using foundation models, particularly large diffusion models such as Stable Diffusion, to generate large-scale labeled datasets efficiently. It focuses on a use case involving the CelebA facial dataset and demonstrates how these generative models can produce high-quality, annotated facial images for various AI applications.
The primary goal of this repository is to provide an easy-to-understand, practical implementation of Stable Diffusion for data labeling. The ideas and techniques demonstrated here build on the Semantic Image Editing (SEGA) project, which can be found at its original GitHub repository. Additionally, the last part of the pipeline optionally uses the CodeFormer face restoration model.
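To give a sense of how semantic image editing produces labeled faces, here is a minimal sketch using the `SemanticStableDiffusionPipeline` from Hugging Face diffusers (an implementation of SEGA); the model ID, attribute prompt, and guidance values are illustrative, not the exact settings used in this repo:

```python
# Sketch: generate a face whose label ("smiling") is known by construction,
# because SEGA steers generation along that semantic direction.
import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    guidance_scale=7.5,
    editing_prompt=["smiling, smile"],   # attribute to add = the label
    reverse_editing_direction=[False],   # push toward the concept, not away
    edit_guidance_scale=[4.0],
    edit_warmup_steps=[10],
    edit_threshold=[0.99],
)
out.images[0].save("smiling_face.png")
```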
To get started with the code and examples, you'll need to install the required dependencies. We recommend using a virtual environment to manage them. You can install the required Python packages using pip:
pip install -r requirements.txt
Also download the shape_predictor_68_face_landmarks.dat file (dlib's 68-point facial landmark model) and place it in the repository root.
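For reference, the landmark file is typically used with dlib as follows; the image path below is illustrative:

```python
# Sketch: locate the 68 facial keypoints with dlib's shape predictor.
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("face.png")
for rect in detector(img, 1):                     # detect face bounding boxes
    shape = predictor(img, rect)                  # fit the 68 landmarks
    points = [(p.x, p.y) for p in shape.parts()]  # (x, y) pixel coordinates
    print(f"found {len(points)} landmarks")
```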
For CodeFormer, please follow the instructions in its repo.
Use collect_seeds.py to generate the initial annotated face images. Then use augment_seeds.py to augment the existing data via latent space interpolation, sketched below.
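The core idea behind the augmentation step is spherical interpolation (slerp) between the latents of two seed images, which yields plausible intermediate faces. The function below is a self-contained sketch of that technique; the name and tensor shapes are illustrative, not the exact code in augment_seeds.py:

```python
# Sketch: spherical interpolation between two Stable Diffusion latents.
import torch

def slerp(t: float, z0: torch.Tensor, z1: torch.Tensor) -> torch.Tensor:
    """Spherically interpolate between latents z0 and z1 at fraction t."""
    a, b = z0.flatten(), z1.flatten()
    # Angle between the two latent vectors.
    omega = torch.acos(torch.clamp(torch.dot(a / a.norm(), b / b.norm()), -1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:  # nearly parallel: fall back to linear interpolation
        return (1.0 - t) * z0 + t * z1
    return (torch.sin((1.0 - t) * omega) / so) * z0 + (torch.sin(t * omega) / so) * z1

# Interpolate between two seed latents to get intermediate faces.
z_a = torch.randn(1, 4, 64, 64)  # latent of seed image A (SD 512x512 latent shape)
z_b = torch.randn(1, 4, 64, 64)  # latent of seed image B
intermediates = [slerp(t, z_a, z_b) for t in (0.25, 0.5, 0.75)]
```

Slerp is preferred over plain linear interpolation here because Stable Diffusion latents are approximately Gaussian, and linear blends move off the shell where the model expects its inputs to lie.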
If you use this work, please cite the SEGA paper:

@article{brack2023Sega,
  title={SEGA: Instructing Diffusion using Semantic Dimensions},
  author={Manuel Brack and Felix Friedrich and Dominik Hintersdorf and Lukas Struppek and Patrick Schramowski and Kristian Kersting},
  year={2023},
  journal={arXiv preprint arXiv:2301.12247}
}