This repository contains tools that help you find mistakes in your labels. They are useful if you have an image dataset (for example an object detection dataset with bounding boxes) and you:
- want to make sure the objects are assigned to the correct class and the bounding boxes are drawn correctly
- wish to explore it and discover the different variations occurring in the data
The tools work by sorting the images by visual similarity and then showing them in a streamlined user interface. The interface lets you mark the photos, so you can carry out the QA process quickly. The visual similarity sorting is based on a model trained in an unsupervised manner, so it is not limited to ImageNet-like data.
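Conceptually, the sorting step boils down to embedding every image and ordering the images so that visually similar ones end up next to each other. The sketch below only illustrates that idea and is not MLfix's actual implementation: MLfix trains its own model in an unsupervised manner (the generated file names suggest Barlow Twins on a ResNet-18), whereas this sketch just reuses ImageNet features and a greedy nearest-neighbour ordering; the paths and parameters are placeholders.

```python
# Illustrative sketch only -- not MLfix's actual code.
# Embed every image with a backbone (here a plain ImageNet-pretrained ResNet-18)
# and walk a greedy nearest-neighbour chain so similar images end up adjacent.
from pathlib import Path

import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 512-d pooled features as the embedding
backbone.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

paths = sorted(Path("deepfashion2-256").rglob("*.jpg"))  # hypothetical image folder
with torch.no_grad():
    feats = torch.cat([backbone(preprocess(Image.open(p).convert("RGB")).unsqueeze(0))
                       for p in paths])
feats = torch.nn.functional.normalize(feats, dim=1).numpy()

# Greedy chain: start at the first image, always jump to the most similar unvisited one.
order = [0]
remaining = set(range(1, len(paths)))
while remaining:
    rem = list(remaining)
    sims = feats[rem] @ feats[order[-1]]  # cosine similarity (features are L2-normalized)
    nxt = rem[int(np.argmax(sims))]
    order.append(nxt)
    remaining.remove(nxt)

print([str(paths[i]) for i in order[:10]])  # beginning of the similarity-sorted sequence
```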
We are still working on the documentation and examples (they will be coming in a few weeks). In the meantime you can check the presentation we gave at OSS NA 2022.
Is your dataset overflowing with low-quality samples? Our highly skilled robots can help you! (generated by Centipede Diffusion based on an image prompt composed manually from two other generated images)
This library contains command-line tools to process the images. Right now it is easiest to start with any dataset in the ImageNet format (one folder per class; see the layout sketch after the commands below) or with just a folder of unsorted pictures. For example, if you download the DeepFashion2 dataset, you can run the following commands:
git clone https://github.com/collabora/MLfix.git
cd MLfix
pip install -e .
qa_backend_downsize_images ./deepfashion2 ./deepfashion2-256
qa_backend_pretrain --pretrained ./deepfashion2-256 # trains a model (starting from ImageNet weights)
# and generates the BoVW features
qa_backend_sort_images ./deepfashion2-256 # creates a JSON with all images sorted by similarity
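For reference, the ImageNet-style layout mentioned above is nothing more than one sub-folder per class. The folder and file names below are purely illustrative, not the real DeepFashion2 categories:

```
deepfashion2/
├── class_a/
│   ├── 000001.jpg
│   └── 000002.jpg
└── class_b/
    ├── 000003.jpg
    └── 000004.jpg
```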
Once these commands have finished, run `python -m http.server` (from the same MLfix directory) and open the URL:
http://localhost:8000/mlfix-ui/#../deepfashion2-256/barlow-twins-resnet18-pretrained-224-5e-proj2048-lr0.5e-3-sample-1024vw.json
to load the MLfix web app. The part of the URL after `#` is the path, relative to the mlfix-ui folder, of the JSON file generated by qa_backend_sort_images.