KU Leuven LStat Datathon 2023

Team: Voodoo Boyz

Crew:

Jakub Cierocki - Captain, MSc Statistics and Data Science
Kristaps Greitans, Master of Electromechanical Engineering Technology
Aleksandra Kotowicz, MSc Statistics and Data Science
Tristan Vandervelde, MSc Statistics and Data Science

This repository stores all the analyses and code developed during Datathon 2023.

Problem description

The organisers gave us a graph database with various data about artists and artworks:

artworks with images
artists with basic bios
art movements
the universities the artists studied at
the places they were born and died
artwork types
synthetic images generated using text-to-image technology
etc.

The data was scraped primarily from wikigallery.org and included many missing values and data-cleanliness issues.

The task was to carry out any analysis of our choice and present it in a way that would impress the jury.

Data and modelling:

The image data consists of ~13k images of size ~512x512, either real or generated, with a generated-to-real ratio of roughly 1:3 (class imbalance). Among the ~10k images of real artwork, we were able to match movements to ~4k (based on the Artwork -- Artist -- Movement relation in the graph). However, we were able to assign a unique movement to only 2872 images.
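As a rough illustration of the labelling step, the sketch below keeps only artworks whose artist maps to exactly one movement. This is an assumption about what "unique" means here; the function name, the dictionaries, and the toy identifiers are all hypothetical and do not come from the repository.

```python
def unique_movement_labels(artwork_artist, artist_movements):
    """Return {artwork: movement} for artworks with exactly one candidate movement.

    artwork_artist:   dict mapping artwork id -> artist id (hypothetical layout)
    artist_movements: dict mapping artist id -> set of movement names
    """
    labels = {}
    for artwork, artist in artwork_artist.items():
        movements = artist_movements.get(artist, set())
        # Skip artworks with no movement or with an ambiguous (multi-movement) artist.
        if len(movements) == 1:
            labels[artwork] = next(iter(movements))
    return labels


# Toy data: one unambiguous artist, one ambiguous artist, one unmatched artist.
artwork_artist = {"art_1": "artist_a", "art_2": "artist_b", "art_3": "artist_c"}
artist_movements = {
    "artist_a": {"Impressionism"},
    "artist_b": {"Cubism", "Surrealism"},
}
labels = unique_movement_labels(artwork_artist, artist_movements)
```

Under this reading, only `art_1` receives a label; ambiguous and unmatched artworks are left out of the classifier's training set.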

We used the data mentioned above to develop 3 different ML models.

1. SVM-based synthetic (generated) image detector:
   - grayscale, 64x64 centre crop
   - 1D Discrete Cosine Transform
   - only high and very low frequencies extracted
   - log(abs(.)) transformation
   - RBF kernel
   - balanced accuracy out-of-sample: 73%
   - F1 score out-of-sample: 0.83
   - inspiration: see Ricker et al. (2022)
2. Neural-based synthetic (generated) image detector:
   - initial 320x320 centre crop
   - network built-in 225x225 rescale and normalization
   - 2D Fast Fourier Transform
   - only middle pixels extracted
   - GFNet neural architecture
   - PyTorch backend
   - accuracy out-of-sample: 86%
   - inspiration: see Corvi et al. (2022) and Rao et al. (2021)
3. Neural-based art movement multiclass classifier:
   - rare movements dropped, 51 remained
   - initial 640x640 resize
   - data augmentations used to enlarge the training dataset
   - YOLOv8 Nano neural network pre-trained for art movement classification
   - fine-tuned on our dataset
   - top-5 accuracy: 94%
   - top-1 accuracy: 40%
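
The preprocessing listed for the SVM-based detector could look roughly like the sketch below. It is not the code from this repository. It assumes the 1D DCT is applied along each row of the crop, and the function names and the `n_low` / `n_high` cut-offs are hypothetical values chosen for illustration.

```python
import numpy as np


def center_crop(img, size):
    """Cut a size x size window from the middle of a 2D array."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def dct_matrix(n):
    """Unnormalized DCT-II basis: row k holds cos(pi/n * (i + 0.5) * k)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi / n * (i + 0.5) * k)


def dct_features(gray, crop=64, n_low=4, n_high=16, eps=1e-12):
    """Log-magnitude DCT features from the lowest and highest frequency bins."""
    patch = center_crop(np.asarray(gray, dtype=np.float64), crop)
    coeffs = patch @ dct_matrix(crop).T          # 1D DCT along each row (assumption)
    log_mag = np.log(np.abs(coeffs) + eps)       # log(abs(.)) transformation
    # Keep only the very low and the high frequency columns (hypothetical cut-offs).
    selected = np.concatenate([log_mag[:, :n_low], log_mag[:, -n_high:]], axis=1)
    return selected.ravel()


# Example on a random stand-in for a 512x512 grayscale image.
features = dct_features(np.random.default_rng(0).random((512, 512)))
```

Feature vectors of this kind could then be passed to an RBF-kernel classifier such as scikit-learn's `SVC(kernel="rbf")`.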

Models 1 & 2 were both based on the idea of extracting invisible artifacts specific to a given generative architecture (in our case Stable Diffusion).
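
To show why frequency-domain features can expose such artifacts, here is a small toy sketch, not taken from the repository. A faint periodic pattern is added to noise and recovered as a clear peak in the 2D FFT log-spectrum. The synthetic "image", the amplitude, and the frequency of 8 cycles are all made-up assumptions.

```python
import numpy as np


def log_spectrum(gray, eps=1e-12):
    """Centered log-magnitude 2D FFT spectrum of a grayscale image."""
    x = np.asarray(gray, dtype=np.float64)
    x = x - x.mean()                               # remove the DC component
    spec = np.fft.fftshift(np.fft.fft2(x))         # zero frequency moved to the centre
    return np.log(np.abs(spec) + eps)


rng = np.random.default_rng(0)
n = 64
image = rng.normal(0.0, 0.01, size=(n, n))         # stand-in for image content
cols = np.arange(n)
# Hypothetical artifact: a weak horizontal pattern with 8 cycles across the width.
image += 0.05 * np.sin(2 * np.pi * 8 * cols / n)[None, :]

spec = log_spectrum(image)
row, col = np.unravel_index(np.argmax(spec), spec.shape)
offset = (int(row) - n // 2, int(col) - n // 2)    # peak position relative to centre
```

The strongest spectral bin sits on the horizontal axis, 8 bins from the centre, which is exactly the frequency of the injected pattern. Real generator artifacts are subtler, so a classifier is trained on the spectrum instead of reading peaks by hand.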

References

Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., & Verdoliva, L. (2022). On the detection of synthetic images generated by diffusion models.

Rao, Y., Zhao, W., Zhu, Z., Lu, J., & Zhou, J. (2021). Global Filter Networks for Image Classification.

Ricker, J., Damm, S., Holz, T., & Fischer, A. (2022). Towards the Detection of Diffusion Model Deepfakes.

Usage instructions

To run the main dashboard with visualisations:

gunicorn src.dashboard:server

To run the second dashboard, which provides a GUI for the computer vision models:

gunicorn src.ml_gui:server
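
In both commands, gunicorn's `module:attribute` argument names a WSGI callable to serve. How `src/dashboard.py` and `src/ml_gui.py` define `server` is not shown here, so the sketch below only illustrates the convention with a hand-written WSGI callable in a hypothetical file `example_app.py`.

```python
# example_app.py (hypothetical file, used only to illustrate the convention)

def server(environ, start_response):
    """Minimal WSGI callable; gunicorn would import it as example_app:server."""
    body = b"dashboard placeholder"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With such a file on the import path, `gunicorn example_app:server` would serve it in the same way the commands above serve the two dashboards.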

To create the required venv, run:

bash ./setup.sh PATH_TO_VENV

where PATH_TO_VENV needs to be replaced with the path to the directory in which you want the venv to be created and all dependencies installed. Be aware that the script requires pyenv to work.

To activate the venv, type:

source PATH_TO_VENV/bin/activate

Be aware that the first dashboard requires a Neo4j database to be set up and populated.

Neo4j 5.5 is required. See link for installation notes.

Type:

python src/db_pop.py

to populate the Neo4j database. Currently the project is configured to work with the default, root Neo4j database, which may require uncommenting the line:

dbms.security.auth_enabled=false

in /etc/neo4j/neo4j.conf.

