ICT4S'23: Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques

This repository contains the code for the ICT4S'23 paper. The experiments may be reproduced by referring to this paper. The presentation of the paper was recorded; you can find the recording here. The slides are provided in this repository. All files should have sufficient documentation for reproduction and understanding. Any remaining questions or comments may be sent to my e-mail.

The article and/or this repository should be cited as:

@inproceedings{de2023energy,
  title={Energy cost and machine learning accuracy impact of k-anonymisation and synthetic data techniques},
  author={de Reus, Pepijn and Oprescu, Ana and van Elsen, Koen},
  booktitle={2023 International Conference on ICT for Sustainability (ICT4S)},
  pages={57--65},
  year={2023},
  organization={IEEE}
}

About this repository

The repository is structured with the following folders and files:

Data

This folder contains the data sets obtained from the UCI Machine Learning Repository, stored in two separate folders. It also contains two Python files that clean and preprocess the data sets as described in the Experimental Setup section of the paper. After running these files, the Energy folder will be available, containing the energy consumption measurements of the data cleaning and preprocessing.

Benchmark

The Benchmark folder contains the Python scripts for three different machine learning models and one script (run_results.py) that combines these three models to obtain results. After running this script, the Performance folder will be filled with measurements of the accuracy and energy consumption for this benchmark.
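To illustrate the idea of one script combining several models and recording a result per model, here is a minimal sketch. It is not the code in run_results.py: the function names, the `MajorityClass` stand-in model, and the toy data are all hypothetical, and wall-clock time is used only as a placeholder for the energy measurement, which the repository performs with its own tooling.

```python
import time


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def run_benchmark(models, train, test):
    """Fit each model, then record test accuracy and elapsed fit+predict time."""
    X_train, y_train = train
    X_test, y_test = test
    results = {}
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        # Assumption: elapsed seconds stand in for an energy reading here.
        elapsed = time.perf_counter() - start
        results[name] = {
            "accuracy": accuracy(predictions, y_test),
            "seconds": elapsed,
        }
    return results


class MajorityClass:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


# Toy data, not taken from the UCI data sets.
models = {"majority": MajorityClass()}
results = run_benchmark(
    models,
    train=([[0], [1], [2]], [1, 1, 0]),
    test=([[3], [4]], [1, 0]),
)
print(results["majority"]["accuracy"])
```

Because the harness only relies on `fit` and `predict`, any estimator following that convention could be added to the `models` dictionary.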

Anonymisation and Synthetic data

The Anonymisation and Synthetic data folders each have a separate readme file with an introduction and instructions for the code. These folders are used for anonymising or synthesising the data and performing the experiments, after which the results are stored in the respective folder. The hyperparameters used in our paper are included in the synthetic data generation code and the anonymisation code.
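For readers new to k-anonymisation, the property it targets can be checked in a few lines: a data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below only verifies that property. It is not the anonymisation code in this repository, and the function names, column names, and toy records are assumptions made for illustration.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    Returns 0 for an empty data set.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(rows, quasi_identifiers) >= k


# Hypothetical, already generalised records (not from the UCI data sets).
records = [
    {"age": "30-39", "zip": "10**", "income": ">50K"},
    {"age": "30-39", "zip": "10**", "income": "<=50K"},
    {"age": "40-49", "zip": "10**", "income": "<=50K"},
    {"age": "40-49", "zip": "10**", "income": ">50K"},
    {"age": "40-49", "zip": "10**", "income": ">50K"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))  # True
print(is_k_anonymous(records, ["age", "zip"], k=3))  # False
```

The smallest group here is the two records with age "30-39", so the toy data is 2-anonymous but not 3-anonymous.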

Notebooks

Two notebooks are provided that use the results to summarise and visualise the data from the experiments. The first notebook (analysis_paper.ipynb) contains the plots used in Figures 3 and 4 of the paper. The second notebook (MannWhitney.ipynb) contains the code required for the Mann-Whitney U test presented in Table V.

The notebooks used for Tables II-IV and VI are available in the Anonymisation and Synthetic data folders.
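As background for the Mann-Whitney U test mentioned above, the following sketch computes the U statistic directly from its definition: for two samples, count the pairs in which a value from the first sample exceeds a value from the second, with ties counting one half. This is an illustration only and not the notebook's code. The function name and the sample values are hypothetical, and obtaining a p-value would normally be left to a statistics library such as `scipy.stats.mannwhitneyu`.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), where U_a counts pairs with a > b and ties count 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical measurements for two conditions (not results from the paper).
baseline = [10.2, 11.0, 10.8, 10.5]
treated = [12.1, 11.9, 12.4, 11.0]

u_a, u_b = mann_whitney_u(baseline, treated)
print(u_a, u_b)  # 0.5 15.5
```

A U value close to 0 for one sample, as in this toy example, indicates that its values are almost always smaller than those of the other sample.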

Miscellaneous

The .gitignore file is set up to ignore preprocessed data sets and results, which keeps the repository small. It also ignores .ipynb files and their checkpoints, as these were used for development purposes only.

