Unmasking Privacy: A Reproduction and Evaluation Study of Obfuscation-based Perturbation Techniques for Collaborative Filtering

This repository contains code to reproduce our results from "Unmasking Privacy: A Reproduction and Evaluation Study of Obfuscation-based Perturbation Techniques for Collaborative Filtering" by Alex Martinez, Mihnea Tufis and Ludovico Boratto published at the The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'24).

Abstract

Recommender systems (RecSys) solve personalisation problems and therefore heavily rely on personal data – demographics, user preferences, user interactions – each baring important privacy risks. It is also widely accepted that in RecSys performance and privacy are at odds, with the increase of one resulting in the decrease of the other. Among the diverse approaches in privacy enhancing technologies (PET) for RecSys, perturbation stands out for its simplicity and computational efficiency. It involves adding noise to sensitive data, thus hiding its real value from an untrusted actor. We reproduce and test a set of four randomization-based perturbation techniques developed by Batmaz and Polat [1] for privacy preserving collaborative filtering. While the framework presents great advantages – low computational requirements, several useful privacy-enhancing parameters – the supporting paper lacks conclusions drawn from empirical evaluation. We address this shortcoming by proposing – in absence of an implementation by the authors – our own implementation of the obfuscation framework. We then develop an evaluation framework to test the main assumption of the reference paper – that RecSys privacy and performance are competing goals. We extend this study to understand how much we can enhance privacy, within reasonable losses of the RecSys performance. We reproduce and test the framework for the more realistic scenario where only implicit feedback is available, using two well-known datasets (MovieLens-1M and Last.fm-1K), and several state-of-the-art recommendation algorithms (NCF and LightGCN from the Microsoft Recommenders public repository).

[1] Zeynep Batmaz and Huseyin Polat. 2016. Randomization-based privacy-preserving frameworks for collaborative filtering. Procedia Computer Science 96 (2016), 33–42

Reproducibility

Download the datasets (Movielens-1M and Last.fm-1K) and save them in the data folder.
Run data/pre-processing.ipynb to pre-process the datasets (as described in Section 2.1 of our paper) and create the train and test splits.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
notebooks		notebooks
obfuscation		obfuscation
recommenders		recommenders
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Unmasking Privacy: A Reproduction and Evaluation Study of Obfuscation-based Perturbation Techniques for Collaborative Filtering

Abstract

Reproducibility

About

Releases

Packages

Languages

alexmartinezmiguel/unmasking-privacy

Folders and files

Latest commit

History

Repository files navigation

Unmasking Privacy: A Reproduction and Evaluation Study of Obfuscation-based Perturbation Techniques for Collaborative Filtering

Abstract

Reproducibility

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages