Hadi M. Dolatabadi, Sarah Erfani, and Christopher Leckie, 2022

This repository contains the official PyTorch implementation of the ACCV 2022 paper "COLLIDER: A Robust Training Framework for Backdoor Data".
Abstract: Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. In such attacks, an adversary poisons a portion of the training data by installing a trigger. The goal is to make the trained DNN output the attacker's desired class whenever the trigger is activated, while behaving normally on clean data. Various approaches have recently been proposed to detect malicious backdoored DNNs. However, a robust, end-to-end training approach, analogous to adversarial training, has yet to be developed for backdoor-poisoned data. In this paper, we take the first step toward such methods by developing a robust training framework, COLLIDER, that selects the most prominent samples by exploiting the underlying geometric structures of the data. Specifically, we effectively filter out candidate poisoned data at each training epoch by solving a geometrical coreset selection objective. We first argue that clean data samples exhibit (1) gradients similar to the clean majority of data and (2) low local intrinsic dimensionality (LID). Based on these criteria, we define a novel coreset selection objective to find such samples, which are then used for training the DNN. We show the effectiveness of the proposed method for robust training of DNNs on various poisoned datasets, reducing the backdoor success rate significantly.
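COLLIDER's selection objective combines gradient information with a local intrinsic dimensionality (LID) term. For readers unfamiliar with LID, the sketch below shows the widely used maximum-likelihood nearest-neighbor estimator of LID; it is an illustration only, not the code used in this repository, and the function name is hypothetical.

```python
import torch

def lid_mle(features, reference, k=20):
    """Maximum-likelihood LID estimate for each row of `features`,
    using its k nearest neighbors drawn from `reference`.
    features:  (n, d) tensor of activations to score
    reference: (m, d) tensor of activations to search for neighbors
    """
    # Pairwise Euclidean distances between the batch and the reference set.
    dists = torch.cdist(features, reference)              # (n, m)
    # Take k+1 smallest distances so the zero self-distance can be dropped
    # when `features` and `reference` overlap.
    knn, _ = dists.topk(k + 1, dim=1, largest=False)
    knn = knn[:, 1:].clamp_min(1e-12)                     # drop self, avoid log(0)
    r_max = knn[:, -1:]                                   # distance to the k-th neighbor
    # LID = -( (1/k) * sum_i log(r_i / r_max) )^{-1}
    lid = -1.0 / torch.log(knn / r_max).mean(dim=1)
    return lid
```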
To install the requirements, run:

```
pip install -r requirements.txt
```
To generate poisoned datasets, use the `data_poisoning.ipynb` notebook. Alternatively, you can load your own poisoned dataset and train a model with `main.py`: locate the data loaders there and plug in your own data-loading pipeline (a minimal sketch is given below).
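As an illustration only (not this repository's code), a pre-poisoned dataset could be wrapped in a standard PyTorch `Dataset`; the class name and file name below are hypothetical:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PoisonedDataset(Dataset):
    """Hypothetical wrapper around pre-poisoned images and labels stored as tensors."""

    def __init__(self, images, labels, transform=None):
        # images: (N, C, H, W) float tensor, labels: (N,) long tensor
        self.images = images
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        x, y = self.images[idx], self.labels[idx]
        if self.transform is not None:
            x = self.transform(x)
        # Returning the index as well is convenient for per-sample bookkeeping
        # (e.g. coreset selection weights), but is optional.
        return x, y, idx

# Example usage (file name is illustrative):
# data = torch.load("poisoned_cifar10.pt")
# train_set = PoisonedDataset(data["images"], data["labels"])
# train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
```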
To train a neural network using COLLIDER, specify the arguments and run the following script:
```
python main.py \
    --gpu <GPU_DEVICE> \
    --dataset <DATASET_NAME> \
    --backdoor <ATTACK_TYPE> \
    --injection_rate <POISONING_RATE> \
    --target_class <ATTACK_TARGET_CLASS> \
    --data_seed <DATASET_SEED> \
    --arch <MODEL_ARCHITECTURE> \
    --epochs <TOTAL_TRAINING_EPOCHS> \
    --batch-size <BATCH_SIZE> \
    --lr <SGD_LEARNING_RATE> \
    --wd <SGD_WEIGHT_DECAY> \
    --momentum <SGD_MOMENTUM> \
    --enable_coresets \
    --fl-ratio <CORESET_SIZE> \
    --lid_start_epoch <WHEN_TO_START_LID_REG> \
    --lid_overlap <LID_NUMBER_OF_NEAREST_NEIGHBORS> \
    --lid_batch_size <LID_BATCH_SIZE> \
    --lid-lambda <LID_LAGRANGE_MULTIPLIER> \
    --lid_hist <LID_MOVING_AVERAGE_WINDOW>
```
Parameters:

- `GPU_DEVICE` — name of the GPU device
- `DATASET_NAME` — dataset name [cifar10/svhn/imagenet12]
- `ATTACK_TYPE` — backdoor attack type [badnets/cl/sig/htba/wanet/no_backdoor]
- `POISONING_RATE` — ratio of poisoned data in the target class (between 0 and 1)
- `ATTACK_TARGET_CLASS` — target class of the backdoor attack
- `DATASET_SEED` — dataset seed
- `MODEL_ARCHITECTURE` — neural network architecture
- `TOTAL_TRAINING_EPOCHS` — number of training epochs
- `BATCH_SIZE` — training batch size
- `SGD_LEARNING_RATE` — SGD optimizer learning rate
- `SGD_WEIGHT_DECAY` — SGD optimizer weight decay
- `SGD_MOMENTUM` — SGD optimizer momentum
- `CORESET_SIZE` — size of the coreset (between 0 and 1)
- `WHEN_TO_START_LID_REG` — epoch at which to start LID regularization
- `LID_NUMBER_OF_NEAREST_NEIGHBORS` — number of nearest neighbors used in the LID computation
- `LID_BATCH_SIZE` — batch size used to compute LID
- `LID_LAGRANGE_MULTIPLIER` — Lagrange multiplier that adds the LID term to the coreset selection coefficients
- `LID_MOVING_AVERAGE_WINDOW` — moving-average window for the LID estimates
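For example, a hypothetical BadNets run on CIFAR-10, roughly matching the setup reported in the results below (10% injection rate, coreset size 0.3), could look like the following. The concrete values (architecture name, target class, epochs, LID settings) are illustrative only; check `main.py` for the options and defaults actually supported.

```
python main.py \
    --gpu 0 \
    --dataset cifar10 \
    --backdoor badnets \
    --injection_rate 0.1 \
    --target_class 0 \
    --data_seed 0 \
    --arch resnet18 \
    --epochs 120 \
    --batch-size 128 \
    --lr 0.1 \
    --wd 5e-4 \
    --momentum 0.9 \
    --enable_coresets \
    --fl-ratio 0.3 \
    --lid_start_epoch 20 \
    --lid_overlap 20 \
    --lid_batch_size 128 \
    --lid-lambda 0.1 \
    --lid_hist 5
```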
The primary results of this work are given in the table below. Each group of rows reports the results for a particular attack, comparing vanilla training, training with gradient-based coresets only, and the full COLLIDER objective. As shown, COLLIDER reduces the threat of backdoor attacks significantly.

Clean test accuracy (ACC) and attack success rate (ASR) in % for backdoor data poisoning on CIFAR-10 (BadNets, label-consistent, and WaNet) and SVHN (sinusoidal strips). Results show the mean and standard deviation over 5 different seeds. The poisoned data injection rate is 10% for BadNets, label-consistent, and sinusoidal strips, and 40% for WaNet. The coreset size is 0.3 for BadNets and label-consistent, and 0.4 for WaNet and sinusoidal strips.
| Backdoor Attack | Dataset | Training | ACC (%) | ASR (%) |
|---|---|---|---|---|
| BadNets | CIFAR-10 | Vanilla | 92.19±0.20 | 99.98±0.02 |
| BadNets | CIFAR-10 | Coresets | 84.86±0.47 | 74.93±34.6 |
| BadNets | CIFAR-10 | COLLIDER | 80.66±0.95 | 4.80±1.49 |
| Label-Consistent | CIFAR-10 | Vanilla | 92.46±0.16 | 100 |
| Label-Consistent | CIFAR-10 | Coresets | 83.87±0.36 | 7.78±9.64 |
| Label-Consistent | CIFAR-10 | COLLIDER | 82.11±0.62 | 5.19±1.08 |
| WaNet | CIFAR-10 | Vanilla | 91.63±0.28 | 92.24±1.74 |
| WaNet | CIFAR-10 | Coresets | 86.04±0.89 | 5.73±2.78 |
| WaNet | CIFAR-10 | COLLIDER | 84.27±0.55 | 4.29±2.54 |
| Sinusoidal Strips | SVHN | Vanilla | 95.79±0.20 | 77.35±3.68 |
| Sinusoidal Strips | SVHN | Coresets | 92.30±0.19 | 24.30±8.15 |
| Sinusoidal Strips | SVHN | COLLIDER | 89.74±0.31 | 6.20±3.69 |
This repository is mainly built upon CRUST, and we thank its authors.
If you have found our code or paper beneficial to your research, please consider citing it as:
```
@inproceedings{dolatabadi2022collider,
  title     = {COLLIDER: A Robust Training Framework for Backdoor Data},
  author    = {Hadi Mohaghegh Dolatabadi and Sarah Erfani and Christopher Leckie},
  booktitle = {Proceedings of the Asian Conference on Computer Vision ({ACCV})},
  year      = {2022}
}
```