This is the official implementation of the paper "Semantic Adversarial Attacks via Diffusion Models". This codebase is built on top of DiffusionCLIP (mainly) and LatentHSJA (the dataset and the classifier). The proposed ST and LM approaches are implemented in this codebase.
pytorch-grad-cam for explainable models. pytorch-fid for calculating fid scores. gan-metrics-pytorch for calculating kid scores. improved-precision-and-recall-metric-pytorch for calculating precision and recall. We appreciate the efforts made from the opensource community, if some libraries we used and not mentioned here, feel free to let us know!
The enviroment we use is very similar to DiffusionCLIP, and we also provide a environments.yml
file. You can run the following commands to setup environment:
conda env create --name envname --file=environments.yml
Note that we use pytorch==1.11.0 and Python==3.8.12.
The CelebA-HQ Facial Identity Recognition Dataset and CelebA-HQ Face Gender Recognition Dataset can be downloaded from LatentHSJA. The AFHQ dataset can be downloaded from StarGanv2.
For CelebA-HQ identity dataset, please refer to this. For CelebA-HQ gender dataset, plrease refer to this. The model weights can be downloaded from LatentHSJA. You could also run the following for CelebA-HQ identity and gender datasets (change dataset dir for gender dataset):
python train_classifier.py
and the following for AFHQ dataset:
python train_classifier_afhq.py
Only correctly classified images are attackable. Also, in the LM approach, we must find image pairs with predicted labels. The following commands are for CelebA-HQ identity dataset, CelebA-HQ gender dataset, and AFHQ dataset, respectively:
python pick_indices_identity.py # pick indices for CelebA-HQ identity dataset
python pick_indices_gender.py # pick indices for CelebA-HQ gender dataset
python pick_indices_afhq.py # pick indices for AFHQ dataset
The indices would be saved as pickle files. These indices is for a specific classifier (i.e. the victim classifier).
For CelebA-HQ identity and gender datasets, we use the diffusion model weights pretrained on CelebA-HQ from DiffusionCLIP: IR-SE50. For AFHQ dataset, we use the diffusion model weights from ILVR+ADM:drive. and finetune it with the following (although the best way is to train a diffusion model for AFHQ dataset from scratch):
python main_afhq_train.py
The commands for CelebA-HQ identity dataset are stored in commands/command_for_celebaHQ_identity_ST_approach
and commands/command_for_celebaHQ_identity_LM_approach
folders.
The commands for CelebA-HQ gender dataset are stored in commands/command_for_celebaHQ_gender
folder.
The commands for AFHQ dataset are stored in commands/command_for_AFHQ
folder.
Basically, taking CelebA-HQ identity dataset as an example, for the ST approach, we have:
python main.py --attack \
--config celeba.yml \
--exp experimental_log_path \
--t_0 500 \
--n_inv_step 40 \
--n_test_step 40 \
--n_precomp_img 100 --mask 9 --diff 9 --tune 0 --black 0
parameters:
--config
specifies which dataset we use--exp
specifies the log directory--t_0
specifies how many semantic adversarial images we'd like to generate--black
specifies black/white-box attack (0 for white-box and 1 for black-box)--tune
is the attack strategy (0 for fintuning the latent space, 1 for finetuning the diffusion model, and 2 for finetuning both the latent space and diffusion model).- Some parameters are unused, e.g.
--mask
and--diff
since these are for the LM approach.
Taking CelebA-HQ identity dataset as an example, for the LM approach, we have:
python main.py --attack \
--config celeba.yml \
--exp experimental_log_path \
--n_test_img 4 \
--t_0 500 \
--n_inv_step 40 \
--n_test_step 40 \
--n_precomp_img 500 --mask 2 --diff 1 --tune 3 --black 0
--config
specifies which dataset we use--exp
specifies the log directory--t_0
specifies how many semantic adversarial images we'd like to generate--black
specifies black/white-box attack (0 for white-box and 1 for black-box)--mask
specifies which explainable model we use (2 for GradCAM, 3 for FullGrad, 4 for SimpleFullGrad, and 5 for SmoothFullGrad)--diff
specifies based on which we generate the attack (1 for source image, 2 for target image, 3 for combing both source and target image)--tune
should be set to 3 for all cases in the LM approach.
For CelebA-HQ gender dataset and AFHQ dataset, the only difference is to replace the dataset and the pretrained diffusion model as in the commands/command_for_CelebaHQ_gender
and commands/command_for_AFHQ
folders.
Please cite our paper if you feel this is helpful:
@inproceedings{wang2023semantic,
title={Semantic Adversarial Attacks via Diffusion Models},
author={Wang, Chenan and Duan, Jinhao and Xiao, Chaowei and Kim, Edward and Stamm, Matthew and Xu, Kaidi},
booktitle={34rd British Machine Vision Conference 2023, BMVC 2023, Aberdeen, UK, November 20-24},
year={2023}
}