remove duplicate case study from security chapter
jasonjabbour committed Jan 1, 2025
1 parent c4cdecb commit a49c77e
Showing 1 changed file with 0 additions and 14 deletions.
14 changes: 0 additions & 14 deletions contents/core/privacy_security/privacy_security.qmd
@@ -219,20 +219,6 @@ After retraining on the poisoned data, the model's false negative rate increased

This case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.

#### Case Study: Protecting Art Through Data Poisoning

Interestingly, data poisoning attacks are not always malicious [@shan2023prompt]. Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, uses data poisoning to help artists protect their work against scraping and copyright violations by generative AI models. Artists can use the tool to subtly modify their images before uploading them online.

While these changes are imperceptible to the human eye, they can significantly degrade the performance of generative AI models that ingest the images as training data, manipulating them into producing unrealistic or nonsensical outputs. For example, with just 300 corrupted images, the University of Chicago researchers could deceive the latest Stable Diffusion model into generating cat-like images when prompted for dogs, or images of cows when prompted for cars.
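
As a rough illustration of the underlying idea, and not Nightshade's actual algorithm, the sketch below optimizes a small, bounded perturbation so that an image's representation under a stand-in encoder (a torchvision ResNet-18 here, purely an assumption for the example) drifts toward that of a different concept while the pixel change stays visually negligible:

```python
# Toy sketch of concept-shifting image poisoning. Assumes PyTorch and
# torchvision are available; a ResNet-18 merely stands in for whatever encoder
# a generative model's pipeline might use. This is NOT Nightshade's method.
import torch
import torch.nn.functional as F
import torchvision.models as models

encoder = models.resnet18(weights=None).eval()  # stand-in differentiable encoder

def poison_image(image, target_image, epsilon=8 / 255, steps=100, lr=1e-2):
    """Add a small perturbation that pulls `image`'s encoder output toward `target_image`'s."""
    delta = torch.zeros_like(image, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    target_features = encoder(target_image).detach()
    for _ in range(steps):
        poisoned = (image + delta).clamp(0, 1)
        loss = F.mse_loss(encoder(poisoned), target_features)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Keep the change imperceptible by bounding its L-infinity norm.
        delta.data.clamp_(-epsilon, epsilon)
    return (image + delta).clamp(0, 1).detach()

# Hypothetical usage: nudge a "car" photo toward "cow" features before posting it.
# poisoned_car = poison_image(car_tensor, cow_tensor)
```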

As the quantity of corrupted images online grows, the performance of models trained on scraped data declines sharply. Identifying corrupted data is difficult at first and requires manual intervention, and the contamination then spreads to related concepts because generative models learn connections between words and their visual representations. A corrupted image of a "car" can therefore bleed into generated images for related terms such as "truck," "train," and "bus."

On the other hand, the same technique can be used maliciously against legitimate generative model applications, underscoring the challenging and novel nature of machine learning attacks.

@fig-poisoning shows the effect of increasing levels of data poisoning (50, 100, and 300 poisoned samples) on images generated for various categories. Notice how the outputs progressively deform and drift away from the prompted category; after 300 poison samples, for example, a car prompt generates a cow.

![Data poisoning. Source: @shan2023prompt.](images/png/Data_poisoning.png){#fig-poisoning}

### Adversarial Attacks {#sec-security_adversarial_attacks}

Adversarial attacks aim to trick models into making incorrect predictions by supplying specially crafted, deceptive inputs called adversarial examples [@parrish2023adversarial]. By adding slight, often imperceptible perturbations to input data, adversaries can exploit a model's learned pattern recognition and cause it to produce a wrong prediction.
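
To make the mechanics concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM), assuming a differentiable PyTorch classifier `model` and a correctly labeled input `x` (both hypothetical placeholders, not an artifact of any specific system discussed here):

```python
# Minimal FGSM-style adversarial example, assuming a PyTorch classifier.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Perturb `x` along the sign of the loss gradient to induce a misclassification."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # A small, bounded step that is often imperceptible yet can flip the prediction.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()
```

Larger `epsilon` values make the perturbation more effective but also more visible, which is the basic trade-off an adversary navigates.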
