Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chapter 18 (old chapter 17) Feedback Revisions #571

Merged
merged 15 commits into from
Jan 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 1 addition & 15 deletions contents/core/privacy_security/privacy_security.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -219,21 +219,7 @@ After retraining on the poisoned data, the model's false negative rate increased

This case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.

#### Case Study: Protecting Art Through Data Poisoning

Interestingly enough, data poisoning attacks are not always malicious [@shan2023prompt]. Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to modify their images subtly before uploading them online.

While these changes are imperceptible to the human eye, they can significantly degrade the performance of generative AI models when integrated into the training data. Generative models can be manipulated to produce unrealistic or nonsensical outputs. For example, with just 300 corrupted images, the University of Chicago researchers could deceive the latest Stable Diffusion model into generating images of canines resembling felines or bovines when prompted for automobiles.

As the quantity of corrupted images online grows, the efficacy of models trained on scraped data will decline exponentially. Initially, identifying corrupted data is challenging and necessitates manual intervention. Subsequently, contamination spreads rapidly to related concepts as generative models establish connections between words and their visual representations. Consequently, a corrupted image of a "car" could propagate into generated images linked to terms such as "truck," "train," and "bus."

On the other hand, this tool can be used maliciously and affect legitimate generative model applications. This shows the very challenging and novel nature of machine learning attacks.

@fig-poisoning demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in various categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.

![Data poisoning. Source: @shan2023prompt.](images/png/Data_poisoning.png){#fig-poisoning}

### Adversarial Attacks
### Adversarial Attacks {#sec-security_adversarial_attacks}

Adversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) [@parrish2023adversarial]. By adding slight perturbations to input data, adversaries can "hack" a model's pattern recognition and deceive it. These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction.

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Loading