diff --git a/contents/core/privacy_security/privacy_security.qmd b/contents/core/privacy_security/privacy_security.qmd
index fa0ce5552..3d0d1e579 100644
--- a/contents/core/privacy_security/privacy_security.qmd
+++ b/contents/core/privacy_security/privacy_security.qmd
@@ -219,20 +219,6 @@ After retraining on the poisoned data, the model's false negative rate increased
 
 This case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.
 
-#### Case Study: Protecting Art Through Data Poisoning
-
-Interestingly enough, data poisoning attacks are not always malicious [@shan2023prompt]. Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to modify their images subtly before uploading them online.
-
-While these changes are imperceptible to the human eye, they can significantly degrade the performance of generative AI models when integrated into the training data. Generative models can be manipulated to produce unrealistic or nonsensical outputs. For example, with just 300 corrupted images, the University of Chicago researchers could deceive the latest Stable Diffusion model into generating images of canines resembling felines or bovines when prompted for automobiles.
-
-As the quantity of corrupted images online grows, the efficacy of models trained on scraped data will decline exponentially. Initially, identifying corrupted data is challenging and necessitates manual intervention. Subsequently, contamination spreads rapidly to related concepts as generative models establish connections between words and their visual representations. Consequently, a corrupted image of a "car" could propagate into generated images linked to terms such as "truck," "train," and "bus."
-
-On the other hand, this tool can be used maliciously and affect legitimate generative model applications. This shows the very challenging and novel nature of machine learning attacks.
-
-@fig-poisoning demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in various categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.
-
-![Data poisoning. Source: @shan2023prompt.](images/png/Data_poisoning.png){#fig-poisoning}
-
 ### Adversarial Attacks {#sec-security_adversarial_attacks}
 
 Adversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) [@parrish2023adversarial]. By adding slight perturbations to input data, adversaries can "hack" a model's pattern recognition and deceive it. These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction.
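
For a concrete picture of the "slight perturbations" described in the retained adversarial-attacks paragraph above, here is a minimal fast-gradient-sign-method (FGSM) style sketch. It is only an illustration, not code from the chapter or this change: the PyTorch model, tensor shapes, label, and epsilon value are placeholder assumptions chosen purely for demonstration.

```python
# Illustrative FGSM-style perturbation sketch (assumes PyTorch is installed;
# the model, input shapes, label, and epsilon below are placeholders).
import torch
import torch.nn as nn

# Tiny stand-in classifier for 28x28 grayscale inputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

# Random stand-in image and label.
x = torch.rand(1, 1, 28, 28, requires_grad=True)
y = torch.tensor([3])

# Gradient of the loss with respect to the input pixels.
loss = loss_fn(model(x), y)
loss.backward()

# Nudge every pixel a small step in the direction that increases the loss.
epsilon = 0.05  # perturbation budget; small enough to be visually subtle
x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

# The perturbed input is nearly identical to the original but may be
# classified differently.
print("original prediction: ", model(x).argmax(dim=1).item())
print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```

On a trained model with real images, this same gradient-sign step is what typically flips the prediction while keeping the change imperceptible to a human viewer.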