Commit

mismatching images
jasonjabbour committed Jan 1, 2025
1 parent ca8d12e commit c4cdecb
Showing 1 changed file with 6 additions and 12 deletions.
18 changes: 6 additions & 12 deletions contents/core/robust_ai/robust_ai.qmd

#### Definition and Characteristics

Data poisoning is an attack where the training data is tampered with, leading to a compromised model [@biggio2012poisoning], as shown in [@fig-dirty-label-example]. Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in a way that skews the model's learned behavior. This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning test and validation data can allow adversaries to artificially inflate reported model performance.

![Samples of dirty-label poison data regarding mismatched text/image pairs. Source: [Shan](https://arxiv.org/pdf/2310.13828)](./images/png/dirty_label_example.png){#fig-dirty-label-example}
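
To make the attack concrete, below is a minimal sketch of a label-flipping poisoning attack on a synthetic dataset (an illustrative toy example; the dataset, poisoning rate, and model choice are arbitrary assumptions rather than details from any real incident).

```python
# Minimal label-flipping poisoning sketch (illustrative only).
# Assumes scikit-learn and NumPy are available.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic binary classification data standing in for a real training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Attacker flips the labels of 20% of the training samples.
poison_fraction = 0.2
n_poison = int(poison_fraction * len(y_train))
poison_idx = rng.choice(len(y_train), size=n_poison, replace=False)
y_poisoned = y_train.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("Clean accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("Poisoned accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))
```

Comparing the two accuracies gives a rough sense of how much even unsophisticated label flipping can degrade a model.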

The process usually involves the following steps:

The characteristics of data poisoning include:

**Disrupts the learning process and skews model behavior:** Data poisoning attacks are designed to disrupt the learning process of machine learning models and skew their behavior towards the attacker's objectives. The poisoned data is typically manipulated with specific goals, such as skewing the model's behavior towards certain classes, introducing backdoors, or degrading overall performance. These manipulations are not random but targeted to achieve the attacker's desired outcomes. By introducing label inconsistencies, where the manipulated samples have labels that do not align with their true nature, poisoning attacks can confuse the model during training and lead to biased or incorrect predictions. The disruption caused by poisoned data can have far-reaching consequences, as the compromised model may make flawed decisions or exhibit unintended behavior when deployed in real-world applications.

**Impacts model performance, fairness, and trustworthiness:** Poisoned data in the training dataset can have severe implications for machine learning models' performance, fairness, and trustworthiness. Poisoned data can degrade the accuracy and performance of the trained model, leading to increased misclassifications or errors in predictions. This can have significant consequences, especially in critical applications where the model's outputs inform important decisions. Moreover, poisoning attacks can introduce biases and fairness issues, causing the model to make discriminatory or unfair decisions for certain subgroups or classes. This undermines machine learning systems' ethical and social responsibilities and can perpetuate or amplify existing biases.
Furthermore, poisoned data erodes the trustworthiness and reliability of the entire ML system. The model's outputs become questionable and potentially harmful, leading to a loss of confidence in the system's integrity. The impact of poisoned data can propagate throughout the entire ML pipeline, affecting downstream components and decisions that rely on the compromised model. Addressing these concerns requires robust data governance, regular model auditing, and ongoing monitoring to detect and mitigate the effects of data poisoning attacks.

#### Mechanisms of Data Poisoning

Data poisoning attacks can be carried out through various mechanisms, exploiting different ML pipeline vulnerabilities. These mechanisms allow attackers to manipulate the training data and introduce malicious samples that can compromise the model's performance, fairness, or integrity. Understanding these mechanisms is crucial for developing effective defenses against data poisoning and ensuring the robustness of ML systems. Data poisoning mechanisms can be broadly categorized based on the attacker's approach and the stage of the ML pipeline they target. Some common mechanisms include modifying training data labels, altering feature values, injecting carefully crafted malicious samples, exploiting data collection and preprocessing vulnerabilities, manipulating data at the source, poisoning data in online learning scenarios, and collaborating with insiders to manipulate data.
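
As a concrete illustration of the "injecting carefully crafted malicious samples" mechanism, the sketch below stamps a small trigger patch onto a subset of image-like arrays and relabels them to an attacker-chosen target class, which is the basic recipe behind backdoor-style poisoning. The data, patch shape, and poisoning rate are hypothetical choices for illustration.

```python
# Sketch of backdoor-style data poisoning: stamp a trigger patch on a few
# samples and relabel them to the attacker's target class. Illustrative only.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in dataset: 1000 grayscale "images" of size 28x28 with 10 classes.
images = rng.random((1000, 28, 28), dtype=np.float32)
labels = rng.integers(0, 10, size=1000)

def add_trigger(img, patch_size=3):
    """Place a small bright square in the bottom-right corner as the trigger."""
    poisoned = img.copy()
    poisoned[-patch_size:, -patch_size:] = 1.0
    return poisoned

target_class = 7      # class the attacker wants triggered inputs mapped to
poison_rate = 0.05    # poison 5% of the training set
poison_idx = rng.choice(len(images), size=int(poison_rate * len(images)),
                        replace=False)

for i in poison_idx:
    images[i] = add_trigger(images[i])
    labels[i] = target_class

print(f"Poisoned {len(poison_idx)} of {len(images)} samples "
      f"with a trigger mapped to class {target_class}.")
```

Once a model is trained on such data, inputs carrying the same patch would tend to be classified as the target class while clean inputs behave normally, which is what makes backdoors hard to spot.
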
Data poisoning attacks can severely affect ML systems, compromising their performance.

**Biased or unfair model outcomes:** Data poisoning attacks can introduce biases or unfairness into the model's predictions. By manipulating the training data distribution or injecting samples with specific biases, attackers can cause the model to learn and perpetuate discriminatory patterns. This can lead to unfair treatment of certain groups or individuals based on sensitive attributes such as race, gender, or age. Biased models can have severe societal implications, reinforcing existing inequalities and discriminatory practices. Ensuring fairness and mitigating biases is crucial for building trustworthy and ethical ML systems.

**Increased false positives or false negatives:** Data poisoning can also impact the model's ability to correctly identify positive or negative instances, leading to increased false positives or false negatives. False positives occur when the model incorrectly identifies a negative instance as positive, while false negatives happen when a positive instance is misclassified as negative. The consequences of increased false positives or false negatives can be significant depending on the application. For example, in a fraud detection system, high false positives can lead to unnecessary investigations and customer frustration, while high false negatives can allow fraudulent activities to go undetected.
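
For reference, the short sketch below shows how false positives and false negatives can be read off a confusion matrix when auditing a possibly poisoned classifier; the labels and predictions are made up purely for illustration.

```python
# Reading false positives / false negatives from a confusion matrix.
# The labels and predictions below are made up for illustration.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]   # 0 = legitimate, 1 = fraud
y_pred = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]   # predictions from a suspect model

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"False positives (legitimate flagged as fraud): {fp}")
print(f"False negatives (fraud missed):                {fn}")
```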

**Compromised system reliability and trustworthiness:** Data poisoning attacks can undermine ML systems' overall reliability and trustworthiness. When models are trained on poisoned data, their predictions become unreliable and untrustworthy. This can erode user confidence in the system and lead to a loss of trust in the decisions made by the model. In critical applications where ML systems are relied upon for decision-making, such as autonomous vehicles or medical diagnosis, compromised reliability can have severe consequences, putting lives and property at risk.

Addressing the impact of data poisoning requires a proactive approach to data security, model testing, and monitoring. Organizations must implement robust measures to ensure the integrity and quality of training data, employ techniques to detect and mitigate poisoning attempts, and continuously monitor the performance and behavior of deployed models. Collaboration between ML practitioners, security experts, and domain specialists is essential to develop comprehensive strategies for preventing and responding to data poisoning attacks.
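
As one example of such a mitigation, the sketch below flags training points whose label disagrees with most of their nearest neighbors, a cheap heuristic for catching label-flip style poisoning. It is a simplified illustration under arbitrary assumptions (synthetic data, a fixed neighbor count, and a 50% agreement threshold), not a production-grade defense.

```python
# Heuristic poison filtering: flag training points whose label disagrees with
# most of their nearest neighbors. A simplified illustration, not a full defense.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Simulate an attacker flipping 10% of the labels.
flip_idx = rng.choice(len(y), size=100, replace=False)
y_poisoned = y.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]

# For each point, look at its 10 nearest neighbors (excluding itself).
nn = NearestNeighbors(n_neighbors=11).fit(X)
_, neighbors = nn.kneighbors(X)
neighbor_labels = y_poisoned[neighbors[:, 1:]]
agreement = (neighbor_labels == y_poisoned[:, None]).mean(axis=1)

# Flag points whose label agrees with fewer than half of their neighbors.
suspicious = np.where(agreement < 0.5)[0]
caught = np.intersect1d(suspicious, flip_idx)
print(f"Flagged {len(suspicious)} samples; {len(caught)} were actually poisoned.")
```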

##### Case Study: Protecting Art Through Data Poisoning

Interestingly, data poisoning attacks are not always malicious [@shan2023prompt]. Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, uses data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to make subtle modifications to their images before uploading them online.

While these changes are imperceptible to the human eye, they can significantly disrupt the performance of generative AI models when incorporated into the training data. Generative models can be manipulated into producing hallucinated or distorted images. For example, with only 300 poisoned images, the University of Chicago researchers could trick the latest Stable Diffusion model into generating images of dogs that look like cats, or images of cows when prompted for cars.
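
Nightshade's actual optimization is considerably more sophisticated, but the toy sketch below illustrates the general idea of a perturbation bounded so tightly that it is essentially invisible to a human while still altering every pixel the model ingests. The random noise used here is a stand-in only and would not by itself poison a model.

```python
# Toy illustration of an imperceptibly small image perturbation.
# This is NOT Nightshade's algorithm; it only shows how a tightly bounded
# change can touch every pixel while staying invisible to the eye.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))     # stand-in for an artist's image in [0, 1]

epsilon = 2.0 / 255.0                 # max per-pixel change (~2 intensity levels)
perturbation = rng.uniform(-epsilon, epsilon, size=image.shape)
shaded = np.clip(image + perturbation, 0.0, 1.0)

max_change = np.abs(shaded - image).max()
print(f"Max per-pixel change: {max_change:.4f} (out of 1.0)")
```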

On the other hand, this tool can be used maliciously and can affect legitimate applications.

[@fig-poisoning] demonstrates the effects of different levels of data poisoning (50, 100, and 300 poisoned image samples) on generated images across different categories. Notice how the generated images become increasingly deformed and deviate from the intended category. For example, after 300 poison samples, a car prompt generates a cow.

![Nightshade's poisoning effects on Stable Diffusion. Source: @shan2023prompt](./images/png/poisoning_example.png){#fig-poisoning}


:::{#exr-pa .callout-caution collapse="true"}
