Commit

qualiaMachine committed Oct 2, 2024
2 parents 31741cb + 1ac1c40 commit 89d6b6f
Showing 3 changed files with 101 additions and 4 deletions.
81 changes: 78 additions & 3 deletions episodes/0-introduction.md
@@ -6,18 +6,93 @@ exercises: 1

:::::::::::::::::::::::::::::::::::::: questions

- TODO
- What do we mean by "Trustworthy AI"?
- How is this workshop structured, and what content does it cover?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- TODO
- Define trustworthy AI and its various components.
- Be prepared to dive into the rest of the workshop.

::::::::::::::::::::::::::::::::::::::::::::::::

## Introduction
## What is trustworthy AI?

Artificial intelligence (AI) and machine learning (ML) are widely used to improve on human capabilities (whether in speed, convenience, cost, or accuracy) in a variety of domains: medicine, social media, news, marketing, policing, and more.
It is important that the decisions made by AI/ML models uphold values that we, as a society, care about.

Trustworthy AI is a large and growing sub-field of AI that aims to ensure that AI models are trained and deployed in ways that are ethical and responsible.

## The AI Bill of Rights
In October 2022, the Biden administration released a [Blueprint for an AI Bill of Rights](https://www.whitehouse.gov/ostp/ai-bill-of-rights/), a non-binding document that outlines how automated systems and AI should behave in order to protect Americans' rights.

The blueprint is centered around five principles:

* Safe and Effective Systems -- AI systems should work as expected, and should not cause harm
* Algorithmic Discrimination Protections -- AI systems should not discriminate or produce inequitable outcomes
* Data Privacy -- data collection should be limited to what is necessary for the system functionality, and you should have control over how and if your data is used
* Notice and Explanation -- it should be transparent when an AI system is being used, and there should be an explanation of how particular decisions are reached
* Human Alternatives, Consideration, and Fallback -- you should be able to opt out of engaging with AI systems, and a human should be available to remedy any issues



## This workshop

This workshop centers around four principles that are important to trustworthy AI: *scientific validity*, *fairness*, *transparency*, and *accountability*. We summarize each principle here.

### Scientific validity
In order to be trustworthy, a model and its predictions need to be founded on good science. A model is not going to perform well if it is not trained on the correct data, if it fits the underlying data poorly, or if it cannot recognize its own limitations. Scientific validity is closely linked to the AI Bill of Rights principle of "safe and effective systems".

In this workshop, we cover the following topics relating to scientific validity:

* Defining the problem (Episode 2)
* Training and evaluating a model, especially selecting an accuracy metric, avoiding over/underfitting, and preventing data leakage (Episode 3); a minimal leakage-avoidance sketch follows this list
* Estimating model uncertainty (Episode 9)
* Out-of-distribution detection (Episodes 10-12)
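
To make the data-leakage point concrete, here is a minimal sketch (not part of the lesson materials) of one leakage-avoidance pattern: fitting preprocessing inside a scikit-learn pipeline so that each cross-validation fold computes its scaling statistics only from its own training split. The dataset is a stand-in chosen for convenience.

```python
# Minimal leakage-avoidance sketch (illustrative only): because the scaler is
# fit inside the pipeline, each cross-validation fold scales with statistics
# computed from its own training split rather than from the full dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

leak_free_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
scores = cross_val_score(leak_free_model, X, y, cv=5, scoring="balanced_accuracy")
print(f"Cross-validated balanced accuracy: {scores.mean():.3f}")
```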

### Fairness
As stated in the AI Bill of Rights, AI systems should not be discriminatory or produce inequitable outcomes. In **Episode 3** we discuss various definitions of fairness in the context of AI, and overview how model developers try to make their models more fair.

### Transparency
Transparency -- i.e., insight into *how* a model makes its decisions -- is important for trustworthy AI, as we want models that make the right decisions *for the right reasons*. Transparency can be achieved via *explanations* or by using inherently *interpretable* models. We discuss transparency in the following episodes:

* Interpretability vs explainability (Episode 4)
* Overview of explainability methods (Episode 5)
* Example code for implementing two explainability methods, linear probes and Grad-CAM (Episodes 6-8)
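
As a small taste of the transparency tooling covered later, here is a toy linear-probe sketch (not the workshop's code; the "activations" and concept labels below are synthetic placeholders): a simple classifier is fit on hidden-layer features to check whether a concept is linearly decodable from them.

```python
# Toy linear-probe sketch (illustrative only): fit a simple classifier on
# stand-in hidden-layer activations; high probe accuracy suggests the
# representation linearly encodes the concept of interest.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_features = rng.normal(size=(500, 64))               # placeholder layer activations
concept_labels = (hidden_features[:, 0] > 0).astype(int)   # placeholder concept labels

X_tr, X_te, y_tr, y_te = train_test_split(hidden_features, concept_labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"Probe accuracy: {probe.score(X_te, y_te):.2f}")
```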

### Accountability
Accountability is important for trustworthy AI because, inevitably, models will make mistakes or cause harm. Accountability is multi-faceted and largely non-technical -- not unimportant, but partially outside the scope of this technical workshop.

We discuss two facets of accountability, model documentation and model sharing, in Episode 13.

For those who are interested, we recommend these papers to learn more about different aspects of AI accountability:

1. [Accountability of AI Under the Law: The Role of Explanation](https://arxiv.org/pdf/1711.01134) by Finale Doshi-Velez and colleagues. This paper discusses how explanations can be used in a legal context to determine accountability for harms caused by AI.
2. [Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing](https://dl.acm.org/doi/abs/10.1145/3351095.3372873) by Deborah Raji and colleagues proposes a framework for auditing algorithms. A key contribution of this paper is defining an auditing procedure over the whole model development and implementation pipeline, rather than narrowly focusing on the modeling stages.
3. [AI auditing: The Broken Bus on the Road to AI Accountability](https://ieeexplore.ieee.org/abstract/document/10516659) by Abeba Birhane and colleagues challenges previous work on AI accountability, arguing that most existing AI auditing systems are not effective. They propose necessary traits for effective AI audits, based on a review of existing practices.

### Topics we do not cover
Trustworthy ML is a large and growing area of study. As of September 24, 2024, **about 18,000 articles on Google Scholar mention Trustworthy AI and were published in the first 9 months of 2024**.

Many of the topics we do not cover are sub-topics of the workshop's broad categories -- e.g., fairness, explainability, or OOD detection -- and are important for specific use cases but less relevant for a general audience. There are, however, a few major areas of research that we don't have time to touch on. We summarize a few of them here:

#### Data Privacy
In the US's Blueprint for an AI Bill of Rights, one principle is data privacy, meaning that people should be aware of how their data is being used, companies should not collect more data than they need, and people should be able to consent to and/or opt out of data collection and usage.

A lack of data privacy poses several risks. First, whenever data is collected, it can be subject to data breaches. This risk is unavoidable, but collecting only the data that is truly necessary mitigates it, as does implementing safeguards for how data is stored and accessed. Second, when data is used to train ML models, that data can sometimes be extracted by attackers. For instance, large language models like ChatGPT are known to release private data that was part of the training corpus when prompted in clever ways (see this [blog post](https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html) for more information).
Membership inference attacks, where an attacker determines whether a particular individual's data was in the training corpus, are another vulnerability. These attacks may reveal things about a person directly (e.g., if the training dataset consisted only of people with a particular medical condition), or can be used to set up downstream attacks that gain more information.
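
To make the idea concrete, here is a toy sketch of the simplest membership-inference heuristic (an illustration only; the data and model are synthetic placeholders, not any system discussed above): examples the model is unusually confident about are guessed to have been in the training set.

```python
# Toy membership-inference sketch (illustrative only): examples the model is
# unusually confident about are guessed to have been in the training set.
# All data and the model here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)
X_members, X_nonmembers, y_members, y_nonmembers = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LogisticRegression().fit(X_members, y_members)  # "members" = training data

def true_label_confidence(model, X, y):
    # probability the model assigns to each example's true label
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

# Guess "member" when confidence exceeds the median confidence on non-members
threshold = np.median(true_label_confidence(model, X_nonmembers, y_nonmembers))
flagged = true_label_confidence(model, X_members, y_members) > threshold
print(f"Fraction of true members flagged by this naive attack: {flagged.mean():.2f}")
```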

There are several areas of active research relating to data privacy.

* [Differential privacy](https://link.springer.com/chapter/10.1007/978-3-540-79228-4_1) is a statistical technique that protects the privacy of individual data points. Models can be trained using differential privacy to provably prevent future attacks, but this currently comes at a high cost to accuracy. A minimal sketch of the Laplace mechanism, a basic building block of differential privacy, follows this list.
* [Federated learning](https://ieeexplore.ieee.org/abstract/document/9599369) trains models using decentralized data from a variety of sources. Since the data is not shared centrally, there is less risk of data breaches or unauthorized data usage.
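
Below is a minimal sketch of the Laplace mechanism (illustrative only; the income data, cutoff, and epsilon value are placeholders): noise scaled by sensitivity/epsilon is added so that any single individual's presence changes the released statistic very little.

```python
# Minimal Laplace-mechanism sketch (illustrative only): add noise scaled by
# sensitivity / epsilon so that one individual's presence or absence changes
# the released statistic very little.
import numpy as np

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10, sigma=0.5, size=1_000)  # placeholder data

def dp_count_above(values, cutoff, epsilon):
    true_count = np.sum(values > cutoff)
    sensitivity = 1.0  # adding/removing one person changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

print(f"True count above cutoff:    {np.sum(incomes > 30_000)}")
print(f"Private count above cutoff: {dp_count_above(incomes, 30_000, epsilon=0.5):.1f}")
```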

#### Generative AI risks
We touch on fairness issues with generative AI in Episode 3. But generative AI poses other risks, too, many of which are only beginning to be researched and understood given how new widely-available generative AI is. We briefly discuss one such risk, disinformation, here:

* Disinformation: A major risk of generative AI is the creation of misleading, fake, or malicious content, often known as [deep fakes](https://timreview.ca/article/1282). Deep fakes pose risks to individuals (e.g., content that harms an individual's reputation) and society (e.g., fake news articles or pictures that look real).

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: instructor

21 changes: 21 additions & 0 deletions episodes/3-model-eval-and-fairness.md
@@ -288,6 +288,7 @@ from collections import defaultdict
This notebook is adapted from AIF360's [Medical Expenditure Tutorial](https://github.com/Trusted-AI/AIF360/blob/master/examples/tutorial_medical_expenditure.ipynb).

The tutorial uses data from the [Medical Expenditure Panel Survey](https://meps.ahrq.gov/mepsweb/). We include a short description of the data below. For more details, especially on the preprocessing, please see the AIF360 tutorial.

## Scenario and data

The goal is to develop a healthcare utilization scoring model -- i.e., to predict which patients will have the highest utilization of healthcare resources.
@@ -297,6 +298,7 @@ The original dataset contains information about various types of medical visits;
The sensitive feature (that we will base fairness scores on) is defined as race. Other predictors include demographics, health assessment data, past diagnoses, and physical/mental limitations.

The data is divided into years (we follow the lead of AIF360's tutorial and use 2015), and further divided into Panels. We use Panel 19 (the first half of 2015).

### Loading the data

First, the data needs to be moved into the correct location for the AIF360 library to find it. If you haven't yet, run `setup.sh` to complete that step. (Then, restart the kernel and re-load the packages at the top of this file.)
@@ -353,6 +355,7 @@ describe(dataset_orig_panel19_train, dataset_orig_panel19_val, dataset_orig_pane
Next, we will look at whether the dataset contains bias; i.e., does the outcome 'UTILIZATION' take on a positive value more frequently for one racial group than another?

The disparate impact score will be between 0 and 1, where 1 indicates *no bias*.
**TODO**: Unpack/introduce BinaryLabelDatasetMetric and MetricTextExplainer. Also talk more about the disparate impact score and how to interpret it.
```python
metric_orig_panel19_train = BinaryLabelDatasetMetric(
dataset_orig_panel19_train,
@@ -361,11 +364,16 @@ metric_orig_panel19_train = BinaryLabelDatasetMetric(
explainer_orig_panel19_train = MetricTextExplainer(metric_orig_panel19_train)

print(explainer_orig_panel19_train.disparate_impact())
```

```output
Disparate impact (probability of favorable outcome for unprivileged instances / probability of favorable outcome for privileged instances): 0.5066881212510504
```

We see that the disparate impact is about 0.51, which means the privileged group receives the favorable outcome at about twice the rate of the unprivileged group.

(In this case, the "favorable" outcome is label=1, i.e., high utilization)
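
For intuition, the calculation behind this disparate impact number is roughly the following (a toy sketch with placeholder arrays, not the MEPS data):

```python
# Toy sketch (placeholder arrays, not the MEPS data) of the disparate impact
# calculation: DI = P(favorable | unprivileged) / P(favorable | privileged).
import numpy as np

labels = np.array([1, 0, 1, 1, 0, 0, 1, 0])      # 1 = high utilization (favorable)
privileged = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = privileged group

p_unpriv = labels[privileged == 0].mean()
p_priv = labels[privileged == 1].mean()
print(f"Disparate impact: {p_unpriv / p_priv:.2f}")  # 1.0 indicates parity
```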

## Train a model

We will train a logistic regression classifier.
@@ -378,6 +386,10 @@ fit_params = {'logisticregression__sample_weight': dataset.instance_weights}
lr_orig_panel19 = model.fit(dataset.features, dataset.labels.ravel(), **fit_params)
```
### Validate the model
**TODO**: For the "Validate the model" section, let's preface the code with a a short summary of what we are doing here.
* clarify that we trying to balance accuracy/generalizeability with various fairness metrics (whichever is most relevant to us)
* maybe talk about a couple of the metrics in detail

Recall that a logistic regression model can output probabilities (i.e., `model.predict(dataset).scores`) and we can determine our own threshold for predicting class 0 or 1.

The following function, `test`, computes the performance of the logistic regression model at a variety of thresholds, as indicated by `thresh_arr`, an array of threshold values. We will continue to focus on disparate impact, but all other metrics are described in the [AIF360 documentation](https://aif360.readthedocs.io/en/stable/modules/generated/aif360.metrics.ClassificationMetric.html#aif360.metrics.ClassificationMetric).
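
As a rough, standalone illustration of what such a threshold sweep involves (synthetic data; this is not the lesson's `test` function), one can score probabilities once and then evaluate balanced accuracy and a disparate-impact-style ratio at each candidate threshold:

```python
# Standalone threshold-sweep sketch (synthetic data; not the lesson's `test`
# function): score probabilities once, then evaluate balanced accuracy and a
# disparate-impact-style ratio at each candidate threshold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)                    # 1 = privileged group (placeholder)
X = np.column_stack([rng.normal(size=(1000, 4)), group])
y = ((X[:, 0] + 0.7 * group + rng.normal(size=1000)) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

for thresh in np.linspace(0.3, 0.7, 5):
    preds = (scores > thresh).astype(int)
    bal_acc = balanced_accuracy_score(y, preds)
    di = preds[group == 0].mean() / preds[group == 1].mean()
    print(f"threshold={thresh:.2f}  balanced accuracy={bal_acc:.2f}  disparate impact={di:.2f}")
```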
@@ -459,9 +471,16 @@ plot(thresh_arr, 'Classification Thresholds',
val_metrics['bal_acc'], 'Balanced Accuracy',
disp_imp_err, '1 - DI')
```
![Balancing Fairness and Accuracy](https://raw.githubusercontent.com/carpentries-incubator/fair-explainable-ml/main/images/model-eval-and-fairness_balancing-fairness-accuracy.png)

**TODO**: After generating this plot, let's add a short exercise that tests their comprehension of the plot.

If you like, you can plot other metrics, e.g., average odds difference.

In the next cell, we write a function to print out a variety of other metrics. Since we look at 1 - disparate impact, **all of these metrics have a value of 0 if they are perfectly fair**. Again, you can learn more details about the various metrics in the [AIF360 documentation](https://aif360.readthedocs.io/en/stable/modules/generated/aif360.metrics.ClassificationMetric.html#aif360.metrics.ClassificationMetric).
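
For intuition, here is a toy sketch of two of these group-fairness quantities computed by hand (placeholder predictions, not AIF360 output); both equal 0 under perfect fairness:

```python
# Toy sketch of two group-fairness quantities (placeholder predictions, not
# AIF360 output); both are 0 when the groups are treated identically.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])
priv   = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # 1 = privileged group (placeholder)

# Statistical parity difference: P(pred=1 | unprivileged) - P(pred=1 | privileged)
spd = y_pred[priv == 0].mean() - y_pred[priv == 1].mean()

# True-positive-rate difference (one ingredient of average odds difference)
def tpr(true, pred, mask):
    selected = mask & (true == 1)
    return pred[selected].mean()

tpr_diff = tpr(y_true, y_pred, priv == 0) - tpr(y_true, y_pred, priv == 1)
print(f"Statistical parity difference: {spd:+.2f}")
print(f"True-positive-rate difference: {tpr_diff:+.2f}")
```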

**TODO**: For this next code section, I'm a little confused by the 1-disparate impact equation. Why can't we just take the argmax of all of the metrics individually? Further elaboration may be needed. Also try to talk about why we should bother checking multiple metrics, and why these ones.

```python
def describe_metrics(metrics, thresh_arr):
best_ind = np.argmax(metrics['bal_acc'])
@@ -485,6 +504,8 @@ lr_metrics = test(dataset=dataset_orig_panel19_test,
describe_metrics(lr_metrics, [thresh_arr[lr_orig_best_ind]])
```
## Mitigate bias with in-processing
**Note**: CME will review this section and add TODO items by 10/2. Stay tuned.

We will use reweighting as an in-processing step to try to increase fairness. AIF360 has a function that performs reweighting that we will use. If you're interested, you can look at details about how it works in [the documentation](https://aif360.readthedocs.io/en/latest/modules/generated/aif360.algorithms.preprocessing.Reweighing.html).

If you look at the documentation, you will see that AIF360 classifies reweighting as a pre-processing, not an in-processing, intervention. Technically, AIF360's implementation modifies the dataset, not the learning algorithm, so it is pre-processing. But it is functionally equivalent to modifying the learning algorithm's loss function, so we follow the convention of the fair ML field and call it in-processing.
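
A minimal sketch of what this step might look like with AIF360's `Reweighing` class is below. The `unprivileged_groups` and `privileged_groups` variables are assumed to be the same group definitions used in the earlier metric code, and the output variable name is a placeholder:

```python
# Hedged sketch using AIF360's Reweighing class; the group definitions are
# assumed to match those used for the earlier metrics (adjust as needed).
from aif360.algorithms.preprocessing import Reweighing

RW = Reweighing(unprivileged_groups=unprivileged_groups,
                privileged_groups=privileged_groups)
dataset_transf_panel19_train = RW.fit_transform(dataset_orig_panel19_train)

# The transformed dataset carries per-instance weights, which can be passed to
# the classifier (e.g., via `sample_weight`) when the model is refit.
print(dataset_transf_panel19_train.instance_weights[:5])
```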
3 changes: 2 additions & 1 deletion learners/setup.md
@@ -112,7 +112,7 @@ Conda should already be available in your system once you installed Anaconda suc
2. Create the Conda Environment: To create a conda environment called `trustworthy_ML` with the required packages, open a terminal (Mac/Linux) or Anaconda prompt (Windows) and type the below command. This command creates a new conda environment named `trustworthy_ML` and installs the necessary packages from the `conda-forge` and `pytorch` channels. When prompted to Proceed ([y]/n) during environment setup, press y. It may take around 10-20 minutes to complete the full environment setup. Please reach out to the workshop organizers sooner rather than later to fix setup issues prior to the workshop.

```sh
conda create --name trustworthy_ML python=3.9 jupyter scikit-learn pandas matplotlib keras tensorflow pytorch torchvision torchaudio umap-learn aif360 -c conda-forge
conda create --name trustworthy_ML python=3.9 jupyter scikit-learn pandas matplotlib keras tensorflow pytorch torchvision umap-learn aif360 -c conda-forge
```

3. Activate the Conda Environment: After creating the environment, activate it using the following command.
@@ -124,6 +124,7 @@ Conda should already be available in your system once you installed Anaconda suc
4. Install the `pytorch-ood`, `aif360[Reductions]`, and `aif360[inFairness]` packages using pip. Make sure to do this AFTER activating the environment.

```sh
pip install torchaudio
pip install pytorch-ood
pip install aif360[Reductions]
pip install aif360[inFairness]
