Merge pull request #10 from qingfengtommy/main
Update conda env setup
qualiaMachine authored Nov 13, 2024
2 parents 2602989 + ef9d5ec commit d4f586c
Showing 3 changed files with 29 additions and 57 deletions.
81 changes: 26 additions & 55 deletions episodes/7a-OOD-detection-output-based.md
@@ -121,7 +121,7 @@ def prep_ID_OOD_datasests(ID_class_labels, OOD_class_labels):
test_labels = test_labels[test_filter]
print(f'test_data.shape={test_data.shape}')

return ood_data, train_data, test_data, train_labels, test_labels
return train_data, test_data, ood_data, train_labels, test_labels, ood_labels


def plot_data_sample(train_data, ood_data):
@@ -151,11 +151,12 @@ def plot_data_sample(train_data, ood_data):

```
```python
ood_data, train_data, test_data, train_labels, test_labels = prep_ID_OOD_datasests([0,1], [5])
train_data, test_data, ood_data, train_labels, test_labels, ood_labels = prep_ID_OOD_datasests([0,1], [5])
fig = plot_data_sample(train_data, ood_data)
fig.savefig('../images/OOD-detection_image-data-preview.png', dpi=300, bbox_inches='tight')
plt.show()


```
![Preview of image dataset](https://raw.githubusercontent.com/carpentries-incubator/fair-explainable-ml/main/images/OOD-detection_image-data-preview.png)

Check warning on line 161 in episodes/7a-OOD-detection-output-based.md (GitHub Actions / Build Full Site): [image missing alt-text]: https://raw.githubusercontent.com/carpentries-incubator/fair-explainable-ml/main/images/OOD-detection_image-data-preview.png

@@ -532,7 +533,7 @@ In this example, we will train a CNN model on the FashionMNIST dataset. We will

We'll start fresh by loading our data again. This time, let's treat all remaining classes in the FashionMNIST dataset as OOD. This should yield a more robust model that is more reliable when presented with all kinds of data.
```python
ood_data, train_data, test_data = prep_ID_OOD_datasests([0,1], list(range(2,10))) # use remaining 8 classes in dataset as OOD
train_data, test_data, ood_data, train_labels, test_labels, ood_labels = prep_ID_OOD_datasests([0,1], list(range(2,10))) # use remaining 8 classes in dataset as OOD
fig = plot_data_sample(train_data, ood_data)
fig.savefig('../images/OOD-detection_image-data-preview.png', dpi=300, bbox_inches='tight')
plt.show()
@@ -571,6 +572,7 @@ if plot_umap:
umap_ood = umap_results[len(train_data_flat):]

```
The warning message indicates that UMAP has overridden the n_jobs parameter to 1 due to the random_state being set. This behavior ensures reproducibility by using a single job. If you want to avoid the warning and still use parallelism, you can remove the random_state parameter. However, removing random_state will mean that the results might not be reproducible.
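As a minimal sketch of that trade-off (not part of the lesson code, and assuming the umap-learn package used in the earlier cells), the two configurations might look like this:
```python
import umap

# Reproducible but single-threaded: UMAP forces n_jobs=1 whenever random_state is set
reducer_reproducible = umap.UMAP(n_components=2, random_state=42)

# Parallel but not bit-for-bit reproducible: omit random_state so n_jobs can stay > 1
reducer_parallel = umap.UMAP(n_components=2, n_jobs=-1)

# e.g., umap_results = reducer_reproducible.fit_transform(train_data_flat)
```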
```python
if plot_umap:
umap_alpha = .02
@@ -655,7 +657,6 @@ def train_model(model, train_loader, criterion, optimizer, epochs=5):
train_model(model, train_loader, criterion, optimizer)

```
The warning message indicates that UMAP has overridden the n_jobs parameter to 1 due to the random_state being set. This behavior ensures reproducibility by using a single job. If you want to avoid the warning and still use parallelism, you can remove the random_state parameter. However, removing random_state will mean that the results might not be reproducible.
```python
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

@@ -691,7 +692,7 @@ plot_confusion_matrix(test_labels, test_predictions, "Confusion Matrix for Test

# Evaluate on OOD data
ood_labels, ood_predictions = evaluate_model(model, ood_loader, device)
plot_confusion_matrix(ood_labels, ood_predictions, "Confusion Matrix for Test Data")
plot_confusion_matrix(ood_labels, ood_predictions, "Confusion Matrix for OOD Data")

```
```python
@@ -840,6 +841,7 @@ plt.show()

```
```python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_fscore_support, accuracy_score, confusion_matrix, ConfusionMatrixDisplay
@@ -877,16 +879,19 @@ def evaluate_ood_detection(id_scores, ood_scores, id_true_labels, id_predictions
f1_scores = []

# True labels for OOD data (since they are not part of the original labels)
ood_true_labels = np.full(len(ood_scores), -1)
if score_type == "energy":
ood_true_labels = np.full(len(ood_scores), -1)
else:
ood_true_labels = np.full(len(ood_scores[:,0]), -1)

for threshold in thresholds:
# Classify OOD examples based on scores
if score_type == 'energy':
ood_classifications = np.where(ood_scores >= threshold, -1, ood_predictions)
id_classifications = np.where(id_scores >= threshold, -1, id_predictions)
elif score_type == 'softmax':
ood_classifications = np.where(ood_scores <= threshold, -1, ood_predictions)
id_classifications = np.where(id_scores <= threshold, -1, id_predictions)
ood_classifications = np.where(ood_scores[:,0] <= threshold, -1, ood_predictions)
id_classifications = np.where(id_scores[:,0] <= threshold, -1, id_predictions)
else:
raise ValueError("Invalid score_type. Use 'energy' or 'softmax'.")

@@ -935,74 +940,40 @@ def evaluate_ood_detection(id_scores, ood_scores, id_true_labels, id_predictions
plt.show()

# plot confusion matrix

# Threshold value for the energy score
upper_threshold = best_f1_threshold # Using the best F1 threshold from the previous calculation

# Classifying OOD examples based on energy scores
ood_classifications = np.where(ood_energy_scores >= upper_threshold, -1, # classified as OOD
if score_type == 'energy':
# Classifying OOD examples based on energy scores
ood_classifications = np.where(ood_energy_scores >= upper_threshold, -1, # classified as OOD
np.where(ood_energy_scores < upper_threshold, 0, -1)) # classified as ID

# Classifying ID examples based on energy scores
id_classifications = np.where(id_energy_scores >= upper_threshold, -1, # classified as OOD
# Classifying ID examples based on energy scores
id_classifications = np.where(id_energy_scores >= upper_threshold, -1, # classified as OOD
np.where(id_energy_scores < upper_threshold, id_true_labels, -1)) # classified as ID
elif score_type == 'softmax':
# Classifying OOD examples based on softmax scores
ood_classifications = softmax_thresh_classifications(ood_scores, upper_threshold)

# Classifying ID examples based on softmax scores
id_classifications = softmax_thresh_classifications(id_scores, upper_threshold)
# Combine OOD and ID classifications and true labels
all_predictions = np.concatenate([ood_classifications, id_classifications])
all_true_labels = np.concatenate([ood_true_labels, id_true_labels])

# Confusion matrix
cm = confusion_matrix(all_true_labels, all_predictions, labels=[0, 1, -1])

# Plotting the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Shirt", "Pants", "OOD"])
disp.plot(cmap=plt.cm.Blues)
plt.title('Confusion Matrix for OOD and ID Classification (Energy-Based)')
plt.title(f'Confusion Matrix for OOD and ID Classification ({score_type.capitalize()}-Based)')
plt.show()


return best_f1_threshold, best_precision_threshold, best_recall_threshold

# Example usage
# Assuming id_energy_scores, ood_energy_scores, id_true_labels, and test_labels are already defined
best_f1_threshold, best_precision_threshold, best_recall_threshold = evaluate_ood_detection(id_energy_scores, ood_energy_scores, id_true_labels, test_labels, score_type='energy')
best_f1_threshold, best_precision_threshold, best_recall_threshold = evaluate_ood_detection(id_softmax_scores[:,0], ood_softmax_scores[:,0], id_true_labels, test_labels, score_type='softmax')

```
```python
ood_softmax_scores[:,0].shape
```
```python

```
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Threshold value for the energy score
upper_threshold = best_f1_threshold # Using the best F1 threshold from the previous calculation

# Classifying OOD examples based on energy scores
ood_classifications = np.where(ood_energy_scores >= upper_threshold, -1, # classified as OOD
np.where(ood_energy_scores < upper_threshold, 0, -1)) # classified as ID

# Classifying ID examples based on energy scores
id_classifications = np.where(id_energy_scores >= upper_threshold, -1, # classified as OOD
np.where(id_energy_scores < upper_threshold, id_true_labels, -1)) # classified as ID

# Combine OOD and ID classifications and true labels
all_predictions = np.concatenate([ood_classifications, id_classifications])
all_true_labels = np.concatenate([ood_true_labels, id_true_labels])

# Confusion matrix
cm = confusion_matrix(all_true_labels, all_predictions, labels=[0, 1, -1])

# Plotting the confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["Shirt", "Pants", "OOD"])
disp.plot(cmap=plt.cm.Blues)
plt.title('Confusion Matrix for OOD and ID Classification (Energy-Based)')
plt.show()
best_f1_threshold, best_precision_threshold, best_recall_threshold = evaluate_ood_detection(id_energy_scores, ood_energy_scores, test_labels, test_predictions, ood_predictions, score_type='energy')
best_f1_threshold, best_precision_threshold, best_recall_threshold = evaluate_ood_detection(id_softmax_scores, ood_softmax_scores, test_labels, test_predictions, ood_predictions, score_type='softmax')

```
# Limitations of our approach thus far
Binary file modified images/OOD-detection_image-data-preview.png
5 changes: 3 additions & 2 deletions learners/setup.md
@@ -91,8 +91,9 @@ Conda should already be available in your system once you installed Anaconda suc
2. Create the Conda Environment: To create a conda environment called `trustworthy_ML` with the required packages, open a terminal (Mac/Linux) or Anaconda prompt (Windows) and type the below command. This command creates a new conda environment named `trustworthy_ML` and installs the necessary packages from the `conda-forge` and `pytorch` channels. When prompted to Proceed ([y]/n) during environment setup, press y. It may take around 10-20 minutes to complete the full environment setup. Please reach out to the workshop organizers sooner rather than later to fix setup issues prior to the workshop.

```sh
conda create --name trustworthy_ML python=3.9 jupyter scikit-learn pandas matplotlib keras tensorflow pytorch torchvision umap-learn aif360 -c conda-forge
conda create --name trustworthy_ML python=3.9 pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
```
To install a different PyTorch build matching your CUDA version, or for more detailed instructions, see https://pytorch.org/get-started/locally/.

3. Activate the Conda Environment: After creating the environment, activate it using the following command.

@@ -103,7 +104,7 @@ Conda should already be available in your system once you installed Anaconda suc
4. Install `pytorch-ood`, `fairlearn`, `aif360[Reductions]`, and `aif360[inFairness]` using pip. Make sure to do this AFTER activating the environment.

```sh
pip install torchaudio
conda install jupyter scikit-learn pandas matplotlib keras tensorflow umap-learn aif360 -c conda-forge
pip install pytorch-ood
pip install fairlearn
pip install aif360[Reductions]
