cfact-label-inference

This repository contains the code and data for the paper:

N. Corvelo Benz and M. Gomez-Rodriguez. Counterfactual Inference of Second Opinions. UAI, 2022.

The paper is available here.

Install and download prerequisites

Install required packages with pip install -r requirements.txt

Code Structure

Set Invariant Structural Causal Model (SI-SCM)

  • siscm.py defines the SI-SCM class, which implements, among other things:
    • a class constructor using marginal distribution functions
    • fit, which finds a partition of the experts into groups of mutually similar experts
    • predict and cf_predict, which (counterfactually) predict experts' labels
  • PCS_graph.py defines a graph class encoding (pairwise) counterfactual stability, which implements:
    • checking for violations of counterfactual stability given data, i.e., checking for dissimilarity between experts
    • greedy clique partitioning algorithm
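
To illustrate the idea of greedy clique partitioning, here is a minimal standalone sketch. It is not the code in PCS_graph.py. It assumes the graph is given as an adjacency dictionary in which an edge connects two experts for whom no violation of counterfactual stability was found, so that each clique is a group of mutually similar experts. The function name `greedy_clique_partition` and the seed and tie-breaking rules are illustrative choices, not taken from the repository.

```python
from typing import Dict, Hashable, List, Set


def greedy_clique_partition(
    adjacency: Dict[Hashable, Set[Hashable]]
) -> List[Set[Hashable]]:
    """Greedily partition the nodes of an undirected graph into cliques.

    adjacency maps each node to the set of its neighbours. Here an edge is
    assumed to mean "no dissimilarity found between these two experts".
    """
    unassigned = set(adjacency)
    cliques: List[Set[Hashable]] = []
    while unassigned:
        # Seed: the unassigned node with the most unassigned neighbours.
        # Sorting by repr makes tie-breaking deterministic.
        seed = max(
            sorted(unassigned, key=repr),
            key=lambda n: len(adjacency[n] & unassigned),
        )
        clique = {seed}
        # Candidates are unassigned nodes adjacent to every clique member.
        candidates = (adjacency[seed] & unassigned) - {seed}
        while candidates:
            # Prefer the candidate that keeps the most other candidates alive.
            nxt = max(
                sorted(candidates, key=repr),
                key=lambda n: len(adjacency[n] & candidates),
            )
            clique.add(nxt)
            candidates = (candidates & adjacency[nxt]) - {nxt}
        cliques.append(clique)
        unassigned -= clique
    return cliques


# Hypothetical example: experts A, B, C are mutually similar; D is only similar to C.
example_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
groups = greedy_clique_partition(example_graph)
print(groups)
```

A greedy strategy like this does not guarantee the minimum number of cliques, but it always returns a valid partition in which every group is fully connected.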

Preprocess CIFAR-10(h)

  • get_features_vgg19.py generates features for CIFAR-10 test images using VGG19
  • join_feat_cifar10h_labels.py generates and saves a single dataframe with image features from VGG19 and corresponding labels from CIFAR-10h
  • preprocess_data.py resamples the experts and the data, and performs the train-test split

Experiment Code

  • synthetic_experiment.py implements the synthetic experiment
  • real_experiment.py implements experiment on real data
  • ./data contains preprocessed real datasets, train and test data
  • ./features contains features generated by VGG19 for CIFAR-10 test images

Experiment Evaluation

  • ./results_synthetic contains result files from experiment on synthetic data
  • ./results_real contains result files from experiment on real data
  • evaluation_synthetic.py generates plots for given experiment results on synthetic data
  • evaluation_real.py generates plots for given experiment results on real data
  • helper.py contains helper functions for plotting

Running the Experiment on Synthetic Data

  • Run synthetic_experiment.py
  • Run evaluation_synthetic.py to generate the plots from the evaluation results
  • All experimental results and plots will be stored in the directory ./results_synthetic

Running the Experiment on Real Data

Prerequisite

Download the CIFAR-10H dataset from https://github.com/jcpeterson/cifar-10h into the directory ./data

Generating the preprocessed data

  • Run get_features_vgg19.py to generate the features of the data with VGG19
  • Run join_feat_cifar10h_labels.py to join the VGG19 features and the human labels from CIFAR-10H into a single dataframe
  • Run preprocess_data.py to resample the data and experts to obtain a higher disagreement ratio
  • The training and test sets are stored in the directory ./data; features and labels are stored in separate matrices
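
As a rough illustration of what a disagreement ratio could measure, the sketch below computes the fraction of samples on which the experts' labels are not all identical. This definition, the function name `disagreement_ratio`, and the toy label matrix are assumptions made for illustration; the resampling script in this repository may define and compute the ratio differently.

```python
from typing import Sequence


def disagreement_ratio(labels: Sequence[Sequence[int]]) -> float:
    """Fraction of samples on which at least two experts give different labels.

    labels[i][j] is the label that expert j assigned to sample i.
    (Assumed definition, for illustration only.)
    """
    if not labels:
        raise ValueError("labels must contain at least one sample")
    disagreements = sum(1 for row in labels if len(set(row)) > 1)
    return disagreements / len(labels)


# Hypothetical toy data: 4 samples labelled by 3 experts.
toy_labels = [
    [3, 3, 3],  # all experts agree
    [3, 5, 3],  # one expert disagrees
    [0, 0, 0],  # all experts agree
    [1, 2, 2],  # one expert disagrees
]
ratio = disagreement_ratio(toy_labels)
print(ratio)
```

Under this definition, resampling towards samples and experts with differing labels raises the ratio.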

Running the experiment

  • Run real_experiment.py
  • Run evaluation_real.py to generate the plots from the evaluation results
  • All experimental results and plots will be stored in the directory ./results_real
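
The run order described above can be scripted. The following is a small sketch of a hypothetical driver, not part of the repository. It assumes the scripts are run from the repository root and take no command-line arguments, and it uses the file name preprocess_data.py as listed under Code Structure. The names `run_pipeline`, `build_commands`, `SYNTHETIC_STEPS`, and `REAL_STEPS` are introduced here for illustration.

```python
import subprocess
import sys
from typing import List, Sequence

# Script order as described in this README.
SYNTHETIC_STEPS = ["synthetic_experiment.py", "evaluation_synthetic.py"]
REAL_STEPS = [
    "get_features_vgg19.py",
    "join_feat_cifar10h_labels.py",
    "preprocess_data.py",
    "real_experiment.py",
    "evaluation_real.py",
]


def build_commands(steps: Sequence[str], python: str = sys.executable) -> List[List[str]]:
    """Turn a list of script names into argument lists for subprocess.run."""
    return [[python, step] for step in steps]


def run_pipeline(
    steps: Sequence[str], repo_root: str = ".", dry_run: bool = False
) -> List[List[str]]:
    """Run the scripts in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands(steps)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a script exits non-zero.
            subprocess.run(cmd, cwd=repo_root, check=True)
    return commands


if __name__ == "__main__":
    # Print the real-data pipeline without executing it.
    run_pipeline(REAL_STEPS, dry_run=True)
```

Calling `run_pipeline(SYNTHETIC_STEPS)` or `run_pipeline(REAL_STEPS)` without `dry_run` executes the scripts in sequence; the real-data pipeline still requires the CIFAR-10H download described in the prerequisite.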

Citation

If you use parts of the code in this repository for your own research, please consider citing:

@inproceedings{benz2022counterfactual,
        title={Counterfactual Inference of Second Opinions},
        author={Corvelo Benz, Nina and Gomez-Rodriguez, Manuel},
        booktitle={UAI},
        year={2022}
}