# TAD Reporting standards

This document assesses existing standards for capturing algorithm assessments in a uniform way.

## Background

There are many algorithm assessments (e.g. IAMA, HUDERIA), technical tests on performance (e.g. accuracy, TP, FP,
F1), tests on fairness and bias of algorithms (e.g. SHAP), and reporting formats available. The goal is to standardize
the way these different assessments and tests are captured.

## Available standards

### Model Cards

The most interesting existing capturing methods all seem to be based on
[Model Cards for Model Reporting](https://arxiv.org/pdf/1810.03993.pdf), proposed by Google, which are:

>"Short documents accompanying trained machine learning models that provide benchmarked evaluation in a variety of
>conditions, such as across different cultural, demographic, or phenotypic groups (e.g., race, geographic location,
>sex, Fitzpatrick skin type) and intersectional groups (e.g., age and race, or sex and Fitzpatrick skin type) that are
>relevant to the intended application domains. Model cards also disclose the context in which models are intended to be
>used, details of the performance evaluation procedures, and other relevant information."

Note that "The proposed set of sections" in the Model Cards paper "are intended to provide relevant details to
consider, but are not intended to be complete or exhaustive, and may be tailored depending on the model, context, and
stakeholders."

Many companies implement their own version of Model Cards, for example
[Meta System Cards](https://ai.meta.com/blog/system-cards-a-new-resource-for-understanding-how-ai-systems-work/) and the
tools mentioned in the next section.
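
To make this structure concrete, below is a minimal sketch of the sections proposed in the Model Cards paper, written
out as a plain Python dictionary. The field names follow the paper's proposed sections; the dictionary form and the
placeholder values are purely illustrative and not an official schema.

```python
# Sections proposed in "Model Cards for Model Reporting" (Mitchell et al., 2019),
# expressed as a plain dictionary. All values are placeholders.
example_model_card = {
    "model_details": {"name": "example-classifier", "version": "0.1", "owners": ["..."]},
    "intended_use": {"primary_uses": "...", "out_of_scope_uses": "..."},
    "factors": {"relevant_factors": ["age", "sex", "geographic location"]},
    "metrics": ["accuracy", "false positive rate", "F1"],
    "evaluation_data": {"datasets": "...", "preprocessing": "..."},
    "training_data": {"datasets": "..."},
    "quantitative_analyses": {"unitary_results": "...", "intersectional_results": "..."},
    "ethical_considerations": "...",
    "caveats_and_recommendations": "...",
}
```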

#### Automatic model card generation

There are tools to (semi-)automatically generate model cards:

1. [Model Card Generator by the US Census Bureau](https://bias.xd.gov/resources/model-card-generator/). A basic UI to
create model cards and export them to markdown; it also has a command-line tool.
2. [Model Card Toolkit by Google](https://blog.research.google/2020/07/introducing-model-card-toolkit-for.html).
Automation only comes from integration with [TensorFlow Extended](https://www.tensorflow.org/tfx) and
[ML Metadata](https://www.tensorflow.org/tfx/guide/mlmd); a short usage sketch follows this list.
3. [VerifyML](https://github.com/cylynx/verifyml?tab=readme-ov-file). Based on the Google toolkit, but extended to
include specific tests on fairness and bias. Users can add their own technical tests and extend the model card schema
(defined in protobuf).
4. [Experimental Model Cards Tool by Hugging Face](https://huggingface.co/spaces/huggingface/Model_Cards_Writing_Tool).
Hugging Face's implementation of the Google paper, providing information on the models available on their platform.
The writing tool guides users through their model card and supports uploading from and downloading to markdown.
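
As an illustration of how these toolkits are used, here is a minimal usage sketch of Google's Model Card Toolkit,
based on its published quickstart. Method names may differ between toolkit versions and the field values are
placeholders, so treat this as an assumption-laden sketch rather than a definitive recipe.

```python
import model_card_toolkit as mctlib

# Initialize the toolkit with a directory in which the generated assets are stored.
mct = mctlib.ModelCardToolkit("model_card_assets")

# Scaffold an empty model card and fill in a few fields by hand; with the
# TFX / ML Metadata integration many of these fields can be pre-populated.
model_card = mct.scaffold_assets()
model_card.model_details.name = "example-classifier"
model_card.model_details.overview = "Binary classifier used here as a running example."

# Write the model card data and render it as an HTML document.
mct.update_model_card(model_card)
html = mct.export_format()
```

The TFX / ML Metadata integration mentioned above is where the automation comes from; without it, the card is filled
in by hand as shown.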

### Other standards

[A landscape analysis of ML documentation tools](https://huggingface.co/docs/hub/en/model-card-landscape-analysis)
has been performed by Hugging Face and provides a good overview of the current landscape.

Another interesting standard is the Algorithmic Transparency Recording Standard of the United Kingdom Government,
which can be found [here](https://www.gov.uk/government/publications/algorithmic-transparency-template).

## Proposal

We need a standard that captures algorithmic assessments and technical tests on models and datasets. The idea of
[model cards](https://arxiv.org/pdf/1810.03993.pdf) can serve as a guiding theoretical principle for implementing
such a standard. More specifically, we can draw inspiration from the existing model card schemas and implementations of
[VerifyML](https://github.com/cylynx/verifyml?tab=readme-ov-file) and
[Hugging Face](https://huggingface.co/spaces/huggingface/Model_Cards_Writing_Tool). We note the following:

1. Neither of these two standards captures algorithmic assessments.
2. Only VerifyML has a specific format for capturing some technical tests.

Hence in any case we need to extend one of these standards. We propose to:

1. Assess and compare these two standards
2. Choose the most appropriate one to extend
3. Extend (and possibly adjust) this standard into our own standard (in the form of a basic schema) that allows for
capturing algorithmic assessments and standardizes the way technical tests are captured (see the sketch after this list).
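
To make step 3 concrete, below is a hypothetical sketch of what such an extension could look like: a model-card-style
report with an added section for algorithmic assessments (e.g. IAMA) and a uniform record for technical tests. All
class and field names are illustrative assumptions, not a settled design.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TechnicalTest:
    # One technical test result (e.g. accuracy, F1, a SHAP-based bias metric). Illustrative only.
    name: str
    value: Any
    dataset: str = ""
    notes: str = ""


@dataclass
class AssessmentEntry:
    # One question/answer pair from an algorithmic assessment such as IAMA.
    question: str
    answer: str


@dataclass
class Assessment:
    # A completed algorithmic assessment, e.g. IAMA or HUDERIA.
    name: str
    entries: list[AssessmentEntry] = field(default_factory=list)


@dataclass
class TadReport:
    # Model-card-like report extended with assessments and technical tests.
    model_details: dict = field(default_factory=dict)
    assessments: list[Assessment] = field(default_factory=list)
    technical_tests: list[TechnicalTest] = field(default_factory=list)
```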
