From 237915f4cb165528ab95cc1989ef380905d87d1d Mon Sep 17 00:00:00 2001 From: Alex Loosley Date: Thu, 31 Aug 2023 13:03:38 +0200 Subject: [PATCH 1/3] Zalando data and model card templates --- templates/Data Card - Zalando Template.md | 2504 ++++++++++++++++++++ templates/Model Card - Zalando Template.md | 207 ++ 2 files changed, 2711 insertions(+) create mode 100644 templates/Data Card - Zalando Template.md create mode 100644 templates/Model Card - Zalando Template.md diff --git a/templates/Data Card - Zalando Template.md b/templates/Data Card - Zalando Template.md new file mode 100644 index 0000000..b361a46 --- /dev/null +++ b/templates/Data Card - Zalando Template.md @@ -0,0 +1,2504 @@ + + +

Dataset Name (Acronym)

+ +Data card contributions and feedback are welcome! + +## Template Version + + +[0.2.1]() + +## Overview + +#### Last Update +DD/MM/YYYY +(See [version section](#dataset-version-and-maintenance) for details) + +#### Author(s) + + +- **Name, Team:** (Owner / Contributor / Manager / etc.) +- **Name, Team:** (Owner / Contributor / Manager / etc.) +- ... + +See the [Authorship section](#authorship) for the full list of authors (if more than 5) + +#### Data Summary + + +Write a short summary describing your dataset (limit +200 words). Include information about the content +and topic of the data, sources and motivations for the +dataset, benefits and the problems or use cases it is +suitable for. For readers that only take 10 seconds to look +at this data card, adding one good overview image might also +make the difference between this data being discovered +and going unnoticed. + +#### Status + + +**Status Date:** DD/MM/YYYY + +**Under Preparation** - The dataset is still under active curation +and is not yet ready for use due to active "dev" updates. + +**Regularly Updated** - New versions of the dataset +have been or will continue to be +made available. + +**Actively Maintained** - No new versions will be made +available, but this dataset will +be actively maintained, +including but not limited to +updates to the data. + +**Limited Maintenance** - The data will not be updated, +but any technical issues will be +addressed. + +**Deprecated** - This dataset is obsolete or is +no longer being maintained. + +#### Relevant Links + + +* [Dataset Link] - e.g. link to S3 bucket, SQL or BigQuery tables, etc. +* [Initiative Link] - e.g. link to the initiative through which the curation of this dataset was carried out +* [Code Repository Links] - e.g. link to GitHub, GitLab or other type of code repository +* [Data Quality Monitoring Link(s)] - e.g. link to Data Quality Framework dashboard +* [Labeling Job Link(s)] - e.g. link to SageMaker Ground Truth job +* [Governance Processes Link(s)] - e.g. link to the review process for obtaining permission to access the dataset for a specific use case + +## Authorship +### Author(s) + + + +- **Name, Team:** (Owner / Contributor / Manager / etc.) +- **Name, Team:** (Owner / Contributor / Manager / etc.) + +### Dataset Owners +#### Contact Detail(s) + + +- **Main point of contact:** Provide the name and email address of the main point of contact +- **Team:** Provide the team name and email address for the dataset owner team +- **Affiliation:** Provide the affiliation, including team and company or university, of the dataset owners (e.g. Size and Fit, Zalando SE) +- **Team Website:** Provide a link to the website for the dataset owner team + +### Funding Sources +<!-- info: Use these sections if relevant. + +For example, this may be relevant if the dataset is funded by an +NGO that has specific requirements and restrictions for use +of the data. Providing this information helps us +avoid inappropriate downstream use of the data. --> + +#### Institution(s) + + +- Name of Institution +- Name of Institution +- Name of Institution + +#### Funding or Grant Summary(ies) + + + +*For example, Institution 1 and Institution 2 jointly funded this dataset as a +part of the XYZ data program, funded by the XYZ grant awarded by Institution 3 for +the years YYYY-YYYY.* + +Summarize here. Link to documents if available.
+ +**Additional Notes:** Add here + +### Publishers + + +#### Publishing Organization(s) + + +Organization Name + +#### Industry Type(s) + + +- Corporate - Tech +- Corporate - Non-Tech (please specify) +- Academic - Tech +- Academic - Non-Tech (please specify) +- Not-for-profit - Tech +- Not-for-profit - Non-Tech (please specify) +- Individual (please specify) +- Others (please specify) + +#### Contact Detail(s) + + +- **Publishing POC:** Provide the name of a POC for this dataset's publishers +- **Affiliation:** Provide the POC's institutional affiliation +- **Contact:** Provide the POC's contact details +- **Mailing List:** Provide a mailing list if available +- **Website:** Provide a website for the dataset if available + +## Dataset Overview +#### Primary Data Modality + + +- Image Data +- Text Data +- Tabular Data +- Audio Data +- Video Data +- Time Series +- Graph Data +- Geospatial Data +- Multimodal (please specify) +- Unknown +- Others (please specify) + +#### Data Subject(s) + + +- Sensitive Data about people +- Non-Sensitive Data about people +- Data about natural phenomena +- Data about places and objects +- Synthetically generated data +- Data about systems or products and their behaviors +- Unknown +- Others (please specify) + +#### Dataset Description + + + +#### Data Point Description + + + +Add description here. + +See the [Examples of Data Points](#example-of-data-points) section for examples. + +### Dataset Statistics + + +| Category | Data | +|-----------------------------------------|-----------| +| Size of Dataset | 123456 MB | +| Number of Data Points | 123456 | +| Number of Data Points with Missing Data | 12324152 | +| Number of Labels | 123456789 | +| Algorithmic Labels | 123456789 | +| Protected Attribute Labels | 123456789 | +| Other Characteristics | 123456 | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here.
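As an illustration only, here is one way to compute the first rows of the Dataset Statistics table. This minimal Python sketch rests on several assumptions. The records are taken to fit in memory as a list of dicts. `None` and empty strings are treated as missing. "Size of Dataset" is approximated by the size of the JSON-serialized records. The helper names and example fields are hypothetical.

```python
import json
from typing import Any


def is_missing(value: Any) -> bool:
    # Assumption: None and empty strings count as missing; adapt to your schema.
    return value is None or value == ""


def dataset_statistics(records: list[dict[str, Any]]) -> dict[str, float]:
    """Compute a few rows of the Dataset Statistics table from in-memory records."""
    # Approximate size: UTF-8 bytes of each record serialized as JSON.
    # On-disk size (Parquet, CSV, warehouse tables) will differ.
    size_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in records)
    with_missing = sum(1 for r in records if any(is_missing(v) for v in r.values()))
    return {
        "size_mb": size_bytes / 1_000_000,  # decimal megabytes
        "num_data_points": len(records),
        "num_data_points_with_missing_data": with_missing,
    }


# Hypothetical example records.
records = [
    {"id": 1, "size": "M", "height_cm": 172},
    {"id": 2, "size": "", "height_cm": 180},
    {"id": 3, "size": "L", "height_cm": None},
]
print(dataset_statistics(records))
```

For datasets stored in S3 or a warehouse, the storage layer's own size metric is usually a better source for "Size of Dataset" than this approximation.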
+ +### Tables and Fields + + + + +#### TABLE: name_of_table_1 + +- **Primary Key:** ... +- **Description:** ... + +| Field Name | Type | Description | +|------------|------|-------------| +| Field Name | Type | Description | +| Field Name | Type | Description | +| Field Name | Type | Description | + +**Above:** Provide a caption for the above table or visualization if used. + +**Additional Notes:** Add here including reference to the +[sensitive and protected attributes](#sensitive-and-protected-attributes) section, +when relevant. + +| Statistic | Field Name | Field Name | Field Name | Field Name | Field Name | Field Name | +|-----------|------------|------------|------------|------------|------------|------------| +| count | | | | | | | +| mean | | | | | | | +| std | | | | | | | +| min | | | | | | | +| 25% | | | | | | | +| 50% | | | | | | | +| 75% | | | | | | | +| max | | | | | | | +| mode | | | | | | | + +**Additional Notes:** Add here. + +--- + +#### TABLE: name_of_table_2 + +- **Primary Key:** ... +- **Description:** ... + +| Field Name | Type | Description | +|------------|------|-------------| +| Field Name | Type | Description | +| Field Name | Type | Description | +| Field Name | Type | Description | + +**Above:** Provide a caption for the above table or visualization if used. + +**Additional Notes:** Add here including reference to the +[sensitive and protected attributes](#sensitive-and-protected-attributes) section, +when relevant. + +| Statistic | Field Name | Field Name | Field Name | Field Name | Field Name | Field Name | +|-----------|------------|------------|------------|------------|------------|------------| +| count | | | | | | | +| mean | | | | | | | +| std | | | | | | | +| min | | | | | | | +| 25% | | | | | | | +| 50% | | | | | | | +| 75% | | | | | | | +| max | | | | | | | +| mode | | | | | | | + +**Additional Notes:** Add here. 
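The per-field statistic rows (count, mean, std, min, quartiles, max, mode) can be filled in with a short script. The sketch below uses only Python's standard `statistics` module. It assumes a numeric field whose missing values are stored as `None`. `describe_field` is a hypothetical helper, not part of any existing tooling. Note that `statistics.stdev` is the sample standard deviation and needs at least two values.

```python
import statistics
from typing import Any


def describe_field(values: list[Any]) -> dict[str, Any]:
    """Compute one column of the per-field statistics table for a numeric field."""
    present = [v for v in values if v is not None]  # missing values are excluded
    # "inclusive" interpolates quartiles between the observed min and max.
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(present),
        "mode": statistics.mode(present),
    }


# Hypothetical field values; None marks a missing entry.
heights_cm = [170, 165, 180, 170, 175, None]
for statistic, value in describe_field(heights_cm).items():
    print(f"| {statistic} | {value} |")
```

The `method="inclusive"` choice follows the Python documentation's guidance for data that includes the extreme values, which fits a complete table better than a sample does.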
+ + +### Dataset Version and Maintenance + +#### Version Details + + +**Current Data Version:** 1.0 + +**Data Version Release Date:** DD/MM/YYYY + +**Data Version for last Data Card Update:** 1.0 + +**Last Data Card Update:** DD/MM/YYYY (same as in the overview at the top) + +#### Data Change Log + +Link to change log (if one exists) + +#### Maintenance Plan + + +Summarize here. Include links and metrics where applicable. + +**Versioning:** Summarize here. Include information about criteria for +versioning the dataset. + +**Updates:** Summarize here. Include information about criteria for refreshing +or updating the dataset. + +**Errors:** Summarize here. Include information about how errors in the +dataset are reported, tracked, and corrected. + +**Feedback:** Summarize here. Include information about how users can give +feedback on the dataset and how that feedback is handled. + +**Additional Notes:** Add here + +#### Next Planned Update(s) + + +**Version affected:** 1.0 + +**Next data update:** MM/YYYY + +**Next version:** 1.1 + +**Next version update:** MM/YYYY + +#### Expected Change(s) + + +**Updates to Data:** Summarize here. Include links, charts, and visualizations +as appropriate. + +**Updates to Dataset:** Summarize here. Include links, charts, and +visualizations as appropriate. + +**Additional Notes:** Add here + +## Example of Data Points + + +#### Sampling of Data Points + + +- Typical Data Point Link +- Outlier Data Point Link +- Other Data Point Link +- Other Data Point Link + +#### Typical Data Point + + +Summarize here. Include any criteria for typicality of data point. + +``` +{'q_id': '8houtx', + 'title': 'Why does water heated to room temperature feel colder than the air around it?', + 'selftext': '', + 'document': '', + 'subreddit': 'explainlikeimfive', + 'answers': {'a_id': ['dylcnfk', 'dylcj49'], + 'text': ["Water transfers heat more efficiently than air. When something feels cold it's because heat is being transferred from your skin to whatever you're touching. ...
Get out of the water and have a breeze blow on you while you're wet, all of the water starts evaporating, pulling even more heat from you."], + 'score': [5, 2]}, + 'title_urls': {'url': []}, + 'selftext_urls': {'url': []}, + 'answers_urls': {'url': []}} +``` + +**Additional Notes:** Add here + +#### Atypical Data Point + + +Summarize here. Include any criteria for atypicality of data point. + +``` +{'q_id': '8houtx', + 'title': 'Why does water heated to room temperature feel colder than the air around it?', + 'selftext': '', + 'document': '', + 'subreddit': 'explainlikeimfive', + 'answers': {'a_id': ['dylcnfk', 'dylcj49'], + 'text': ["Water transfers heat more efficiently than air. When something feels cold it's because heat is being transferred from your skin to whatever you're touching. ... Get out of the water and have a breeze blow on you while you're wet, all of the water starts evaporating, pulling even more heat from you."], + 'score': [5, 2]}, + 'title_urls': {'url': []}, + 'selftext_urls': {'url': []}, + 'answers_urls': {'url': []}} +``` + +**Additional Notes:** Add here + +## Purpose and Motivations + + +### Intended Purpose(s) + + +- Monitoring +- Research +- Evaluation +- Production +- Others (please specify) + +### Motivating Factor(s) + + +For example: + +- Bringing demographic diversity to imagery training data for object-detection models +- Encouraging academics to take on second-order challenges of cultural representation in object detection + +Summarize motivation here. Include links where relevant. + +### Intended Use +#### Dataset Use(s) + + +- Safe for production use +- Safe for research use +- Conditional use - some unsafe applications +- Only approved use +- Others (please specify) + +#### Suitable Use Case(s) + + +**Suitable Use Case:** Summarize here. Include links where necessary. + +**Suitable Use Case:** Summarize here. Include links where necessary. + +**Suitable Use Case:** Summarize here. Include links where necessary. 
+ +**Additional Notes:** Add here + +#### Unsuitable Use Case(s) + + +**Unsuitable Use Case:** Summarize here. Include links where necessary. + +**Unsuitable Use Case:** Summarize here. Include links where necessary. + +**Unsuitable Use Case:** Summarize here. Include links where necessary. + +**Additional Notes:** Add here + +#### Research and Problem Space(s) + + +Summarize here. Include any specific research questions. + + +## Information for Usage + + +### Usage Guideline(s) + + +**Usage Guidelines:** Summarize here. Include links where necessary. + +**Approval Steps:** Summarize here. Include links where necessary. + +**Reviewer:** Provide the name of a reviewer for publications referencing +this dataset. + +**Additional Notes:** Add here + +### Use with Other Data +#### Safety Level + + +- Safe to use with other data +- Conditionally safe to use with other data +- Should not be used with other data +- Unknown +- Others (please specify) + +#### Best Practices + + +Summarize here. Include visualizations, metrics, demonstrative examples, +or links where necessary. + +**Additional Notes:** Add here + +### Forking and Sampling +#### Safety Level + + +- Safe to fork and/or sample +- Conditionally safe to fork and/or sample +- Should not be forked and/or sampled +- Unknown +- Others (please specify) + +#### Acceptable Sampling Method(s) + + +- Cluster Sampling +- Haphazard Sampling +- Multi-stage sampling +- Random Sampling +- Retrospective Sampling +- Stratified Sampling +- Systematic Sampling +- Weighted Sampling +- Unknown +- Unsampled +- Others (please specify) + +#### Best Practice(s) + + +Summarize here. Include links, figures, and demonstrative examples where +available. + +**Additional Notes:** Add here + +#### Risk(s) and Mitigation(s) + + +Summarize here. Include links and metrics where applicable. 
+ +**Risk Type:** [Description + Mitigations] + +**Risk Type:** [Description + Mitigations] + +**Risk Type:** [Description + Mitigations] + +**Additional Notes:** Add here + +#### Limitation(s) and Recommendation(s) + + +Summarize here. Include links and metrics where applicable. + +**Limitation Type:** [Description + Recommendation] + +**Limitation Type:** [Description + Recommendation] + +**Limitation Type:** [Description + Recommendation] + +**Additional Notes:** Add here + +### Notable Feature(s) + + + +**Exploration Demo:** [Link to server or demo.] + +**Notable Field Name:** Describe here. Include links, data examples, metrics, +visualizations where relevant. + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +### Distribution(s) + + + +| Set | Recommended number of data points | +|------------|-----------------------------------| +| Train | 62,563 | +| Test | 62,563 | +| Validation | 62,563 | +| Dev | 62,563 | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +### Known Correlation(s) + + +`field_name`, `field_name` + +**Description:** Summarize here. Include +visualizations, metrics, or links where +necessary. + +**Impact on dataset use:** Summarize here. +Include visualizations, metrics, or links +where necessary. + +**Risks from correlation:** Summarize here. +Include recommended mitigative steps if +available. + +**Additional Notes:** Add here + +### Split Statistics + + + + +| Statistic | Train | Test | Valid | Dev | +|-----------------------|--------|--------|--------|--------| +| Count | 123456 | 123456 | 123456 | 123456 | +| Descriptive Statistic | 123456 | 123456 | 123456 | 123456 | +| Descriptive Statistic | 123456 | 123456 | 123456 | 123456 | +| Descriptive Statistic | 123456 | 123456 | 123456 | 123456 | + +**Above:** Caption for table above. 
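Hash-based random assignment is one way to produce reproducible splits and the "Count" row of the Split Statistics table. The sketch below makes two assumptions: every data point has a stable string ID, and the split fractions are a hypothetical 70/10/10/10. Hashing the ID instead of shuffling keeps each data point in the same split across dataset versions, provided the salt and fractions stay the same. When the splits must preserve the distribution of a label or a protected attribute, a stratified method is the better choice.

```python
import hashlib
from collections import Counter

# Hypothetical split fractions; replace with the ones documented in the table above.
SPLITS = [("train", 0.70), ("test", 0.10), ("validation", 0.10), ("dev", 0.10)]


def assign_split(record_id: str, salt: str = "v1.0") -> str:
    """Deterministically map an ID to a split by hashing it into [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).digest()
    # First 8 bytes as an unsigned integer, scaled to [0, 1).
    position = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, fraction in SPLITS:
        cumulative += fraction
        if position < cumulative:
            return name
    return SPLITS[-1][0]  # guards against the fractions summing to just under 1.0


ids = [f"customer-{i}" for i in range(10_000)]
counts = Counter(assign_split(i) for i in ids)
# Count row in the column order of the Split Statistics table: Train | Test | Valid | Dev.
print(f"| Count | {counts['train']} | {counts['test']} | {counts['validation']} | {counts['dev']} |")
```

If several rows belong to the same entity (for example, one customer), hashing the entity ID instead of the row ID keeps all of that entity's rows in one split and avoids leakage between train and test.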
+ +### Citation Guidelines + + + +**Guidelines:** Summarize citation guidelines with a link to the data documentation (i.e. this data card). + +**BibTeX:** +``` +@article{kuznetsova2020open, + title={The open images dataset v4}, + author={Kuznetsova, Alina and Rom, Hassan and Alldrin, Neil and others}, + journal={International Journal of Computer Vision}, + volume={128}, + number={7}, + pages={1956--1981}, + year={2020}, + publisher={Springer} +} +``` + +**Additional Notes:** Add here + + +## Known Usages + +#### Model(s) + + + +| **Model** | **Model Task** | **Purpose of Dataset Usage** | +|---------------------|----------------------|------------------------------| +| [Example Model 1]() | Image Segmentation | Fairness evaluation | +| [Example Model 2]() | Skin Tone Classifier | Training and validation | + +Note that this table does not have to be exhaustive. Dataset users and documentation consumers at large +are highly encouraged to contribute known usages. + +#### Application(s) + + + +| **Application** | **Brief Description** | **Purpose of Dataset Usage** | **[AI Act Risk](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)** | +|---------------------------|------------------------------|--------------------------------------------------------|----------------------------------------------------------------------------------------------| +| [Example Application 1]() | Size and Fit Recommendations | Fairness evaluation of end-to-end application pipeline | Limited | + +Note that this table does not have to be exhaustive. Dataset users and documentation consumers at large +are highly encouraged to contribute known usages. + + +## Access, Retention, and Deletion + +### Access +#### Relevant Links +* [Link to filestore] +* [Link to governance processes for data access] +* ...
+ +#### Data Security Classification + + +- C1 +- C2 +- C3 +- C4 +- Others (please specify) + +#### Prerequisite(s) + + +For example: + +This dataset requires membership in [specific] database groups: + +- Complete the [Mandatory Training] +- Read [Data Usage Policy] +- Initiate a [Data Processing Request] +- Obtain an AWS IAM role with S3 bucket access +- Add Databricks clusters or other technical users to the correct roles + +### Retention +#### Duration + + +Specify duration in days, months, or years. + +#### Reasons for Duration + + +**Reason** + +#### Policy Summary + + +**Policy:** Add a link to the policy if it's standardized at your company (e.g. S3 bucket standard retention policies). + +**Summary:** Write a summary here if the link does not already provide all the information. + +**Additional Notes:** Add here + +#### Exception(s) and Exemption(s) + + +**Exemption Code:** `ANONYMOUS_DATA` / +`EMPLOYEE_DATA` / `PUBLIC_DATA` / +`INTERNAL_BUSINESS_DATA` / +`SIMULATED_TEST_DATA` + +**Summary:** Write summary and notes here.
+ +**Additional Notes:** Add here + +### Deletion +#### Deletion Event Summary + + +**Sequence of deletion and processing events:** + +- Summarize first event here +- Summarize second event here +- Summarize third event here + +**Additional Notes:** Add here + +#### Acceptable Means of Deletion + + +- Write acceptable means of deletion +- Write acceptable means of deletion +- Write acceptable means of deletion + +#### Post-Deletion Obligations + + +**Sequence of post-deletion obligations:** + +- Summarize first obligation here +- Summarize second obligation here +- Summarize third obligation here + +**Additional Notes:** Add here + +#### Operational Requirement(s) + + +**Deletion Integration Operational Requirements:** + +- Write first requirement here +- Write second requirement here +- Write third requirement here + +#### Exceptions and Exemptions + + +**Policy Exception bug:** [bug] + +**Summary:** Write summary and notes here + +**Additional Notes:** Add here + +## Provenance +### Collection +#### Method(s) Used + + +- API +- Artificially generated +- Crowdsourced - Internal Employee +- Crowdsourced - External Paid +- Crowdsourced - Volunteer +- Vendor collection efforts +- Scraped or crawled +- Survey, forms, or polls +- Interviews, focus groups +- Scientific experiment +- Taken from other existing datasets +- Unknown +- To be determined +- Others (please specify) + +#### Methodology Detail(s) + + +**Collection Type** + +**Source:** Describe here. Include links where available. + +**Platform:** [Platform Name], Describe platform here. Include links where relevant. 
+ +**Is this source considered sensitive or high-risk?** [Yes/No] + +**Dates of Collection:** [MMM YYYY - MMM YYYY] + +**Update Frequency for collected data:** + +*Usage Note: Select one for this collection type.* + +- Yearly +- Quarterly +- Monthly +- Biweekly +- Weekly +- Daily +- Hourly +- Static +- Others (please specify) + +**Additional Links for this collection:** + +See the section on [Access, Retention, and Deletion](#access-retention-and-deletion) + +**Additional Notes:** Add here + +#### Source Description(s) + + +- **Source:** Describe here. Include links, data examples, metrics, visualizations where relevant. +- **Source:** Describe here. Include links, data examples, metrics, visualizations where relevant. +- **Source:** Describe here. Include links, data examples, metrics, visualizations where relevant. + +**Additional Notes:** Add here + +#### Collection Cadence + + +**Static:** Data was collected once from single or multiple sources. + +**Streamed:** Data is continuously acquired from single or multiple sources. + +**Dynamic:** Data is updated regularly from single or multiple sources. + +**Others:** Please specify + +### Attribute Collection Criteria and Integration +#### Data Integration + + +**Source** + +**Included Fields** + +Data fields that were collected and are included in the dataset. + +| Field Name | Description | +|------------|--------------------------------------------------------------------------------------| +| Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant. | +| Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant. | + +**Additional Notes:** Add here + +**Excluded Fields** + +Data fields that were collected but are excluded from the dataset. + +| Field Name | Description | +|------------|--------------------------------------------------------------------------------------| +| Field Name | Describe here.
Include links, data examples, metrics, visualizations where relevant. | +| Field Name | Describe here. Include links, data examples, metrics, visualizations where relevant. | + +**Additional Notes:** Add here + +#### Data Aggregation + + +**[Aggregation step 1]:** [[GitHub Repository Link]()] + +Describe step 1. + +**[Aggregation step 2]:** [[GitHub Repository Link]()] + +Describe step 2. + +... + + +### Data Point Collection Criteria +#### Data Selection + + +- **Collection Method of Source:** Summarize data selection criteria here. Include links where available. +- **Collection Method of Source:** Summarize data selection criteria here. Include links where available. +- **Collection Method of Source:** Summarize data selection criteria here. Include links where available. + +**Additional Notes:** Add here + +#### Data Inclusion + + +- **Collection Method of Source:** Summarize data inclusion criteria here. Include links where available. +- **Collection Method of Source:** Summarize data inclusion criteria here. Include links where available. +- **Collection Method of Source:** Summarize data inclusion criteria here. Include links where available. + +**Additional Notes:** Add here + +#### Data Exclusion + + +- **Collection Method of Source:** Summarize data exclusion criteria here. Include links where available. +- **Collection Method of Source:** Summarize data exclusion criteria here. Include links where available. +- **Collection Method of Source:** Summarize data exclusion criteria here. Include links where available. + +**Additional Notes:** Add here + +### Relationship to Source +#### Use and Utility(ies) + + +- **Source Type:** Summarize here. Include links where available. +- **Source Type:** Summarize here. Include links where available. +- **Source Type:** Summarize here. Include links where available. + +**Additional Notes:** Add here + +#### Benefit and Value(s) + + +- **Source Type:** Summarize here. Include links where available. 
+- **Source Type:** Summarize here. Include links where available. +- **Source Type:** Summarize here. Include links where available. + +**Additional Notes:** Add here + +#### Limitation(s) and Trade-Off(s) + + +- **Source Type:** Summarize here. Include links where available. +- **Source Type:** Summarize here. Include links where available. +- **Source Type:** Summarize here. Include links where available. + +## Sensitive and Protected Attributes +### Sensitivity of Data + + +#### Sensitivity Type(s) + + +- User Content +- User Metadata +- User Activity Data +- Identifiable Data +- S/PII +- Business Data +- Employee Data +- Pseudonymous Data +- Anonymous Data +- Health Data +- Children’s Data +- None +- Others (please specify) + +#### Field(s) with Sensitive Data + + +**Intentionally Collected Sensitive Data** + +(S/PII were collected as a part of the +dataset creation process.) + +| Field Name | Description | +|------------|---------------| +| Field Name | Type of S/PII | +| Field Name | Type of S/PII | +| Field Name | Type of S/PII | + +**Unintentionally Collected Sensitive Data** + +(S/PII were not explicitly collected as a +part of the dataset creation process but +can be inferred using additional +methods.) + +| Field Name | Description | +|------------|---------------| +| Field Name | Type of S/PII | +| Field Name | Type of S/PII | +| Field Name | Type of S/PII | + +**Additional Notes:** Add here + +#### Security and Privacy Handling + + + +Summarize here. Include links to relevant data governance and security processes. + +**Governance Process 1:** description + +**Governance Process 2:** description + +**Privacy Requirements:** description + +**Privacy Preserving Processes:** (e.g. anonymization, masking)
description + +**Additional Notes:** Add here + +#### Risk Type(s) + + +- Direct Risk +- Indirect Risk +- Residual Risk +- No Known Risks +- Others (please specify) + +### Protected Attributes +#### Protected Attribute Type(s) + + +- Gender +- Socio-economic status +- Geography +- Language +- Age +- Culture +- Experience or Seniority +- Others (please specify) + +#### Field(s) with Protected Attributes + + +**Intentionally Collected Attributes** + +Protected attributes were labeled or collected as a part of the dataset creation +process. + +| Field Name | Description | +|------------|--------------------------------| +| Field Name | Protected Attribute Collected | +| Field Name | Protected Attribute Collected | + +**Additional Notes:** Add here + +**Unintentionally Collected Attributes** + +Protected attributes were not explicitly collected as a part of the dataset +creation process but can be inferred using additional methods. + +| Field Name | Description | +|------------|--------------------------------| +| Field Name | Protected Attribute Collected | +| Field Name | Protected Attribute Collected | + +**Additional Notes:** Add here + +#### Rationale + + +Summarize here. Include links, tables, and media as relevant. + +#### Source(s) + + +- **Protected Attribute:** Sources +- **Protected Attribute:** Sources +- **Protected Attribute:** Sources + +**Additional Notes:** Add here + +#### Methodology Detail(s) + + + +**Protected Attribute Collection Method:** Describe the collection method here. Include links where necessary. + +**Collection task:** Describe the task here.
Include links where necessary. + +**Platforms, tools, or libraries:** + +- [Platform, tools, or libraries]: Write description here +- [Platform, tools, or libraries]: Write description here +- [Platform, tools, or libraries]: Write description here + +**Additional Notes:** Add here + +#### Known Correlations to Protected Attributes + + +[`field_name`, `field_name`] + +**Description:** Summarize here. Include visualizations, metrics, or links +where necessary. + +**Impact on dataset use:** Summarize here. Include visualizations, metrics, or +links where necessary. + +**Additional Notes:** Add here + +#### Risk(s) and Mitigation(s) + + +**Protected Attribute** + +Summarize here. Include links and metrics where applicable. + +**Risk type:** [Description + Mitigations] + +**Risk type:** [Description + Mitigations] + +**Risk type:** [Description + Mitigations] + +**Trade-offs, caveats, and other considerations:** Summarize here. Include +visualizations, metrics, or links where necessary. + +**Additional Notes:** Add here + +## Transformations + +### Code Base and Existing Documentation + + +See [[this GitHub repository]()] for the transformation code base and further documentation. + +### Synopsis +#### Transformation(s) Applied + + +- Anomaly Detection +- Cleaning Mismatched Values +- Cleaning Missing Values +- Converting Data Types +- Data Aggregation +- Dimensionality Reduction +- Joining Input Sources +- Redaction or Anonymization +- Others (please specify) + +#### Field(s) Transformed + + +**Transformation Type** + +| Field Name | Source and Target | +|------------|----------------------------| +| Field Name | Source Field: Target Field | +| Field Name | Source Field: Target Field | +| ... | ... | + +**Additional Notes:** Add here + +### Breakdown of Transformations + +#### Cleaning Missing Value(s) + + +Summarize here. Include links where available.
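A minimal sketch can produce both the per-field missing counts and the "Before: After" rows of the comparative summary for this transformation. It assumes that records are dicts sharing the same fields and that missing values are `None`. Median imputation is used purely as an example method; the function names and fields are hypothetical.

```python
import statistics
from typing import Any


def missing_counts(records: list[dict[str, Any]]) -> dict[str, int]:
    """Count missing (None) values per field."""
    counts: dict[str, int] = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + (1 if value is None else 0)
    return counts


def impute_median(records: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Return new records where missing values of `field` are replaced by its median."""
    median = statistics.median(r[field] for r in records if r[field] is not None)
    return [{**r, field: median if r[field] is None else r[field]} for r in records]


# Hypothetical records; every record is assumed to contain every field.
records = [
    {"height_cm": 172, "weight_kg": None},
    {"height_cm": None, "weight_kg": 70},
    {"height_cm": 180, "weight_kg": 82},
]
before = missing_counts(records)
cleaned = impute_median(records, "height_cm")
after = missing_counts(cleaned)
for field in before:
    print(f"| {field} | {before[field]}: {after[field]} |")  # Before: After
```

`impute_median` returns new dicts and leaves the input untouched, so both versions remain available when filling in the comparative summary.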
+ +**Field Name:** Count or description + +**Field Name:** Count or description + +**Field Name:** Count or description + +#### Method(s) Used + + +Summarize here. Include links where necessary. + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +- **Risk Type:** Description + Mitigations +- **Risk Type:** Description + Mitigations +- **Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. + +#### Cleaning Mismatched Value(s) + + +Summarize here. Include links where available. + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where available. + +#### Comparative Summary + + +Summarize here. Include links where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. 
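A small normalization step often handles the Cleaning Mismatched Value(s) transformation for categorical fields. The sketch below assumes a hypothetical size field and a hand-written canonical mapping. Values missing from the mapping are deliberately left unchanged so that a reviewer can inspect them, which ties in with the human oversight measures described above.

```python
# Hypothetical mapping from normalized raw spellings to canonical size labels.
CANONICAL = {"s": "S", "small": "S", "m": "M", "medium": "M", "l": "L", "large": "L"}


def normalize(value: str) -> str:
    """Map a raw value to its canonical form; unknown values are returned unchanged."""
    # Unknown values are left as-is so they can be flagged for human review
    # instead of being silently coerced.
    return CANONICAL.get(value.strip().lower(), value)


raw = ["M", " m", "Medium", "L", "large ", "XL"]
cleaned = [normalize(v) for v in raw]
changed = sum(1 for r, c in zip(raw, cleaned) if r != c)
print(f"| size | distinct values {len(set(raw))}: {len(set(cleaned))} |")
print(f"changed: {changed}, unmapped: {sorted(set(cleaned) - set(CANONICAL.values()))}")
```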
+ +#### Anomalies + + +Summarize here. Include links where available. + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where necessary. + +**Platforms, tools, or libraries** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. + +#### Dimensionality Reduction + + +Summarize here. Include links where available. + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where +necessary. + +**Platforms, tools, or libraries** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... 
| + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risks + + +Summarize here. Include links and metrics where applicable. + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. + +#### Joining Input Sources + + +Summarize here. Include links where available. + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where necessary. + +**Platforms, tools, or libraries** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where +available. + +#### Additional Considerations + + +Summarize here. Include links where +available. + +#### Redaction or Anonymization + + +Summarize here. Include links where available. 
+ +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where necessary. + +**Platforms, tools, or libraries** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +**Risk Type:** Description + Mitigations + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. + +#### Others (Please Specify) + + +Summarize here. Include links where available. + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +**Field Name:** Count or Description + +#### Method(s) Used + + +Summarize here. Include links where necessary. + +**Platforms, tools, or libraries** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +#### Comparative Summary + + +Summarize here. Include links, tables, visualizations where available. + +| **Field Name** | **Diff** | +|----------------|---------------| +| Field Name | Before: After | +| Field Name | Before: After | +| ... | ... | + +**Above:** Provide a caption for the above table or visualization. 
+ +**Additional Notes:** Add here + +#### Residual and Other Risk(s) + + +Summarize here. Include links and metrics where applicable. + +**Risk type:** [Description + Mitigations] + +**Risk type:** [Description + Mitigations] + +**Risk type:** [Description + Mitigations] + +#### Human Oversight Measure(s) + + +Summarize here. Include links where available. + +#### Additional Considerations + + +Summarize here. Include links where available. + +## Annotations and Labeling + + +### Annotation +#### Task(s) + + +**(Task Type)** + +**Task description:** Summarize here. Include links if available. + +**Task instructions:** Summarize here. Include links if available. + +**Methods used:** Summarize here. Include links if available. + +**Inter-rater adjudication policy:** Summarize here. Include links if +available. + + +**Golden questions:** Summarize here. Include links if available. + +**Additional notes:** Add here + +#### Characteristic(s) + + +| **(Annotation Type)** | **Number** | +|----------------------------------|------------| +| Number of annotated examples | 123456789 | +| Total number of annotations | 123456789 | +| Average annotations per example | 123456789 | +| Number of annotators per example | 123456789 | +| [Quality metric per granularity] | 123456789 | +| [Quality metric per granularity] | 123456789 | +| [Quality metric per granularity] | 123456789 | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Description(s) + + +**(Annotation Type)** + +**Description:** Description of annotations (labels, ratings) produced. +Include how this was created or authored. + +**Link:** Relevant URL link.
+ +**Platforms, tools, or libraries:** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +**Additional Notes:** Add here + +#### Distribution(s) + + +| **(Annotation Type)** | **Number** | +|------------------------|-------------| +| Annotations (or Class) | 12345 (20%) | +| Annotations (or Class) | 12345 (20%) | +| Annotations (or Class) | 12345 (20%) | +| Annotations (or Class) | 12345 (20%) | +| Annotations (or Class) | 12345 (20%) | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +### Human Annotators + +#### Annotation Workforce Type + + +- Annotation Target in Data +- Machine-Generated +- Annotations +- Protected Attribute Annotations (Expert) +- Protected Attribute Annotations (Non-Expert) +- Protected Attribute Annotations (Employees) +- Protected Attribute Annotations (Contractors) +- Protected Attribute Annotations (Crowdsourcing) +- Protected Attribute Annotations (Outsourced / Managed) +- Teams +- Unlabeled +- Others (Please specify) + +#### Annotator Pool(s) + + +**(Annotation Pool Name)** + +**Number of unique annotators:** Summarize here. Include links if available. + +**Task(s) completed:** Summarize here (See [annotation tasks](#tasks) section for full list of Annotation tasks). + +**Expertise of annotators:** Summarize here. Include links if available. + +**Description of annotators:** Summarize here. Include links if available. + +**Summary of general (non task specific) annotation instructions:** Summarize here. Include links if +available. + +**Summary of annotator's responses to gold questions:** Summarize here. Include links if available. + +**Annotation platforms:** Summarize here. Include links if available. 
+ +**Additional Notes:** Add here + +#### Language(s) + + +**(Annotator Languages Spoken)** + +- Language [Percentage %] +- Language [Percentage %] +- Language [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Location(s) + + +**(Annotator Locations of Upbringing)** + +- Location [Percentage %] +- Location [Percentage %] +- Location [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +**(Annotator Current Locations of Residence)** + +- Location [Percentage %] +- Location [Percentage %] +- Location [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Gender(s) + + +**(Annotator Genders)** + +- Gender [Percentage %] +- Gender [Percentage %] +- Gender [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +## Validation Types + +#### Method(s) + + +- Data Type Validation +- Range and Constraint Validation +- Code/cross-reference Validation +- Structured Validation +- Consistency Validation +- Not Validated +- Others (Please Specify) + +#### Breakdown(s) + + +**(Validation Type)** + +**Number of Data Points Validated:** 12345 + +**Fields Validated** +Field | Count (if available) +--- | --- +Field | 123456 +Field | 123456 +Field | 123456 + +**Above:** Provide a caption for the above table or visualization. + +#### Description(s) + + +**(Validation Type)** + +**Method:** Describe the validation method here. Include links where +necessary. + +**Platforms, tools, or libraries:** + +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here +- Platform, tool, or library: Write description here + +**Validation Results:** Provide results, outcomes, and actions taken because +of the validation. Include visualizations where available. 
+ +**Additional Notes:** Add here + +### Description of Human Validators + +#### Characteristic(s) + + +**(Validation Type)** +- Unique validators: 12345 +- Number of examples per validator: 123456 +- Average cost/task/validator: $$$ +- Training provided: Y/N +- Expertise required: Y/N + +#### Description(s) + + +**(Validation Type)** + +**Validator description:** Summarize here. Include links if available. + +**Training provided:** Summarize here. Include links if available. + +**Validator selection criteria:** Summarize here. Include links if available. + +**Training provided:** Summarize here. Include links if available. + +**Additional Notes:** Add here + +#### Language(s) + + +**(Validation Type)** + +- Language [Percentage %] +- Language [Percentage %] +- Language [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Location(s) + + +**(Validation Type)** + +- Location [Percentage %] +- Location [Percentage %] +- Location [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Gender(s) + + +**(Validation Type)** + +- Gender [Percentage %] +- Gender [Percentage %] +- Gender [Percentage %] + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +## Sampling Methods + +#### Method(s) Used + + +- Cluster Sampling +- Haphazard Sampling +- Multi-stage Sampling +- Random Sampling +- Retrospective Sampling +- Stratified Sampling +- Systematic Sampling +- Weighted Sampling +- Unknown +- Unsampled +- Others (Please specify) + +#### Characteristic(s) + + +| **(Sampling Type)** | **Number** | +|-----------------------|------------------------| +| Upstream Source | Write here | +| Total data sampled | 123m | +| Sample size | 123 | +| Threshold applied | 123k units at property | +| Sampling rate | 123 | +| Sample mean | 123 | +| Sample std. 
dev | 123 | +| Sampling distribution | 123 | +| Sampling variation | 123 | +| Sample statistic | 123 | + +**Above:** Provide a caption for the above table or visualization. + +**Additional Notes:** Add here + +#### Sampling Criteria + + +- **Sampling method:** Summarize here. Include links where applicable. +- **Sampling method:** Summarize here. Include links where applicable. +- **Sampling method:** Summarize here. Include links where applicable. + +## Glossary +### Concepts and Definitions referenced in this Data Card + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +#### Term of Art +Definition: Write here + +Source: Write here and share link + +Interpretation: Write here + +## Reflections on Data + +### Title +Write notes here. + +### Title +Write notes here. + +### Title +Write notes here. 
diff --git a/templates/Model Card - Zalando Template.md b/templates/Model Card - Zalando Template.md new file mode 100644 index 0000000..70fd378 --- /dev/null +++ b/templates/Model Card - Zalando Template.md @@ -0,0 +1,207 @@ + + +

Model Name (Acronym)

+ +Model card contributions and feedback are welcome! + +## Template Version + + +[0.2.2]() + +## Overview + +#### Last Update + +**DD/MM/YYYY** + +#### Model Card Author(s) + + +- **Name, Team:** (Owner / Contributor / Manager) +- **Name, Team:** (Owner / Contributor / Manager) +- **Name, Team:** (Owner / Contributor / Manager) + +## Model Overview +* Description + + +#### Status + + +**Status Date:** DD/MM/YYYY + +**Under Preparation** - The model is still under active development +and is not yet ready for use due to active "dev" updates. + +**Regularly Updated** - New versions of the model +have been or will continue to be made available. + +**Actively Maintained** - No new versions will be made +available, but this model will +be actively maintained. + +**Limited Maintenance** - The model will not be updated, +but any technical issues will be +addressed. + +**Deprecated** - This model is obsolete or is +no longer being maintained. + +### Developers +- **Name, Team** +- **Name, Team** + +### Owners + +- **Team Name, Contact Person** + +### Risk Classification + + +- High / Limited / Minimal + +### Version Details and Artifacts + + +**Current Model Version:** + +**Model Version Release Date:** + +**Model Version for last Model Card Update:** + +**Artifacts:** + + +- Model weights (e.g. S3 bucket path) +- Model config (e.g.
S3 bucket path) + +### GitHub + +- Repository_1 +- Repository_2 + +### References +Example references: + +- Paper/Documentation Link +- Initiative Link +- API Link + +## Intended Use and Known Applications +#### Intended Use + + +* Description + +#### Known Applications + + +| **Application** | **Purpose of Model Usage** | **[AI Act Risk](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)** | +|-------------------|----------------------------------------------------------------------------|----------------------------------------------------------------------------------------------| +| [Application 1]() | Foundation model providing customer embeddings for fraud detection scoring | High | +| [Application 2]() | Customer embeddings used directly as features for recommendation engine | Limited | + +Note: this table may not be exhaustive. Model users and documentation consumers at large +are highly encouraged to contribute known usages. + +#### Out-of-Scope Uses +Provide potential applications and/or use cases for which use of the model is not suitable. + +## Risks and Ethical Considerations +This section should cover topics related to known biases, security, privacy and other risks related to the use of the model. Please provide an introductory paragraph before listing risks and considerations. Such an introduction should explain +how the risks / ethical considerations were identified +and whether, and if so which, risk teams were involved in identifying the issues and coming up with the mitigation strategies. + +#### Risk / Consideration + +Description + +**Mitigation strategy:** + +Description + +#### Risk / Consideration + +Description + +**Mitigation strategy:** + +Description + +## Training + + +### Datasets +Datasets should be compiled in a table with columns and rows: + +| Name | Location | Sensitive* | Size | Documentation +|---|---|---|---|---| +Name | Location (e.g. S3 bucket) | Yes / No | # of samples | Link (e.g.
Dataset Card) + + +(*): Requires a special Data Processing Request +## Evaluation + +This section should include an overview of the evaluation process, such as evaluation dataset(s), metrics and quantitative results. + + +### Datasets +Datasets should be compiled in a table with columns and rows: + +| Name | Location | Sensitive* | Size | Documentation +|---|---|---|---|---| +Name | Location (e.g. S3 bucket) | Yes / No | # of samples | Link (e.g. Dataset Card) + +(*): Requires a special Data Processing Request + +### Quantitative Results + +* Dataset: + * **Metric: Result** + * **Metric: Result** + +* Dataset: + * **Metric: Result** + * **Metric: Result** + +## Caveats and Recommendations +This section should cover shortcomings and recommendations related to the current approach, such as: + +* scalability +* robustness +* deployment safety and readiness +* training / evaluation data + + +Additionally, plans or ideas for future work could be discussed (e.g. incorporating a new data source or using a bigger model). + +## Energy Requirements + + + +| Category | Estimated footprint | +|-------------------------------------|---------------------------| +| Total training time | 118 days, 5 hours, 41 min | +| Total number of GPU hours | 1,082,990 hours | +| Total energy used | 433,196 kW | +| GPU models used | Nvidia A100 80GB | +| Carbon intensity of the energy grid | 57 gCO2eq/kWh | + +Example taken directly from [Luccioni et al.](https://arxiv.org/pdf/2211.02001.pdf).
From ba11dcdb31c7ff10d187ffc1666e81b811206ddf Mon Sep 17 00:00:00 2001 From: Alex Loosley Date: Thu, 31 Aug 2023 13:45:17 +0200 Subject: [PATCH 2/3] contributors --- templates/Data Card - Zalando Template.md | 2 ++ templates/Model Card - Zalando Template.md | 2 ++ 2 files changed, 4 insertions(+) diff --git a/templates/Data Card - Zalando Template.md b/templates/Data Card - Zalando Template.md index b361a46..8944a91 100644 --- a/templates/Data Card - Zalando Template.md +++ b/templates/Data Card - Zalando Template.md @@ -2,6 +2,8 @@ derived for usage at Zalando based on existing [Huggingface](https://huggingface.co/docs/hub/datasets-cards) and [Google](https://sites.research.google/datacardsplaybook/) templates. + +Contributors: Alex Loosley, Rocco Maresca, Pak-Hang Wong, Håkan Jonsson -->

Dataset Name (Acronym)

diff --git a/templates/Model Card - Zalando Template.md b/templates/Model Card - Zalando Template.md index 70fd378..29391b4 100644 --- a/templates/Model Card - Zalando Template.md +++ b/templates/Model Card - Zalando Template.md @@ -2,6 +2,8 @@ derived for usage at Zalando based on existing [Huggingface](https://huggingface.co/docs/hub/model-cards) and [Google](https://modelcards.withgoogle.com/about) templates. + +Contributors: Alex Loosley, Rocco Maresca, Pak-Hang Wong, Håkan Jonsson -->

Model Name (Acronym)

From 823104559b33260786c3c93aed99fb303573c983 Mon Sep 17 00:00:00 2001 From: Alex Loosley Date: Mon, 4 Sep 2023 10:28:33 +0200 Subject: [PATCH 3/3] switch from unit of power to unit of energy --- templates/Model Card - Zalando Template.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/templates/Model Card - Zalando Template.md b/templates/Model Card - Zalando Template.md index 29391b4..60c4131 100644 --- a/templates/Model Card - Zalando Template.md +++ b/templates/Model Card - Zalando Template.md @@ -202,7 +202,7 @@ Additionaly plans or ideas for future work could be discussed (e.g. incorporate |-------------------------------------|---------------------------| | Total training time | 118 days, 5 hours, 41 min | | Total number of GPU hours | 1,082,990 hours | -| Total energy used | 433,196 kW | +| Total energy used | 433,196 kWh | | GPU models used | Nvidia A100 80GB | | Carbon intensity of the energy grid | 57 gCO2eq/kWh |
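The kW to kWh correction in the last patch can be sanity-checked against the other rows of the example Energy Requirements table: total energy divided by GPU hours should give a plausible average power per GPU (kWh / h = kW), and total energy multiplied by the grid's carbon intensity gives operational emissions (kWh × gCO2eq/kWh = gCO2eq). The Python sketch below does this arithmetic with the figures copied from the table. The helper names are hypothetical, and it assumes emissions are simply energy times grid intensity, ignoring embodied emissions from hardware manufacturing.

```python
# Sanity check for the example "Energy Requirements" table.
# Figures are copied from the table (Luccioni et al.); helper names are hypothetical.

GPU_HOURS = 1_082_990      # total GPU hours (h)
ENERGY_KWH = 433_196       # total energy used (kWh), a unit of energy, not power
CARBON_INTENSITY = 57      # grid carbon intensity (gCO2eq per kWh)


def average_power_kw(energy_kwh: float, gpu_hours: float) -> float:
    """Average power per GPU: energy / time, so kWh / h = kW."""
    return energy_kwh / gpu_hours


def emissions_tonnes(energy_kwh: float, intensity_g_per_kwh: float) -> float:
    """Operational emissions: kWh * gCO2eq/kWh = gCO2eq, converted to tonnes (1 t = 1e6 g)."""
    return energy_kwh * intensity_g_per_kwh / 1e6


if __name__ == "__main__":
    print(f"Average draw per GPU: {average_power_kw(ENERGY_KWH, GPU_HOURS):.2f} kW")
    print(f"Operational emissions: {emissions_tonnes(ENERGY_KWH, CARBON_INTENSITY):.2f} tCO2eq")
```

Running it reports an average draw of 0.40 kW per GPU, a plausible figure for an A100 (the SXM variant is rated at 400 W), and about 24.69 tCO2eq. Only a value in kWh makes both calculations dimensionally consistent, which is why the unit switch matters.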