generated from carpentries/workbench-template-md
Commit ff00c83 (1 parent: f3d84ea)
Showing 12 changed files with 176 additions and 121 deletions.
@@ -55,12 +55,12 @@ contact: '[email protected]'

 # Order of episodes in your lesson
 episodes:
-- introduction.md
-- problem-definition.md
-- scientific-validity.md
-- fairness.md
-- explainability.md
-- releasing-a-model.md
+- 0-introduction.md
+- 1-preparing-to-train.md
+- 2-model-fitting.md
+- 3-model-eval.md
+- 4-explainability.md
+- 5-releasing-a-model.md

 # Information for Learners
 learners:
@@ -0,0 +1,31 @@
---
title: "Preparing to train a model"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- For what prediction tasks is machine learning an appropriate tool?
- How can inappropriate target variable choice lead to suboptimal outcomes in a machine learning pipeline?
- What is "biased" training data, and where does this bias come from?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Judge which tasks are appropriate for machine learning.
- Understand why the choice of prediction task / target variable is important.
- Describe how bias can appear in training data.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Some tasks are not appropriate for machine learning due to ethical concerns.
- Machine learning tasks should have a valid prediction target that maps clearly to the real-world goal.
- Training data can be biased due to societal inequities, errors in the data collection process, and lack of attention to careful sampling practices.

::::::::::::::::::::::::::::::::::::::::::::::::
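The keypoint about biased training data can be probed with a quick look at group representation and per-group label rates in the training sample. This is a minimal sketch, assuming a pandas DataFrame with hypothetical `group` and `label` columns rather than the lesson's own dataset:

```python
import pandas as pd

# Hypothetical training data: the `group` and `label` columns are illustrative,
# not part of the lesson's dataset.
train = pd.DataFrame({
    "group": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "label": [1, 0, 1, 1, 0, 1, 0, 0],
})

# How well is each group represented in the training sample?
print(train["group"].value_counts(normalize=True))

# Does the positive-label rate differ sharply between groups?
# Large gaps can reflect societal inequities or skewed data collection.
print(train.groupby("group")["label"].mean())
```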
@@ -0,0 +1,28 @@
---
title: "Scientific Validity in the Modeling Process"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What impact do overfitting and underfitting have on model performance?
- What is data leakage?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Implement at least two types of machine learning models in Python.
- Describe the risks of, identify, and understand mitigation steps for overfitting and underfitting.
- Understand why data leakage is harmful to scientific validity and how it can appear in machine learning pipelines.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Overfitting is characterized by worse performance on the test set than on the training set and can be fixed by switching to a simpler model architecture or by adding regularization.
- Underfitting is characterized by poor performance on both the training and test datasets. It can be fixed by collecting more training data, switching to a more complex model architecture, or improving feature quality.
- Data leakage occurs when the model has access to the test data during training and results in overconfidence in the model's performance.

::::::::::::::::::::::::::::::::::::::::::::::::
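The overfitting and data-leakage keypoints lend themselves to a short scikit-learn sketch; the synthetic dataset and unconstrained decision tree below are illustrative assumptions, not the lesson's worked example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training split only; fitting it on all the data would
# leak information about the test set into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# An unconstrained tree can memorize the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))
print("test accuracy: ", model.score(X_test, y_test))
# A large train/test gap is the overfitting signature; a simpler model
# (e.g. max_depth=3) or regularization typically narrows it.
```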
@@ -0,0 +1,53 @@
---
title: "Fairness"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- How do we define fairness and bias in machine learning outcomes?
- How can we improve the fairness of machine learning models?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Reason about model performance through standard evaluation metrics.
- Understand and distinguish between various notions of fairness in machine learning.
- Describe and implement two different ways of modifying the machine learning modeling process to improve the fairness of a model.

::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::: challenge

### Matching fairness terminology with definitions

Match the following types of formal fairness with their definitions:
(A) Individual fairness,
(B) Equalized odds,
(C) Demographic parity, and
(D) Group-level calibration

1. The model is equally accurate across all demographic groups.
2. Different demographic groups have the same true positive rates and false positive rates.
3. Similar people are treated similarly.
4. People from different demographic groups receive each outcome at the same rate.

::::::::::::::::::::::::::::::::::::::::::::::::::

:::::::::::::: solution

### Solution

A - 3, B - 2, C - 4, D - 1

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- It's important to consider many dimensions of model performance: a single accuracy score is not sufficient.
- There is no single definition of "fair machine learning": different notions of fairness are appropriate in different contexts.
- It is usually not possible to satisfy all possible notions of fairness simultaneously.
- The fairness of a model can be improved by using techniques like data reweighting and model postprocessing.

::::::::::::::::::::::::::::::::::::::::::::::::
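Two of the fairness notions in the challenge can be checked directly on model predictions. A minimal sketch, assuming a hypothetical table of per-group labels and predictions (the values are illustrative only):

```python
import pandas as pd

# Hypothetical predictions for two demographic groups.
df = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [1, 0, 1, 0, 1, 0, 1, 0],
    "y_pred": [1, 0, 1, 1, 0, 0, 1, 0],
})

# Demographic parity: compare the rate of positive predictions across groups.
print(df.groupby("group")["y_pred"].mean())

# Equalized odds: compare true positive and false positive rates across groups.
for name, g in df.groupby("group"):
    tpr = g.loc[g.y_true == 1, "y_pred"].mean()
    fpr = g.loc[g.y_true == 0, "y_pred"].mean()
    print(f"group {name}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```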
@@ -0,0 +1,27 @@
---
title: "Explainability"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What is model interpretability? When do we need models to be interpretable?
- What are some model interpretability techniques?
- When are model explainability techniques not sufficient for understanding model behavior?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Compare and contrast different interpretability techniques.
- Explain feature importance.
- Articulate limitations of explainable machine learning.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- TODO

::::::::::::::::::::::::::::::::::::::::::::::::
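One interpretability technique the objectives point to is feature importance. A minimal sketch using permutation importance on synthetic data (both are illustrative assumptions, not the lesson's chosen approach):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance: how much does test performance drop when a feature's
# values are shuffled? Larger drops suggest the model relies more on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: {score:.3f}")
```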
@@ -0,0 +1,30 @@
---
title: "Releasing a model"
teaching: 0
exercises: 0
---

:::::::::::::::::::::::::::::::::::::: questions

- What is distribution shift? How can we know if distribution shift has occurred?
- What is a model card?
- How do I share a model so that others may use it?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Understand distribution shift and its implications.
- Apply model-sharing best practices by using model cards.
- Understand the technical and communication norms around sharing models.

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Distribution shift is common. It can be caused by temporal shifts (i.e., using old training data) or by applying a model to new populations.
- Distribution shift can be addressed by TODO
- Model cards are the standard technique for communicating information about how machine learning systems were trained and how they should and should not be used.
- Models can be shared and reused by doing TODO

::::::::::::::::::::::::::::::::::::::::::::::::
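The distribution-shift keypoint can be checked in practice by comparing feature distributions between the training data and the data a released model actually receives. A minimal sketch, assuming a single numeric feature and using a two-sample Kolmogorov-Smirnov test as one illustrative choice, not the lesson's prescribed method:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical values of one feature at training time vs. after release.
train_feature = rng.normal(loc=0.0, scale=1.0, size=1000)
live_feature = rng.normal(loc=0.5, scale=1.0, size=1000)  # the population drifted

# The two-sample KS test compares the two distributions; a small p-value is
# evidence that the feature's distribution has shifted since training.
stat, p_value = ks_2samp(train_feature, live_feature)
print(f"KS statistic={stat:.3f}, p-value={p_value:.3g}")
```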
5 files were deleted.