
Commit

LB comments (#80)
* rewrite learning, predicting, box

Signed-off-by: RaphaelS1 <[email protected]>

* finish LB comments

---------

Signed-off-by: RaphaelS1 <[email protected]>
RaphaelS1 authored Nov 21, 2024
1 parent 5524d0a commit d529c65
Showing 7 changed files with 114 additions and 62 deletions.
Binary file added book/Figures/ml/cv.png
4 changes: 2 additions & 2 deletions book/_macros.tex
Original file line number Diff line number Diff line change
@@ -223,8 +223,8 @@
\providecommand{\BB}{\boldsymbol{\Beta}}
\providecommand{\EE}{\boldsymbol{\Eta}}

-\providecommand{\thethe}{\boldsymbol{\theta}}
-\providecommand{\lamlam}{\boldsymbol{\lambda}}
+\providecommand{\bstheta}{\boldsymbol{\theta}}
+\providecommand{\bslambda}{\boldsymbol{\lambda}}

%-----------------------
% number sets
4 changes: 2 additions & 2 deletions book/boosting.qmd
@@ -4,7 +4,7 @@ abstract: TODO (150-200 WORDS)

{{< include _setup.qmd >}}

-# Boosting Methods
+# Boosting Methods {#sec-boost}

{{< include _wip.qmd >}}

@@ -81,7 +81,7 @@

GBMs provide a flexible, modular algorithm, primarily comprised of a differentiable loss to minimise, $L$, and the selection of weak learners.
This chapter focuses on tree-based weak learners, though other weak learners are possible.
-Perhaps the most common alternatives are linear least squares [@Friedman2001] and smoothing splines [@Buhlmann2003]; we will not discuss these further here, as decision trees are primarily used for survival analysis due to the flexibility demonstrated in @sec-surv-ml-models-ranfor.
+Perhaps the most common alternatives are linear least squares [@Friedman2001] and smoothing splines [@Buhlmann2003]; we will not discuss these further here, as decision trees are primarily used for survival analysis due to the flexibility demonstrated in @sec-ranfor.
See references at the end of the chapter for other weak learners.
Extension to survival analysis therefore follows by considering alternative losses.
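The GBM loop sketched in this hunk (a differentiable loss $L$ plus weak learners fitted to its negative gradient) can be illustrated end-to-end. The snippet below is a minimal sketch in Python rather than the book's R (`r pkg("mlr3proba")` et al.); the squared loss stands in for the generic $L$, depth-1 trees (stumps) are the weak learners, and all function names are illustrative, not taken from any package:

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (stump) to residuals r by
    exhaustive search over split points, minimising squared error."""
    best = None
    for split in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= split]
        right = [ri for xi, ri in zip(x, r) if xi > split]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, split, lm, rm)
    _, split, lm, rm = best
    return lambda xi: lm if xi <= split else rm

def gbm_fit(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting for the squared loss L(y, f) = (y - f)^2 / 2:
    each round fits a stump to the negative gradient, which for this
    loss is simply the residual y - f."""
    f0 = sum(y) / len(y)          # initial constant prediction
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # -dL/df
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + lr * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + lr * sum(s(xi) for s in stumps)
```

Swapping in a different differentiable loss only changes the `residuals` line, which is exactly how the survival extensions discussed here proceed.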

2 changes: 1 addition & 1 deletion book/forests.qmd
@@ -4,7 +4,7 @@ abstract: TODO (150-200 WORDS)

{{< include _setup.qmd >}}

-# Random Forests {#sec-surv-ml-models-ranfor}
+# Random Forests {#sec-ranfor}

{{< include _wip.qmd >}}

28 changes: 28 additions & 0 deletions book/library.bib
@@ -9477,3 +9477,31 @@ @misc{Burk2024
primaryClass={stat.ML},
url={https://arxiv.org/abs/2406.04098},
}

+@article{Benavoli2017,
+author = {Alessio Benavoli and Giorgio Corani and Janez Dem{\v{s}}ar and Marco Zaffalon},
+title = {Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis},
+journal = {Journal of Machine Learning Research},
+year = {2017},
+volume = {18},
+number = {77},
+pages = {1--36},
+url = {http://jmlr.org/papers/v18/16-305.html}
+}
+
+@Inbook{Simon2007,
+author="Simon, Richard",
+editor="Dubitzky, Werner
+and Granzow, Martin
+and Berrar, Daniel",
+title="Resampling Strategies for Model Assessment and Selection",
+bookTitle="Fundamentals of Data Mining in Genomics and Proteomics",
+year="2007",
+publisher="Springer US",
+address="Boston, MA",
+pages="173--186",
+isbn="978-0-387-47509-7",
+doi="10.1007/978-0-387-47509-7_8",
+url="https://doi.org/10.1007/978-0-387-47509-7_8"
+}

132 changes: 78 additions & 54 deletions book/machinelearning.qmd

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions book/reductions.qmd
@@ -109,7 +109,7 @@ where $h_0$ is the baseline hazard and $\beta$ are the model coefficients.
This can be seen as a composite model as Cox defines the model in two stages [@Cox1972]: first fitting the $\beta$-coefficients using the partial likelihood, and then suggesting an estimator for the baseline distribution. The first stage produces a linear predictor return type (@sec-surv-set-types) and the second returns a survival distribution prediction. The Cox model is therefore a single (non-composite) model when making linear predictions, but a composite when used to make distribution predictions. Cox implicitly describes the model as a composite by writing that ''alternative simpler procedures would be worth having'' [@Cox1972], which implies a decision in fitting (a key feature of composition). This composition is formalised in @sec-car-pipelines-distr as a general pipeline \CDetI. The Cox model utilises the \CDetI pipeline with a PH form and Kaplan-Meier baseline.
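The two stages just described (linear predictor plus Kaplan-Meier baseline, combined under the PH form $S(t|x) = S_0(t)^{\exp(\eta)}$) can be sketched concretely. The snippet below is an illustrative Python sketch, not the book's implementation; the function names are invented for the example:

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the baseline survival function S0(t).

    times: observed times; events: 1 if the event occurred, 0 if censored.
    Returns the event-time grid and S0 evaluated on it.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    n_at_risk = len(times)
    grid, surv, s = [], [], 1.0
    for i in order:
        if events[i] == 1:
            s *= 1.0 - 1.0 / n_at_risk   # multiply in this event's factor
            grid.append(times[i])
            surv.append(s)
        n_at_risk -= 1                    # observation leaves the risk set
    return grid, surv

def cox_distr_predict(lp, grid, surv):
    """Stage two of the composition: turn a fitted linear predictor lp
    (x . beta) into a distribution prediction via S(t|x) = S0(t)^exp(lp)."""
    return [s ** math.exp(lp) for s in surv]
```

A subject with `lp = 0` recovers the baseline exactly; a positive linear predictor (higher risk) pushes every survival probability down, which is the PH assumption in action.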

#### Example 2: Random Survival Forests {.unnumbered .unlisted}
-Fully discussed in @sec-surv-ml-models-ranfor, random survival forests are composed from many individual decision trees via a prediction composition algorithm (@alg-rsf-pred). In general, random forests perform better than their component decision trees, which tends to be true of all ensemble methods. Aggregation of predictions in survival analysis requires slightly more care than other fields due to the multiple prediction types, however this is still possible and is formalised in @sec-car-pipelines-avg.
+Fully discussed in @sec-ranfor, random survival forests are composed from many individual decision trees via a prediction composition algorithm (@alg-rsf-pred). In general, random forests perform better than their component decision trees, which tends to be true of all ensemble methods. Aggregation of predictions in survival analysis requires slightly more care than other fields due to the multiple prediction types, however this is still possible and is formalised in @sec-car-pipelines-avg.

## Introduction to Reduction {#sec-car-redux}

@@ -233,7 +233,7 @@ i. the composition from the simpler model to the complex one, $M_R \rightarrow M
In surveying models and measures, several common mistakes in the implementation of reduction and composition were found to be particularly prevalent and problematic throughout the literature. It is assumed that these are indeed mistakes (not deliberate) and result from a lack of prior formalisation. These mistakes were identified as long as 20 years ago [@Schwarzer2000] but are described in more detail here to highlight their continued prevalence and why they cannot be ignored.

RM1. Incomplete reduction. This occurs when a reduction workflow is presented as if it solves the original task when in fact only the reduction strategy is solved. A common example is claiming to solve a survival task by using binary classification, e.g. erroneously claiming that a model predicts survival probabilities (which implies a distribution) when it actually predicts a five-year probability of death (@box-task-classif). This is a mistake as it misleads readers into believing that the model solves a survival task (@box-task-surv) when it does not. It is usually a semantic, rather than mathematical, error resulting from misuse of terminology. It is important to be clear about model prediction types (@sec-surv-set-types), and general terms such as 'survival predictions' should be avoided unless they refer to one of the three prediction tasks.
-RM2. Inappropriate comparisons. This is a direct consequence of (RM1) and the two are often seen together. (RM2) occurs when an incomplete reduction is directly compared to a survival model (or complete reduction model) using a measure appropriate for the reduction. This may lead to a reduction model appearing erroneously superior. For example, comparing a logistic regression to a random survival forest (RSF) (@sec-surv-ml-models-ranfor) for predicting survival probabilities at a single time using the accuracy measure is an unfair comparison, as the RSF is optimised for distribution predictions. This would be non-problematic if a suitable composition were clearly utilised. For example, a regression SSVM predicting survival time cannot be directly compared to a Cox PH; however, the SSVM can be compared to a Cox PH composed with the probabilistic-to-deterministic compositor \CProb, and conclusions can then be drawn about comparison to the composite survival-time Cox model (and not simply a Cox PH).
+RM2. Inappropriate comparisons. This is a direct consequence of (RM1) and the two are often seen together. (RM2) occurs when an incomplete reduction is directly compared to a survival model (or complete reduction model) using a measure appropriate for the reduction. This may lead to a reduction model appearing erroneously superior. For example, comparing a logistic regression to a random survival forest (RSF) (@sec-ranfor) for predicting survival probabilities at a single time using the accuracy measure is an unfair comparison, as the RSF is optimised for distribution predictions. This would be non-problematic if a suitable composition were clearly utilised. For example, a regression SSVM predicting survival time cannot be directly compared to a Cox PH; however, the SSVM can be compared to a Cox PH composed with the probabilistic-to-deterministic compositor \CProb, and conclusions can then be drawn about comparison to the composite survival-time Cox model (and not simply a Cox PH).
RM3. Na\"ive censoring deletion. This common mistake occurs when trying to reduce survival to regression or classification by simply deleting all censored observations, even if censoring is informative. This is a mistake as it creates bias in the dataset, which can be substantial if the proportion of censoring is high and informative. More robust deletion methods are described in @sec-redux-regr.
RM4. Oversampling uncensored observations. This is often seen when trying to reduce survival to regression or classification, and often alongside (RM3). Oversampling is the process of replicating observations to artificially inflate the sample size of the data. Whilst this process does not create any new information, it can help a model detect important features in the data. However, oversampling only the uncensored observations introduces bias and discards the potentially informative signal carried by the proportion of censoring.
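The bias described in (RM3) is easy to demonstrate by simulation. The following hypothetical Python sketch draws event times with a known mean of 1.0, applies independent censoring, and then naively averages only the uncensored times; all names are invented for the example:

```python
import random

def simulate_naive_deletion(n=20000, seed=1):
    """Illustrate the bias from deleting censored observations (RM3).

    True event times are Exponential(rate=1), so the true mean survival
    time is 1.0. Independent Exponential(rate=1) censoring is applied;
    naively averaging only the uncensored observed times is biased
    downwards, because censoring preferentially removes long survivors.
    """
    rng = random.Random(seed)
    uncensored = []
    for _ in range(n):
        t = rng.expovariate(1.0)   # true event time
        c = rng.expovariate(1.0)   # independent censoring time
        if t <= c:                 # event observed before censoring
            uncensored.append(t)
    return sum(uncensored) / len(uncensored)
```

With equal event and censoring rates, the naive estimate converges to 0.5 rather than the true mean of 1.0, i.e. the bias here is a factor of two even though roughly half the observations survive the deletion.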

@@ -733,4 +733,4 @@ Finally, predictive performance is also increased by these methods, which is mos

All compositions in this chapter, as well as (R1)-(R6), have been implemented in `r pkg("mlr3proba")` with the `r pkg("mlr3pipelines")` [@pkgmlr3pipelines] interface. The reductions to classification will be implemented in a near-future update. Additionally the `r pkg("discSurv")` package [@pkgdiscsurv] will be interfaced as a `r pkg("mlr3proba")` pipeline to incorporate further discrete-time strategies.

-The compositions \CDetI and \CProb are included in the benchmark experiment in @Sonabend2021b so that every tested model can make probabilistic survival distribution predictions as well as deterministic survival time predictions. Future research will benchmark all the pipelines in this chapter and will cover algorithm and model selection, tuning, and comparison of performance. Strategies from other papers will also be explored.
+The compositions \CDetI and \CProb are included in the benchmark experiment in @Sonabend2021b so that every tested model can make probabilistic survival distribution predictions as well as deterministic survival time predictions. Future research will benchmark all the pipelines in this chapter and will cover algorithm and model selection, tuning, and comparison of performance. Strategies from other papers will also be explored.
