From 1787e76d609f7ec868ee6b18ab7ce4a7c2653a45 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 22:33:09 +0200 Subject: [PATCH 1/8] clarify emphasis on cv metrics --- .../sections/proposed_approach/testing_validation.tex | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/report_thesis/src/sections/proposed_approach/testing_validation.tex b/report_thesis/src/sections/proposed_approach/testing_validation.tex index 23564774..a56bfcbb 100644 --- a/report_thesis/src/sections/proposed_approach/testing_validation.tex +++ b/report_thesis/src/sections/proposed_approach/testing_validation.tex @@ -208,3 +208,12 @@ \subsubsection{Discussion of Testing and Validation Strategy} By evaluating with both cross-validation and a separate test set, we ensure that the model both generalizes well and performs well under typical conditions. Cross-validation allows us to evaluate the model's performance across the entire dataset, including extreme values, while the test set provides a measure of the model's performance on unseen, typical data. This combination of cross-validation and a separate test set provides a comprehensive assessment of the model's performance, ultimately helping to ensure that the model is both robust and accurate. + +In our initial and optimization experiments, we prioritize cross-validation metrics to evaluate the models. +This strategy mitigates the risk of overfitting to the test set by avoiding a bias towards lower \gls{rmsep} values. +Conversely, for the stacking ensemble experiment, we emphasize test set metrics to comprehensively assess the ensemble's performance, while still considering cross-validation metrics. +This approach aligns with standard machine learning conventions. +In the initial experiments, cross-validation metrics serve as thresholds for model selection. +During the optimization phase, only cross-validation metrics guide the search for optimal hyperparameters. 
+For the stacking ensemble experiment, both cross-validation and test set metrics are evaluated, with a primary focus on the \gls{rmsep} metric. +This approach aims to make our final model accurate, robust, and generalizable to unseen data, providing a balanced evaluation through both cross-validation and test set metrics. From a349af6c82327e6ce0c74afc6e1b31ca09844d6f Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 22:35:40 +0200 Subject: [PATCH 2/8] fix spelling errors --- .../proposed_approach/optimization_framework.tex | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/report_thesis/src/sections/proposed_approach/optimization_framework.tex b/report_thesis/src/sections/proposed_approach/optimization_framework.tex index 4d46af17..7d008e7a 100644 --- a/report_thesis/src/sections/proposed_approach/optimization_framework.tex +++ b/report_thesis/src/sections/proposed_approach/optimization_framework.tex @@ -2,8 +2,8 @@ \subsection{Optimization Framework}\label{sec:optimization_framework} One of the primary challenges in developing a stacking ensemble is determining the optimal choice of base estimators. \citet{wolpertstacked_1992} highlighted that this can be considered a 'black art' and that the choice usually relies on intelligent guesses. In our case, this problem is further exacerbated by the fact that the optimal choice of base estimator may vary depending on the target oxide. The complexity of the problem is increased because different oxides require different models, and the optimal preprocessing techniques will depend on both the model and the specific oxide being predicted. -Due to the challenges highligted in \ref{subsec:challenges}, namely high dimensionality, multicollinearity, and matrix effects, it is difficult to determine which configuration is optimal. 
-Selecting the appropriate preprocessing steps for each base estimator is essential, as incorrect preprocessing can significantly degrade performance and undermine the model's effectiveness +Due to the challenges highlighted in \ref{subsec:challenges}, namely high dimensionality, multicollinearity, and matrix effects, it is difficult to determine which configuration is optimal. +Selecting the appropriate preprocessing steps for each base estimator is essential, as incorrect preprocessing can significantly degrade performance and undermine the model's effectiveness. Furthermore, choosing the right hyperparameters for each base estimator introduces additional complexity, as these decisions also significantly impact model performance and must be carefully tuned for each specific oxide. Some estimators might require very little tuning to achieve accurate and robust predictions, while others might require extensive tuning, depending on the target oxide. For instance, simpler approaches like \gls{enet} and ridge regression may quickly reach their optimal performance with minimal hyperparameter adjustments. However, due to their simplicity, they often fail to capture the complex patterns in the data that more advanced models can, making them less competitive despite their ease of tuning. @@ -16,14 +16,14 @@ \subsection{Optimization Framework}\label{sec:optimization_framework} To guide this process we have developed a working assumption. Specifically, we assume that selecting the top-$n$ best pipelines for each oxide, considering different preprocessors and models for each pipeline, will result in the best pipelines for a given oxide in our stacking ensemble. Here, $n$ is a heuristic based on the results and \textit{best} is evaluated in terms of the metrics outlined in Section~\ref{subsec:evaluation_metrics}. 
-Additionaly, each permutation will utilize our proposed data partitioning and cross-validation strategy outlined in Section~\ref{subsec:validation_testing_procedures}. -Utilizing our proposed data partitioning and cross-validation strategy, along with the aformentioned evaluation metrics, will ensure that the top-$n$ pipelines align with our goals of generalization, robustness, and accuracy outlined in Section~\ref{sec:problem_definition}. +Additionally, each permutation will utilize our proposed data partitioning and cross-validation strategy outlined in Section~\ref{subsec:validation_testing_procedures}. +Utilizing our proposed data partitioning and cross-validation strategy, along with the aforementioned evaluation metrics, will ensure that the top-$n$ pipelines align with our goals of generalization, robustness, and accuracy outlined in Section~\ref{sec:problem_definition}. This narrows our focus to three key tasks: selecting suitable preprocessors and models, finding the optimal hyperparameters, and devising a guided search strategy to evaluate various permutations and identify the top-$n$ pipelines for each oxide. First, we curated a diverse set of models and preprocessing techniques, as detailed in Section~\ref{sec:model_selection}. Next, we developed an optimization framework to systematically explore and optimize these pipeline configurations, which will be described in the following section. \subsubsection{The Framework} -To systematically explore and optimize pipeline configurations, the search process should be guided by an ojective function. +To systematically explore and optimize pipeline configurations, the search process should be guided by an objective function. 
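[Editorial sketch, not part of the patch.] The objective function this hunk refers to scores a candidate pipeline by two cross-validation quantities: the mean per-fold RMSE and its standard deviation across folds. A rough illustration of how those two values could be computed — on synthetic stand-in data, with a hypothetical `cv_objectives` helper rather than the thesis's actual framework code — might look like:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the spectral data (illustration only)
X, y = make_regression(n_samples=150, n_features=40, noise=1.0, random_state=0)

def cv_objectives(pipeline, X, y, cv=5):
    """Score one candidate pipeline: mean and spread of the per-fold RMSE.

    Both values are minimized jointly in a multi-objective search, so a
    pipeline must be accurate on average AND consistent across folds.
    """
    # cross_val_score returns negated RMSE per fold; flip the sign back
    fold_rmse = -cross_val_score(pipeline, X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
    return fold_rmse.mean(), fold_rmse.std()

rmse_cv, std_dev_cv = cv_objectives(ElasticNet(alpha=0.1), X, y)
```

A grid or random search would evaluate this pair on a fixed lattice of configurations; a guided search instead uses earlier pairs to choose the next candidate, which is why the text argues those traditional methods fall short here.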
Based on the evaluation process outlined in Section~\ref{subsec:validation_testing_procedures}, whereby we argue that solely evaluating on the \gls{rmsep} may lead to misleading and poor results, we define the objective function we wish to optimize as a multi-objective optimization on minimizing the \texttt{rmse\_cv} and \texttt{std\_dev\_cv}. Given these goals, traditional methods like grid search and random search could be used, but they often fall short due to several inherent limitations. From 36bb927a4a4f55b3caa01814d744b7abc77e14f9 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:09:32 +0200 Subject: [PATCH 3/8] supplementary analysis of stacking results --- .../experiments/stacking_ensemble.tex | 23 +++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) diff --git a/report_thesis/src/sections/experiments/stacking_ensemble.tex b/report_thesis/src/sections/experiments/stacking_ensemble.tex index abfcd891..b4a43ae6 100644 --- a/report_thesis/src/sections/experiments/stacking_ensemble.tex +++ b/report_thesis/src/sections/experiments/stacking_ensemble.tex @@ -59,6 +59,25 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res The evaluation metrics are shown in Table~\ref{tab:stacking_ensemble_results_enet}, Table~\ref{tab:stacking_ensemble_results_enet_01}, and Table~\ref{tab:stacking_ensemble_results_svr}. Additionally, we provide 1:1 plots for each ensemble in Figures~\ref{fig:elasticnet_one_to_one}, \ref{fig:enetalpha01_one_to_one}, and \ref{fig:svr_one_to_one}, showing the actual versus predicted values for each oxide. +For the \gls{enet} meta-learner with $\alpha = 1$, the \gls{rmsep} values range from 0.470 for \ce{Na2O} to 3.588 for \ce{SiO2}. +The \gls{rmsecv} values are generally higher, which could initially suggest overfitting. +However, considering our testing and validation strategy, this discrepancy is expected. 
+Our method for partitioning ensures that extreme values are included in the training folds but not in the test set, making the test set easier to predict. +This results in lower \gls{rmsep} values compared to \gls{rmsecv} values, which is a deliberate trade-off to provide a fairer assessment of the model's generalization performance. +The standard deviations of the \gls{rmsecv} are relatively low, suggesting consistent performance across folds. +However, the \gls{rmsep} values indicate that the model's performance on the test set is not as robust, particularly for \ce{SiO2} and \ce{FeO_T}. + +When the \gls{enet} meta-learner's $\alpha$ is reduced to 0.1, there is a noticeable improvement in the \gls{rmsep} for \ce{TiO2}, dropping from 0.571 to 0.319. +This suggests that reducing the regularization parameter helps in better capturing the variance in the data. +The \gls{rmsecv} values also show a slight improvement, indicating better generalization. +However, the standard deviations remain similar, suggesting that the model's consistency across folds is maintained. + +The \gls{svr} meta-learner shows the best performance for several oxides, particularly \ce{SiO2} and \ce{Na2O}, with \gls{rmsep} values of 3.473 and 0.369, respectively. + +The results presented above indicate a strong performance from the stacking ensemble approach. +However, it is important to note that some evaluation metrics are worse in the stacking approach than in certain individual configurations. +We believe that further tuning, particularly of the meta-learner's hyperparameters, could substantially improve these results. + A notable observation from our results is that different meta-learners exhibited varying performance levels across oxides. We observed that the final predictions were strongly affected by the meta-learner, going as far as rendering some predictions nonsensical if the wrong meta-learner was chosen. 
Specifically, for \ce{TiO2}, we observed that predictions remained near-constant values despite varying the combination of model configurations in the \ce{TiO2} ensemble. @@ -86,10 +105,6 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res The improvements are consistent across most oxides, with \gls{enet} and \gls{svr} models both outperforming the \gls{moc} (replica) model. This shows that the ensemble approach, particularly with these meta-learners, enhances prediction accuracy for the oxides we tested. -The results presented above indicate a strong performance from the stacking ensemble approach. -However, it is important to note that some evaluation metrics are worse in the stacking approach than in certain individual configurations. -We believe that further tuning, particularly of the meta-learner's hyperparameters, could substantially improve these results. - \begin{table} \centering \caption{Stacking ensemble results using the \gls{enet} model as the meta-learner with $\alpha = 1$.} From 49a1be7f43722bf671d089bccb9447df84959bb8 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:41:25 +0200 Subject: [PATCH 4/8] std dev observation --- .../src/sections/experiments/stacking_ensemble.tex | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/report_thesis/src/sections/experiments/stacking_ensemble.tex b/report_thesis/src/sections/experiments/stacking_ensemble.tex index b4a43ae6..a6e29688 100644 --- a/report_thesis/src/sections/experiments/stacking_ensemble.tex +++ b/report_thesis/src/sections/experiments/stacking_ensemble.tex @@ -74,6 +74,9 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res The \gls{svr} meta-learner shows the best performance for several oxides, particularly \ce{SiO2} and \ce{Na2O}, with \gls{rmsep} values of 3.473 and 0.369, respectively. 
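[Editorial sketch, not part of the patch.] The ensembles compared in these hunks feed base-estimator predictions into a meta-learner (ElasticNet with varying alpha, or SVR). A minimal scikit-learn sketch of that structure — with made-up base estimators standing in for the thesis's actual top-n pipelines — could be:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import ElasticNet, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for one oxide's regression problem (illustration only)
X, y = make_regression(n_samples=200, n_features=20, noise=0.5, random_state=0)

# Hypothetical base estimators; in the thesis these would be the
# top-n optimized preprocessing+model pipelines for the target oxide
base_estimators = [
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("ridge", make_pipeline(StandardScaler(), Ridge())),
    ("svr", make_pipeline(StandardScaler(), SVR())),
]

# The meta-learner is trained on cross-validated base predictions (cv=5);
# swapping final_estimator (e.g. to SVR()) changes the ensemble's behavior,
# which is the sensitivity the surrounding results discussion describes
ensemble = StackingRegressor(
    estimators=base_estimators,
    final_estimator=ElasticNet(alpha=0.1),
    cv=5,
)
ensemble.fit(X, y)
predictions = ensemble.predict(X)
```

Because `StackingRegressor` fits the meta-learner on out-of-fold base predictions, a poorly matched `final_estimator` can flatten the ensemble output toward a near-constant value, consistent with the TiO2 behavior noted above.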
+We generally observe that the standard deviation metrics are close to the corresponding \gls{rmse} values, indicating low variability in the prediction errors. +This suggests robustness of the predictions. + The results presented above indicate a strong performance from the stacking ensemble approach. However, it is important to note that some evaluation metrics are worse in the stacking approach than in certain individual configurations. We believe that further tuning, particularly of the meta-learner's hyperparameters, could substantially improve these results. @@ -110,7 +113,7 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res \caption{Stacking ensemble results using the \gls{enet} model as the meta-learner with $\alpha = 1$.} \begin{tabular}{lcccc} \toprule -Oxide & \gls{rmsep} & STDDEV & \gls{rmsecv} & Std. Dev. CV \\ +Oxide & \gls{rmsep} & Std. Dev. & \gls{rmsecv} & Std. Dev. CV \\ \midrule \ce{SiO2} & 3.588 & 3.582 & 4.680 $\pm$ 0.500 & 4.670 $\pm$ 0.516 \\ \ce{TiO2} & 0.571 & 0.565 & 0.818 $\pm$ 0.111 & 0.814 $\pm$ 0.117 \\ @@ -130,7 +133,7 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res \caption{Stacking ensemble results using the \gls{enet} model as the meta-learner with $\alpha = 0.1$.} \begin{tabular}{lcccc} \toprule -Oxide & \gls{rmsep} & STDDEV & \gls{rmsecv} & Std. Dev. CV \\ +Oxide & \gls{rmsep} & Std. Dev. & \gls{rmsecv} & Std. Dev. CV \\ \midrule \ce{SiO2} & 3.598 & 3.591 & 4.686 $\pm$ 0.489 & 4.677 $\pm$ 0.505 \\ \ce{TiO2} & 0.319 & 0.310 & 0.450 $\pm$ 0.083 & 0.448 $\pm$ 0.083 \\ @@ -150,7 +153,7 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res \caption{Stacking ensemble results using the \gls{svr} model as the meta-learner with default hyperparameters.} \begin{tabular}{lcccc} \toprule -Oxide & \gls{rmsep} & STDDEV & \gls{rmsecv} & Std. Dev. CV \\ +Oxide & \gls{rmsep} & Std. Dev. & \gls{rmsecv} & Std. Dev. 
CV \\ \midrule \ce{SiO2} & 3.473 & 3.478 & 5.064 $\pm$ 0.932 & 5.061 $\pm$ 0.926 \\ \ce{TiO2} & 0.340 & 0.333 & 0.442 $\pm$ 0.087 & 0.442 $\pm$ 0.087 \\ From bf80aab0e3f48ba309c210881c203652d00c464e Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:42:32 +0200 Subject: [PATCH 5/8] restore tail --- .../src/sections/experiments/stacking_ensemble.tex | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/report_thesis/src/sections/experiments/stacking_ensemble.tex b/report_thesis/src/sections/experiments/stacking_ensemble.tex index a6e29688..c825c9cb 100644 --- a/report_thesis/src/sections/experiments/stacking_ensemble.tex +++ b/report_thesis/src/sections/experiments/stacking_ensemble.tex @@ -77,10 +77,6 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res We generally observe that the standard deviation metrics are close to the corresponding \gls{rmse} values, indicating low variability in the prediction errors. This suggests robustness of the predictions. -The results presented above indicate a strong performance from the stacking ensemble approach. -However, it is important to note that some evaluation metrics are worse in the stacking approach than in certain individual configurations. -We believe that further tuning, particularly of the meta-learner's hyperparameters, could substantially improve these results. - A notable observation from our results is that different meta-learners exhibited varying performance levels across oxides. We observed that the final predictions were strongly affected by the meta-learner, going as far as rendering some predictions nonsensical if the wrong meta-learner was chosen. Specifically, for \ce{TiO2}, we observed that predictions remained near-constant values despite varying the combination of model configurations in the \ce{TiO2} ensemble. 
@@ -108,6 +104,10 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res The improvements are consistent across most oxides, with \gls{enet} and \gls{svr} models both outperforming the \gls{moc} (replica) model. This shows that the ensemble approach, particularly with these meta-learners, enhances prediction accuracy for the oxides we tested. +The results presented above indicate a strong performance from the stacking ensemble approach. +However, it is important to note that some evaluation metrics are worse in the stacking approach than in certain individual configurations. +We believe that further tuning, particularly of the meta-learner's hyperparameters, could substantially improve these results. + \begin{table} \centering \caption{Stacking ensemble results using the \gls{enet} model as the meta-learner with $\alpha = 1$.} From 4184e4e92d37612058129aaf1ec59a0070bd72e3 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:45:39 +0200 Subject: [PATCH 6/8] remove wrong statement --- report_thesis/src/sections/experiments/stacking_ensemble.tex | 1 - 1 file changed, 1 deletion(-) diff --git a/report_thesis/src/sections/experiments/stacking_ensemble.tex b/report_thesis/src/sections/experiments/stacking_ensemble.tex index c825c9cb..19d7ba73 100644 --- a/report_thesis/src/sections/experiments/stacking_ensemble.tex +++ b/report_thesis/src/sections/experiments/stacking_ensemble.tex @@ -65,7 +65,6 @@ \subsubsection{Results for Stacking Ensemble}\label{subsec:stacking_ensemble_res Our method for partitioning ensures that extreme values are included in the training folds but not in the test set, making the test set easier to predict. This results in lower \gls{rmsep} values compared to \gls{rmsecv} values, which is a deliberate trade-off to provide a fairer assessment of the model's generalization performance. 
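[Editorial sketch, not part of the patch.] The partitioning rule restated in this hunk — extreme target values kept in the training folds and excluded from the test set — could be sketched as follows. The 5th/95th-percentile cutoff is an invented illustration; the thesis defines its actual split procedure elsewhere.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
y = rng.normal(size=500)          # stand-in target (oxide concentration)
X = rng.normal(size=(500, 10))    # stand-in features (spectra)

# Flag "extreme" samples, here anything outside the 5th-95th percentile
# of the target (hypothetical threshold, for illustration only)
lo, hi = np.percentile(y, [5, 95])
extreme = (y < lo) | (y > hi)

# Only typical samples are eligible for the test set...
X_typ, y_typ = X[~extreme], y[~extreme]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_typ, y_typ, test_size=0.2, random_state=0
)

# ...while extreme samples are forced into training, so cross-validation
# sees the full range but the held-out test set stays "typical"
X_tr = np.vstack([X_tr, X[extreme]])
y_tr = np.concatenate([y_tr, y[extreme]])
```

This is exactly why RMSEP can come out lower than RMSECV in the tables: the test set is, by construction, the easier-to-predict portion of the data.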
The standard deviations of the \gls{rmsecv} are relatively low, suggesting consistent performance across folds. -However, the \gls{rmsep} values indicate that the model's performance on the test set is not as robust, particularly for \ce{SiO2} and \ce{FeO_T}. When the \gls{enet} meta-learner's $\alpha$ is reduced to 0.1, there is a noticeable improvement in the \gls{rmsep} for \ce{TiO2}, dropping from 0.571 to 0.319. This suggests that reducing the regularization parameter helps in better capturing the variance in the data. From a3b2c0615372d7937a6786b53563e1cebe47a1f1 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:49:36 +0200 Subject: [PATCH 7/8] add cite for conventions --- report_thesis/src/references.bib | 14 +++++++++++++- .../proposed_approach/testing_validation.tex | 2 +- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/report_thesis/src/references.bib b/report_thesis/src/references.bib index 549693b4..8d992684 100644 --- a/report_thesis/src/references.bib +++ b/report_thesis/src/references.bib @@ -866,4 +866,16 @@ @article{sirven_pca_ann_plsr note = {PMID: 16503595}, url = {https://doi.org/10.1021/ac051721p}, eprint = {https://doi.org/10.1021/ac051721p} -} \ No newline at end of file +} + +@book{geronHandsonMachineLearning2023, + title = {Hands-on Machine Learning with {{Scikit-Learn}}, {{Keras}}, and {{TensorFlow}}: Concepts, Tools, and Techniques to Build Intelligent Systems}, + shorttitle = {Hands-on Machine Learning with {{Scikit-Learn}}, {{Keras}}, and {{TensorFlow}}}, + author = {Géron, Aurélien}, + date = {2023}, + edition = {Third edition}, + publisher = {O'Reilly}, + isbn = {978-1-09-812597-4}, + langid = {english}, + pagetotal = {834}, +} diff --git a/report_thesis/src/sections/proposed_approach/testing_validation.tex b/report_thesis/src/sections/proposed_approach/testing_validation.tex index a56bfcbb..058d4b0d 100644 --- a/report_thesis/src/sections/proposed_approach/testing_validation.tex +++ 
b/report_thesis/src/sections/proposed_approach/testing_validation.tex @@ -212,7 +212,7 @@ \subsubsection{Discussion of Testing and Validation Strategy} In our initial and optimization experiments, we prioritize cross-validation metrics to evaluate the models. This strategy mitigates the risk of overfitting to the test set by avoiding a bias towards lower \gls{rmsep} values. Conversely, for the stacking ensemble experiment, we emphasize test set metrics to comprehensively assess the ensemble's performance, while still considering cross-validation metrics. -This approach aligns with standard machine learning conventions. +This approach aligns with standard machine learning conventions\cite{geronHandsonMachineLearning2023}. In the initial experiments, cross-validation metrics serve as thresholds for model selection. During the optimization phase, only cross-validation metrics guide the search for optimal hyperparameters. For the stacking ensemble experiment, both cross-validation and test set metrics are evaluated, with a primary focus on the \gls{rmsep} metric. 
From 58b83de21f4aa52a6716cac2b861ecfff7ec43d5 Mon Sep 17 00:00:00 2001 From: Christian Bager Bach Houmann Date: Wed, 12 Jun 2024 23:50:04 +0200 Subject: [PATCH 8/8] Update report_thesis/src/sections/proposed_approach/testing_validation.tex Co-authored-by: Pattrigue <57709490+Pattrigue@users.noreply.github.com> --- .../src/sections/proposed_approach/testing_validation.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/sections/proposed_approach/testing_validation.tex b/report_thesis/src/sections/proposed_approach/testing_validation.tex index 058d4b0d..644e9ab0 100644 --- a/report_thesis/src/sections/proposed_approach/testing_validation.tex +++ b/report_thesis/src/sections/proposed_approach/testing_validation.tex @@ -213,7 +213,7 @@ \subsubsection{Discussion of Testing and Validation Strategy} This strategy mitigates the risk of overfitting to the test set by avoiding a bias towards lower \gls{rmsep} values. Conversely, for the stacking ensemble experiment, we emphasize test set metrics to comprehensively assess the ensemble's performance, while still considering cross-validation metrics. This approach aligns with standard machine learning conventions\cite{geronHandsonMachineLearning2023}. -In the initial experiments, cross-validation metrics serve as thresholds for model selection. +In the initial experiment, cross-validation metrics serve as thresholds for model selection. During the optimization phase, only cross-validation metrics guide the search for optimal hyperparameters. For the stacking ensemble experiment, both cross-validation and test set metrics are evaluated, with a primary focus on the \gls{rmsep} metric. This approach aims to make our final model accurate, robust, and generalizable to unseen data, providing a balanced evaluation through both cross-validation and test set metrics.