From 5aa98159381569f6dbb961079127056d21fdfd15 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Thu, 13 Jun 2024 11:28:22 +0200
Subject: [PATCH] fixes

---
 .../ensemble_learning_models/stacked_generalization.tex   | 2 +-
 .../sections/proposed_approach/optimization_framework.tex | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/report_thesis/src/sections/background/ensemble_learning_models/stacked_generalization.tex b/report_thesis/src/sections/background/ensemble_learning_models/stacked_generalization.tex
index 4ea8b787..8e717f9e 100644
--- a/report_thesis/src/sections/background/ensemble_learning_models/stacked_generalization.tex
+++ b/report_thesis/src/sections/background/ensemble_learning_models/stacked_generalization.tex
@@ -13,7 +13,7 @@ \subsubsection{Stacked Generalization}\label{subsec:stacked-generalization}
 \mathbf{Z} = [\hat{\mathbf{y}}_1, \hat{\mathbf{y}}_2, \ldots, \hat{\mathbf{y}}_N]
 $$
 
-A meta-model $F$ is subsequently trained on this new dataset $\mathbf{Z}$ to predict the target variable $\mathbf{y}$:
+A meta-model $F$ is subsequently trained on this new dataset $\mathbf{Z}$ to predict the target variable $\mathbf{y}$, producing the final prediction $\mathbf{\hat{y}}$:
 
 $$
 \mathbf{\hat{y}} = F(\mathbf{Z})
diff --git a/report_thesis/src/sections/proposed_approach/optimization_framework.tex b/report_thesis/src/sections/proposed_approach/optimization_framework.tex
index 85af1de0..d2cd4a77 100644
--- a/report_thesis/src/sections/proposed_approach/optimization_framework.tex
+++ b/report_thesis/src/sections/proposed_approach/optimization_framework.tex
@@ -90,11 +90,12 @@ \subsubsection{The Framework}
 As such, we allow for the optimization framework to optionally use these if they are deemed to be beneficial.
 In Lines~\ref{step:get_data} to~\ref{step:apply_pipeline}, we fetch the data, apply our data partitioning strategy to generate four cross-validation sets, a training set and a test set, and apply the preprocessing to the datasets.
+This partitioning is applied with respect to the current oxide.
 The purpose of fetching the data for each trial is to ensure no modifications leak through trials, corrupting the dataset over time.
-This prevents any form of double preprocessing from occuring, which would lead to potential issues.
+This prevents any form of double preprocessing from occurring, which would otherwise distort the data, for example by scaling already-scaled features.
 As mentioned in Section~\ref{subsec:validation_testing_procedures}, we use both cross-validation and a test set to evaluate the model.
-This can be seen in Line~\ref{step:cross_validate} and Lines~\ref{step:train_model} to~\ref{step:evaluate_model}, where cross-validation, training, and evaluation are performed with respect to the current oxide.
+This can be seen in Line~\ref{step:cross_validate} and Lines~\ref{step:train_model} to~\ref{step:evaluate_model}, where cross-validation, training, and evaluation are also performed with respect to the current oxide.
 It is important to note that in practice, the model $m$ is being reinstantiated in each iteration of the cross-validation, and again before the model is trained, so no learned parameters are carried over between them.
 Once a trial is complete, the metrics are returned in Line~\ref{step:return_metrics} to the \texttt{optimize} function in the \nameref{alg:study_function}, which then determines the next steps in the optimization process.
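
Note on the first hunk: the stacking construction it describes, building $\mathbf{Z}$ from the base models' predictions and fitting a meta-model $F$ on it, can be sketched as below. This is a minimal illustration assuming scikit-learn and synthetic data; the base models, the meta-model, and the dataset are illustrative stand-ins, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the real data; features and targets differ in the thesis.
X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=0)

# Base models; arbitrary choices for illustration, not the thesis's ensemble.
base_models = [
    RandomForestRegressor(n_estimators=100, random_state=0),
    GradientBoostingRegressor(random_state=0),
    Ridge(alpha=1.0),
]

# Z = [y_hat_1, ..., y_hat_N]: out-of-fold predictions from each base model,
# so the meta-model never sees predictions a model made on its own training data.
Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])

# Meta-model F is trained on Z to predict the target y.
F = LinearRegression()
F.fit(Z, y)

# At inference time, the base models are refit on all data and their stacked
# predictions form the meta-model's input, yielding the final prediction y_hat.
for m in base_models:
    m.fit(X, y)
Z_new = np.column_stack([m.predict(X) for m in base_models])
y_hat = F.predict(Z_new)
```

Using out-of-fold predictions for $\mathbf{Z}$ is the standard way to keep the meta-model from simply rewarding base models that overfit their own training data.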
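Note on the second hunk: the per-trial flow it describes — fetch fresh data every trial, partition with respect to the current oxide, cross-validate, then train and evaluate on the test set with a freshly instantiated model — could look roughly like the sketch below. It assumes an Optuna-style optimizer (suggested by the trial/\texttt{optimize} terminology); `fetch_data`, the `Ridge` model, and the splits are hypothetical placeholders for the thesis's data pipeline, model $m$, and partitioning strategy.

```python
import numpy as np
import optuna
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

def fetch_data():
    # Hypothetical stand-in for the data-fetching step: every trial gets a
    # fresh copy, so no preprocessing from an earlier trial leaks into (or is
    # applied a second time to) this trial's data.
    return make_regression(n_samples=300, n_features=20, noise=0.1, random_state=0)

def objective(trial):
    X, y = fetch_data()
    alpha = trial.suggest_float("alpha", 1e-3, 10.0, log=True)

    # Placeholder partitioning; the thesis partitions per oxide into four
    # cross-validation sets, a training set, and a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Cross-validation with a fresh model instance per fold, so no learned
    # parameters carry over between folds.
    fold_errors = []
    for train_idx, val_idx in KFold(n_splits=4).split(X_train):
        model = Ridge(alpha=alpha)  # reinstantiated in each iteration
        model.fit(X_train[train_idx], y_train[train_idx])
        preds = model.predict(X_train[val_idx])
        fold_errors.append(mean_squared_error(y_train[val_idx], preds))

    # Reinstantiate once more before final training, then evaluate on the
    # held-out test set.
    model = Ridge(alpha=alpha)
    model.fit(X_train, y_train)
    test_mse = mean_squared_error(y_test, model.predict(X_test))

    # Return the metrics to the optimizer, which decides the next trial.
    trial.set_user_attr("test_mse", test_mse)
    return float(np.mean(fold_errors))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
```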