From 05303f3f0075c6aba9f7b20f104d29f9a447d27d Mon Sep 17 00:00:00 2001
From: Ivikhostrup
Date: Thu, 6 Jun 2024 12:14:18 +0200
Subject: [PATCH 1/4] Moved description of ccs data

---
 .../src/sections/background/data_overview.tex | 24 +++++++++++++++++++
 .../src/sections/background/index.tex         |  3 ++-
 report_thesis/src/sections/methodology.tex    | 23 ++----------------
 3 files changed, 28 insertions(+), 22 deletions(-)
 create mode 100644 report_thesis/src/sections/background/data_overview.tex

diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex
new file mode 100644
index 00000000..889666bd
--- /dev/null
+++ b/report_thesis/src/sections/background/data_overview.tex
@@ -0,0 +1,24 @@
+\subsection{Data Overview}\label{sec:data-overview}
+Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
+\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis.
+A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.
+
+\begin{table*}[h]
+\centering
+\begin{tabular}{llllllll}
+\toprule
+ wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
+\midrule
+240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
+240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
+$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
+905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
+905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
+\bottomrule
+\end{tabular}
+\caption{Example of CCS data for a single location (from \citet{p9_paper})}
+\label{tab:ccs_data_example}
+\end{table*}
+
+While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing.
+Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
\ No newline at end of file
diff --git a/report_thesis/src/sections/background/index.tex b/report_thesis/src/sections/background/index.tex
index b85edbe6..9a0ec9ec 100644
--- a/report_thesis/src/sections/background/index.tex
+++ b/report_thesis/src/sections/background/index.tex
@@ -1,8 +1,9 @@
 \section{Background}\label{sec:background}
-In this section, we provide an overview of the preprocessing techniques and machine learning models used in our proposed pipeline.
+In this section, we provide an overview of the data used in this work, preprocessing techniques, and machine learning models used in our proposed pipeline.
 We outline the various normalization techniques and dimensionality reduction methods, followed by the ensemble learning, linear models, and regularization models used.
 Finally, we outline stacked generalization.
 
+\input{sections/background/data_overview.tex}
 \input{sections/background/preprocessing/index.tex}
 \input{sections/background/ensemble_learning_models/index.tex}
 \input{sections/background/linear_and_regularization_models/index.tex}
\ No newline at end of file
diff --git a/report_thesis/src/sections/methodology.tex b/report_thesis/src/sections/methodology.tex
index ae555529..fe3ad350 100644
--- a/report_thesis/src/sections/methodology.tex
+++ b/report_thesis/src/sections/methodology.tex
@@ -5,28 +5,9 @@ \section{Experimental Design}\label{sec:methodology}
 
 
 \subsection{Data Preparation}
-Similarly to our previous work \citet{p9_paper}, we used the publicly available \gls{ccs} data from NASA's \gls{pds}~\cite{PDSGeoscienceNode}.
-\gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis.
-A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}.
+The first step in our methodology is to prepare the datasets for model training and evaluation.
+As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from NASA's \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples.
 
-\begin{table*}[h]
-\centering
-\begin{tabular}{llllllll}
-\toprule
- wave & shot1 & shot2 & $\cdots$ & shot49 & shot50 & median & mean \\
-\midrule
-240.81100 & 6.4026649e+15 & 4.0429349e+15 & $\cdots$ & 1.7922483e+15 & 1.7126615e+15 & 1.9892956e+15 & 1.7561699e+15 \\
-240.86501 & 3.8557462e+12 & 2.2923680e+12 & $\cdots$ & 1.1355429e+12 & 8.6930379e+11 & 7.8172542e+11 & 7.2805052e+11 \\
-$\vdots$ & $\vdots$ & $\vdots$ & $\cdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ \\
-905.38062 & 1.8823427e+08 & 58500403. & $\cdots$ & -8449286.6 & 8710775.0 & 4.0513312e+09 & 5.2188327e+09 \\
-905.57349 & 1.9864713e+10 & 1.2956832e+10 & $\cdots$ & 1.9785415e+10 & 7.1994239e+09 & 1.1311150e+10 & 1.2201224e+10 \\
-\bottomrule
-\end{tabular}
-\caption{Example of CCS data for a single location (from \citet{p9_paper})}
-\label{tab:ccs_data_example}
-\end{table*}
-
-While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing.
 Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
 The initial five shots from each sample are excluded because they are usually contaminated by dust covering the sample, which is cleared away by the shock waves produced by the laser \cite{cleggRecalibrationMarsScience2017}.
 The remaining 45 shots from each location are then averaged, yielding a single spectrum $s$ per location $l$ in the Averaged Intensity Tensor\ref{matrix:averaged_intensity}, resulting in a total of five spectra for each sample.
From bbf780a3d7b577d5bcb2ae959d07328a4a15651b Mon Sep 17 00:00:00 2001
From: Ivikhostrup
Date: Fri, 7 Jun 2024 10:28:45 +0200
Subject: [PATCH 2/4] maybe?

---
 report_thesis/src/sections/background/data_overview.tex | 5 ++++-
 report_thesis/src/sections/methodology.tex              | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex
index 889666bd..1b4f0a88 100644
--- a/report_thesis/src/sections/background/data_overview.tex
+++ b/report_thesis/src/sections/background/data_overview.tex
@@ -21,4 +21,7 @@ \subsection{Data Overview}\label{sec:data-overview}
 \end{table*}
 
 While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing.
-Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
\ No newline at end of file
+Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
+The \gls{ccs} data will at this stage have negative values and noise at the edges of the spectrometers.
+How these are handled is described in Section~\ref{sec:data-preparation}.
+Once prepared, the \gls{ccs} data can now be used for preprocessing and model training.
\ No newline at end of file
diff --git a/report_thesis/src/sections/methodology.tex b/report_thesis/src/sections/methodology.tex
index fe3ad350..823c2c11 100644
--- a/report_thesis/src/sections/methodology.tex
+++ b/report_thesis/src/sections/methodology.tex
@@ -4,7 +4,7 @@ \section{Experimental Design}\label{sec:methodology}
 We first describe the datasets used, including their preparation and the method of splitting for model training. Next, we outline the preprocessing steps and the model selection process, followed by a detailed explanation of the experimental setup and evaluation metrics. Finally, we discuss our validation testing procedures and the approach taken to ensure unbiased final model evaluations.
 
 
-\subsection{Data Preparation}
+\subsection{Data Preparation}\label{sec:data-preparation}
 The first step in our methodology is to prepare the datasets for model training and evaluation.
 As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from NASA's \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples.
 
From 2b045f7a533e4eb2c41dc09b4ec4073237515872 Mon Sep 17 00:00:00 2001
From: Ivikhostrup <56341364+Ivikhostrup@users.noreply.github.com>
Date: Fri, 7 Jun 2024 10:37:45 +0200
Subject: [PATCH 3/4] Update report_thesis/src/sections/background/data_overview.tex

Co-authored-by: Christian Bager Bach Houmann
---
 report_thesis/src/sections/background/data_overview.tex | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex
index 1b4f0a88..daecb0a7 100644
--- a/report_thesis/src/sections/background/data_overview.tex
+++ b/report_thesis/src/sections/background/data_overview.tex
@@ -20,7 +20,8 @@ \subsection{Data Overview}\label{sec:data-overview}
 \label{tab:ccs_data_example}
 \end{table*}
 
-While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing.
+While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing. This includes handling negative values and noise at the edges of the spectrometers, as we will describe in Section~\ref{sec:data-preparation}.
+Additional preprocessing steps will be necessary to further refine the data for subsequent analysis and model training.
 Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
 The \gls{ccs} data will at this stage have negative values and noise at the edges of the spectrometers.
 How these are handled is described in Section~\ref{sec:data-preparation}.
From 80a01d812dcad3ed9a78a0ca5f05b202946a6a86 Mon Sep 17 00:00:00 2001
From: Christian Bager Bach Houmann
Date: Fri, 7 Jun 2024 10:40:11 +0200
Subject: [PATCH 4/4] Update report_thesis/src/sections/background/data_overview.tex

Co-authored-by: Ivikhostrup <56341364+Ivikhostrup@users.noreply.github.com>
---
 report_thesis/src/sections/background/data_overview.tex | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex
index daecb0a7..1daa3854 100644
--- a/report_thesis/src/sections/background/data_overview.tex
+++ b/report_thesis/src/sections/background/data_overview.tex
@@ -22,7 +22,4 @@ \subsection{Data Overview}\label{sec:data-overview}
 
 While the \gls{ccs} data is in a more suitable form for quantitative analysis, it still requires further preprocessing. This includes handling negative values and noise at the edges of the spectrometers, as we will describe in Section~\ref{sec:data-preparation}.
 Additional preprocessing steps will be necessary to further refine the data for subsequent analysis and model training.
-Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
-The \gls{ccs} data will at this stage have negative values and noise at the edges of the spectrometers.
-How these are handled is described in Section~\ref{sec:data-preparation}.
-Once prepared, the \gls{ccs} data can now be used for preprocessing and model training.
\ No newline at end of file
+Table~\ref{tab:ccs_data_example} shows an example of the \gls{ccs} data for a single location of a sample. This corresponds to shots ($s$) and wavelength ($\lambda$) of the Intensity Tensor \ref{matrix:intensity} for this location.
\ No newline at end of file