From 90df5e6852d2af15a6f600592dc39118d46c29f4 Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 09:45:55 +0200 Subject: [PATCH 1/7] Humble phrasing --- report_thesis/src/index.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/index.tex b/report_thesis/src/index.tex index 25de6a6b..48d25d1b 100644 --- a/report_thesis/src/index.tex +++ b/report_thesis/src/index.tex @@ -9,7 +9,7 @@ By integrating machine learning techniques and ensemble regression models, the study addresses challenges like high dimensionality, multicollinearity, and limited data availability. Key innovations include the use of stacked generalization for improved model performance and an automated hyperparameter optimization framework. The research contributes a comprehensive catalog of models and preprocessing techniques, and integrates findings into the \gls{pyhat} by the \gls{usgs}, enhancing its scientific capabilities. -This work lays a robust foundation for future advancements in geochemical analysis and planetary exploration using \gls{libs} data. +This work aims to establish a robust foundation for future advancements in geochemical analysis and planetary exploration using \gls{libs} data. \end{abstract} \maketitle From 2bb6169cbc00277557dd839d140821d926d02224 Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 09:53:41 +0200 Subject: [PATCH 2/7] Fixed middle names --- report_thesis/src/index.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/index.tex b/report_thesis/src/index.tex index 48d25d1b..677b7609 100644 --- a/report_thesis/src/index.tex +++ b/report_thesis/src/index.tex @@ -16,7 +16,7 @@ \subsubsection*{Acknowledgements:} We would like to thank our supervisors Daniele Dell'Aglio and Juan Manuel Rodriguez for their guidance and support throughout this project. We also thank our external supervisor Jens Frydenvang for his guidance as a domain expert from the ChemCam team, as well as volunteering his time to provide feedback on our work. -Furthermore, we extend our gratitude to Ryan B. Anderson and Travis S. Gabriel for their invaluable discussions regarding calibration and quantification based on \gls{libs} data. We also thank them for the opportunity to contribute to \gls{pyhat}. +Furthermore, we extend our gratitude to Ryan B. Anderson and Travis S.J. Gabriel for their invaluable discussions regarding calibration and quantification based on \gls{libs} data. We also thank them for the opportunity to contribute to \gls{pyhat}. \input{sections/introduction.tex} \input{sections/related_work.tex} From f6d4894ed8509445e25380cb5048b662c3f938c4 Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 10:02:18 +0200 Subject: [PATCH 3/7] titles --- report_thesis/src/index.tex | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/report_thesis/src/index.tex b/report_thesis/src/index.tex index 677b7609..565846e8 100644 --- a/report_thesis/src/index.tex +++ b/report_thesis/src/index.tex @@ -14,9 +14,9 @@ \maketitle -\subsubsection*{Acknowledgements:} We would like to thank our supervisors Daniele Dell'Aglio and Juan Manuel Rodriguez for their guidance and support throughout this project. -We also thank our external supervisor Jens Frydenvang for his guidance as a domain expert from the ChemCam team, as well as volunteering his time to provide feedback on our work. -Furthermore, we extend our gratitude to Ryan B. Anderson and Travis S.J. Gabriel for their invaluable discussions regarding calibration and quantification based on \gls{libs} data. We also thank them for the opportunity to contribute to \gls{pyhat}. +\subsubsection*{Acknowledgements:} We would like to thank our supervisors Dr. Daniele Dell'Aglio and Dr. Juan Manuel Rodriguez for their guidance and support throughout this project. +We also thank our external supervisor Dr. Jens Frydenvang for his guidance as a domain expert from the ChemCam team, as well as volunteering his time to provide feedback on our work. +Furthermore, we extend our gratitude to Dr. Ryan B. Anderson and Dr. Travis S.J. Gabriel for their invaluable discussions regarding calibration and quantification based on \gls{libs} data. We also thank them for the opportunity to contribute to \gls{pyhat}. \input{sections/introduction.tex} \input{sections/related_work.tex} From 4492a9dc66d549cf52d7b6f182a8b80a21a56cae Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 10:06:52 +0200 Subject: [PATCH 4/7] word --- report_thesis/src/sections/introduction.tex | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/report_thesis/src/sections/introduction.tex b/report_thesis/src/sections/introduction.tex index d25394de..5550fd2b 100644 --- a/report_thesis/src/sections/introduction.tex +++ b/report_thesis/src/sections/introduction.tex @@ -15,7 +15,7 @@ \section{Introduction}\label{sec:introduction} Tailored approaches have also been developed, where different models are selected based on their performance with specific spectral characteristics~\cite{rezaei_dimensionality_reduction, andersonPostlandingMajorElement2022}. Moreover, models incorporating physical principles have demonstrated improved accuracy by handling residuals that traditional models fail to explain~\cite{song_DF-K-ELM}. However, predicting oxide compositions remains challenging due to the complex, nonlinear nature of \gls{libs} data. -This underscores the need for continued research into more adaptive and robust machine learning strategies to tackle these issues effectively. +This underscores the need for continued research into more accurate and robust machine learning strategies to tackle these issues effectively. This thesis aims to improve upon previous work in the field of \gls{libs} data analysis. Our goal is to develop a machine learning pipeline that is tailored to the unique characteristics of \gls{libs} data, with the goal of achieving higher prediction accuracy and robustness. From 8bee710f5a9488d6b3fe02b3fd55837a47e778e9 Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 10:16:30 +0200 Subject: [PATCH 5/7] fixes --- report_thesis/src/sections/introduction.tex | 1 + 1 file changed, 1 insertion(+) diff --git a/report_thesis/src/sections/introduction.tex b/report_thesis/src/sections/introduction.tex index 5550fd2b..55569687 100644 --- a/report_thesis/src/sections/introduction.tex +++ b/report_thesis/src/sections/introduction.tex @@ -49,6 +49,7 @@ \section{Introduction}\label{sec:introduction} Section~\ref{sec:proposed_approach} presents our proposed approach for optimizing pipeline configurations, detailing the selection of models and preprocessing techniques, our approach to data partitioning, validation and testing procedures, and the implementation of the hyperparameter optimization framework. Section~\ref{sec:methodology} presents the design and results of our experiments, as well as the analysis of the results. Our experiments include initial model selection, hyperparameter optimization, and the final evaluation of our proposed stacking ensemble. +Section~\ref{sec:pyhat_contribution} discusses our contribution to \gls{pyhat} and how our work has been integrated into the toolset. Finally, Section~\ref{sec:conclusion} summarizes our key findings and contributions, while Section~\ref{sec:future_work} discusses potential future research directions and improvements. Due to the overlapping nature of terminology used in \gls{libs} data analysis and machine learning, we provide a list of terms in Table~\ref{tab:terms} to clarify their meaning. From aeea5d76aba1ef22931993f8c88767d5f3ec55a3 Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 10:22:54 +0200 Subject: [PATCH 6/7] fix the nasa --- report_thesis/src/sections/background/data_overview.tex | 2 +- report_thesis/src/sections/experiments/data_preparation.tex | 2 +- report_thesis/src/sections/introduction.tex | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex index e5632410..1171105a 100644 --- a/report_thesis/src/sections/background/data_overview.tex +++ b/report_thesis/src/sections/background/data_overview.tex @@ -1,5 +1,5 @@ \subsection{Data Overview}\label{sec:data-overview} -Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from \gls{nasa}'s \gls{pds}~\cite{PDSGeoscienceNode}. +Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from the \gls{nasa}'s \gls{pds}~\cite{PDSGeoscienceNode}. \gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis. A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}. diff --git a/report_thesis/src/sections/experiments/data_preparation.tex b/report_thesis/src/sections/experiments/data_preparation.tex index a22749d8..5f7f8e7a 100644 --- a/report_thesis/src/sections/experiments/data_preparation.tex +++ b/report_thesis/src/sections/experiments/data_preparation.tex @@ -1,6 +1,6 @@ \subsection{Data Preparation}\label{sec:data-preparation} The first step in our methodology is to prepare the datasets for model training and evaluation. -As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from \gls{nasa}'s \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples. +As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from the \gls{nasa}'s \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples. The initial five shots from each sample are excluded because they are usually contaminated by dust covering the sample, which is cleared away by the shock waves produced by the laser \cite{cleggRecalibrationMarsScience2017}. The remaining 45 shots from each location are then averaged, yielding a single spectrum $s$ per location $l$ in the Averaged Intensity Tensor\ref{matrix:averaged_intensity}, resulting in a total of five spectra for each sample. diff --git a/report_thesis/src/sections/introduction.tex b/report_thesis/src/sections/introduction.tex index 36879465..b1bec729 100644 --- a/report_thesis/src/sections/introduction.tex +++ b/report_thesis/src/sections/introduction.tex @@ -1,9 +1,9 @@ \section{Introduction}\label{sec:introduction} -\gls{nasa} has been studying the Martian environment for decades through a series of missions, including the Viking missions~\cite{marsnasagov_vikings}, the \gls{mer} mission~\cite{marsnasagov_observer, marsnasagov_spirit_opportunity}, and the \gls{msl} mission~\cite{marsnasagov_msl}, each building on the knowledge gained from the previous ones. +The \gls{nasa} has been studying the Martian environment for decades through a series of missions, including the Viking missions~\cite{marsnasagov_vikings}, the \gls{mer} mission~\cite{marsnasagov_observer, marsnasagov_spirit_opportunity}, and the \gls{msl} mission~\cite{marsnasagov_msl}, each building on the knowledge gained from the previous ones. Today, the rovers exploring Mars are equipped with sophisticated instruments for analyzing the chemical composition of Martian soil in search of past life and habitable environments. Part of this research is facilitated through interpretation of spectral data gathered by \gls{libs} instruments, which fire a high-powered laser at soil samples to create a plasma. -The emitted light is captured by spectrometers and analyzed using machine learning models to assess the presence and concentration of certain major oxides, informing \gls{nasa}'s understanding of Mars' geology~\cite{cleggRecalibrationMarsScience2017}. +The emitted light is captured by spectrometers and analyzed using machine learning models to assess the presence and concentration of certain major oxides, informing the \gls{nasa}'s understanding of Mars' geology~\cite{cleggRecalibrationMarsScience2017}. However, predicting major oxide compositions from \gls{libs} data still presents significant computational challenges. These include the high dimensionality and non-linearity of the data, compounded by issues of multicollinearity and matrix effects~\cite{andersonImprovedAccuracyQuantitative2017}. From d1f0e488e5e7b6ffc8ce8cafca55bc4eb4a134fa Mon Sep 17 00:00:00 2001 From: Ivikhostrup Date: Thu, 13 Jun 2024 10:31:32 +0200 Subject: [PATCH 7/7] fine... --- report_thesis/src/sections/background/data_overview.tex | 2 +- report_thesis/src/sections/experiments/data_preparation.tex | 2 +- report_thesis/src/sections/introduction.tex | 2 +- report_thesis/src/sections/summary.tex | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/report_thesis/src/sections/background/data_overview.tex b/report_thesis/src/sections/background/data_overview.tex index 1171105a..e5632410 100644 --- a/report_thesis/src/sections/background/data_overview.tex +++ b/report_thesis/src/sections/background/data_overview.tex @@ -1,5 +1,5 @@ \subsection{Data Overview}\label{sec:data-overview} -Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from the \gls{nasa}'s \gls{pds}~\cite{PDSGeoscienceNode}. +Similarly to our previous work (\citet{p9_paper}), we used the publicly available \gls{ccs} data from \gls{nasa}'s \gls{pds}~\cite{PDSGeoscienceNode}. \gls{ccs} refers to \gls{libs} data that has been through a series of preprocessing steps such as subtracting the ambient light background, noise removal and removing the electron continuum to derive data that is more suitable for quantitative analysis. A comprehensive description of this preprocessing procedure is available in \citet{wiensPreflightCalibrationInitial2013}. diff --git a/report_thesis/src/sections/experiments/data_preparation.tex b/report_thesis/src/sections/experiments/data_preparation.tex index 5f7f8e7a..a22749d8 100644 --- a/report_thesis/src/sections/experiments/data_preparation.tex +++ b/report_thesis/src/sections/experiments/data_preparation.tex @@ -1,6 +1,6 @@ \subsection{Data Preparation}\label{sec:data-preparation} The first step in our methodology is to prepare the datasets for model training and evaluation. -As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from the \gls{nasa}'s \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples. +As mentioned in Section~\ref{sec:data-overview}, the data used in this study was obtained from \gls{nasa}'s \gls{pds} and consists of \gls{ccs} data and major oxide compositions for various samples. The initial five shots from each sample are excluded because they are usually contaminated by dust covering the sample, which is cleared away by the shock waves produced by the laser \cite{cleggRecalibrationMarsScience2017}. The remaining 45 shots from each location are then averaged, yielding a single spectrum $s$ per location $l$ in the Averaged Intensity Tensor\ref{matrix:averaged_intensity}, resulting in a total of five spectra for each sample. diff --git a/report_thesis/src/sections/introduction.tex b/report_thesis/src/sections/introduction.tex index b1bec729..3cedd402 100644 --- a/report_thesis/src/sections/introduction.tex +++ b/report_thesis/src/sections/introduction.tex @@ -3,7 +3,7 @@ \section{Introduction}\label{sec:introduction} Today, the rovers exploring Mars are equipped with sophisticated instruments for analyzing the chemical composition of Martian soil in search of past life and habitable environments. Part of this research is facilitated through interpretation of spectral data gathered by \gls{libs} instruments, which fire a high-powered laser at soil samples to create a plasma. -The emitted light is captured by spectrometers and analyzed using machine learning models to assess the presence and concentration of certain major oxides, informing the \gls{nasa}'s understanding of Mars' geology~\cite{cleggRecalibrationMarsScience2017}. +The emitted light is captured by spectrometers and analyzed using machine learning models to assess the presence and concentration of certain major oxides, informing \gls{nasa}'s understanding of Mars' geology~\cite{cleggRecalibrationMarsScience2017}. However, predicting major oxide compositions from \gls{libs} data still presents significant computational challenges. These include the high dimensionality and non-linearity of the data, compounded by issues of multicollinearity and matrix effects~\cite{andersonImprovedAccuracyQuantitative2017}. diff --git a/report_thesis/src/sections/summary.tex b/report_thesis/src/sections/summary.tex index 407e4b53..bd3d13bb 100644 --- a/report_thesis/src/sections/summary.tex +++ b/report_thesis/src/sections/summary.tex @@ -5,7 +5,7 @@ \section*{Summary} \vspace{0.5em} -For decades, \gls{nasa} has deployed rovers equipped with advanced instruments to analyze the Martian environment. +For decades, the \gls{nasa} has deployed rovers equipped with advanced instruments to analyze the Martian environment. The two most recent rovers, Curiosity and Perseverance, are equipped with the \gls{chemcam} and SuperCam \gls{libs} instruments, respectively. \gls{libs} is a powerful technique for analyzing the chemical composition of Martian soil, offering valuable insights into the planet's geology and potential for past habitability. This technique involves firing high-powered lasers at soil samples to create plasma, which emits light that is captured by spectrometers aboard the rovers.