Raman_spectra_data

In this repository, we publish the data used in the Master Thesis Data-driven classification of minerals from their Raman spectroscopy scans for autonomous science on Mars (collaboration with NASA project InVADER). The initial dataset is from the RRUFF database. From this database, we consider the LR-Raman subset frow which we keep only raw and the excellent_oriented subset from which we keep only processed samples. For both subsets, we prune classes with less than 8 samples. We only considers shifts from 200 to 1600 cm-1. We perform upsampling and interpolation to have a constant number of features among samples. Finally, the intensities are normalized. The different sets that we obtain are the following:

Fully supervised set

In the data_fully_supervised/ directory, we provide the fully supervised data, that is data with labels. The directory is organized into 2 subdirectories:

raw/ :
- fullsup.pkl correspond to the raw training set
- holdouttest.pkl correspond to the raw test set
processed/ :
- fullsup_processed.pkl correspond to the processed training set
- holdouttest_processed.pkl correspond to the processed test set

Semi-supervised set

In the data_semi_supervised/ directory, we provide the data for the semi-supervised experiments. What is done here is diving the supervised raw dataset (fullsup.pkl) into two subdataset, one with the labels (called supervised_train) and one without the labels (called unsupervised_train). This division is done for several proportion : 10%, 20%, 30%, 40% and 50% of supervised samples.

10_per_cent_of_supervised/ : correspond to subset with 10% of supervised samples and 90% of unsupervised samples
- supervised_train0.1.pkl correspond to the supervised samples
- unsupervised_train0.9.pkl correspond to the unsupervised samples
20_per_cent_of_supervised/ : correspond to subset with 20% of supervised samples and 80% of unsupervised samples
- supervised_train0.2.pkl correspond to the supervised samples
- unsupervised_train0.8.pkl correspond to the unsupervised samples
30_per_cent_of_supervised/ : correspond to subset with 30% of supervised samples and 70% of unsupervised samples
- supervised_train0.3.pkl correspond to the supervised samples
- unsupervised_train0.7.pkl correspond to the unsupervised samples
40_per_cent_of_supervised/ : correspond to subset with 40% of supervised samples and 60% of unsupervised samples
- supervised_train0.4.pkl correspond to the supervised samples
- unsupervised_train0.6.pkl correspond to the unsupervised samples
50_per_cent_of_supervised/ : correspond to subset with 50% of supervised samples and 50% of unsupervised samples
- supervised_train0.5.pkl correspond to the supervised samples
- unsupervised_train0.5.pkl correspond to the unsupervised samples

Shifted test set

In the data_shift_experiment/ directory, we provide the data that we used for investigating the inductive bias of Convolutional Neural Network. The data in this directory is constructed from the raw test set holdouttest.pkl. We shift the intensities by 15 and 30 cm-1 to the right.

holdouttest_shifted_15.pkl : correspond to test set with intensities shifted by 15cm-1
holdouttest_shifted_30.pkl : correspond to test set with intensities shifted by 30cm-1

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data_fully_supervised		data_fully_supervised
data_semi_supervised		data_semi_supervised
data_shift_experiment		data_shift_experiment
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Raman_spectra_data

Fully supervised set

Semi-supervised set

Shifted test set

About

Releases

Packages

denizsoysal/Raman_spectra_data

Folders and files

Latest commit

History

Repository files navigation

Raman_spectra_data

Fully supervised set

Semi-supervised set

Shifted test set

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages