J. Hugh Wright
Vignesh Muthukumar
Gabriel Silva de Oliviera
This project used the following datasets collected from DataShop:
- https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=92
- https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=120
- https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=339
Sadly, these datasets are too large to be included within this repository. However, we do include CSV files containing the data we extracted from them.
The following is a list of the files included in this repository and what they were used for.
- baseline_model.py: A Python script for training and testing our non-neural-network predictive models (sketched below).
- CorrelationCalculation.ipynb: A notebook for calculating the Pearson correlation coefficients and p-values of our data (sketched below).
- dataHotEncoding.csv: A CSV file containing the final form of our data, with all knowledge components (KCs) one-hot encoded. Used to train the predictive models.
- Graphs.xlsx: An Excel file that we used to create our data visualizations.
- HotEncoding.ipynb: A notebook for one-hot encoding our knowledge components (sketched below). Produced the "dataHotEncoding.csv" file.
- language_features.py: A Python script for extracting language features and adding them to our dataset (sketched below). Produced the "Language_Processed.csv" file in the ProcessedData folder.
- LowFrequencyCalculation.ipynb: A notebook for detecting low-frequency words. Due to time constraints, we ultimately did not use this data in our analysis.
- neural_networks.py: A Python script for training and testing our neural network models (sketched below).
- preprocess.py: A simple Python script to extract information about individual questions from our three original datasets (sketched below). Produced the "More_Processed_Data.csv" file in the ProcessedData folder.
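
The sketches below illustrate the main pipeline steps in rough order of execution. They are approximations, not the scripts themselves. First, preprocess.py: this is a minimal sketch assuming a standard tab-delimited DataShop transaction export; the export filename, the "Problem Name" and "Outcome" columns, and the aggregated fields are assumptions that may not match the actual datasets.

```python
import pandas as pd

# DataShop transaction exports are tab-delimited; the filename and column
# names here are assumptions and may not match the actual datasets.
tx = pd.read_csv("ds92_tx.txt", sep="\t")  # hypothetical export filename

# Aggregate transactions into one row per question.
per_question = (
    tx.groupby("Problem Name")
      .agg(attempts=("Outcome", "size"),
           percent_correct=("Outcome", lambda s: (s == "CORRECT").mean()))
      .reset_index()
)
per_question.to_csv("ProcessedData/More_Processed_Data.csv", index=False)
```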
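
For language_features.py, the sketch below shows one plausible shape for the step. The feature set and the "question_text" column are assumptions, not the script's actual features.

```python
import pandas as pd

def language_features(text: str) -> dict:
    """Hypothetical surface features; the real script's feature set may differ."""
    words = text.split()
    return {
        "word_count": len(words),
        "char_count": len(text),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
    }

df = pd.read_csv("ProcessedData/More_Processed_Data.csv")
# "question_text" is a hypothetical column name.
features = df["question_text"].astype(str).apply(language_features).apply(pd.Series)
pd.concat([df, features], axis=1).to_csv("ProcessedData/Language_Processed.csv", index=False)
```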
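
The one-hot encoding performed by HotEncoding.ipynb can be sketched with pandas.get_dummies; the "KC" column name is an assumption about the intermediate schema.

```python
import pandas as pd

df = pd.read_csv("ProcessedData/Language_Processed.csv")

# get_dummies expands a categorical column into one 0/1 column per KC value.
kc_dummies = pd.get_dummies(df["KC"], prefix="KC")  # "KC" is a hypothetical column name
encoded = pd.concat([df.drop(columns=["KC"]), kc_dummies], axis=1)
encoded.to_csv("dataHotEncoding.csv", index=False)
```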
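
The correlation analysis in CorrelationCalculation.ipynb amounts to calls like the one below; the feature and target column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("dataHotEncoding.csv")

# Hypothetical column names; pearsonr returns the coefficient and its p-value.
r, p = pearsonr(df["word_count"], df["percent_correct"])
print(f"Pearson r = {r:.3f}, p = {p:.3g}")
```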
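
For baseline_model.py, here is a minimal sketch of training and evaluating one non-neural-network model, assuming a "percent_correct" target column; the actual script's models and metrics may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = pd.read_csv("dataHotEncoding.csv")
# "percent_correct" is a hypothetical target column; keep numeric features only.
X = df.drop(columns=["percent_correct"]).select_dtypes(include="number")
y = df["percent_correct"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
```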
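
The framework used by neural_networks.py is not shown here; as a stand-in, the sketch below uses scikit-learn's MLPRegressor on the same assumed columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("dataHotEncoding.csv")
# Same hypothetical target column and numeric-feature assumption as above.
X = df.drop(columns=["percent_correct"]).select_dtypes(include="number")
y = df["percent_correct"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0)
mlp.fit(X_train, y_train)
print("Held-out R^2:", mlp.score(X_test, y_test))
```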