A series of machine learning models that attempt to predict whether a student can correctly answer a diagnostic question based on their previous responses.
Built by Ivan Ye and Boaz Cheung
- k-Nearest Neighbor (kNN): Analysis of kNN, comparing 'impute by user' and 'impute by item'. Includes accuracy data and discussion of assumptions and limitations.
- Item Response Theory (IRT): Mathematical formulations in IRT, training/validation log-likelihoods, and accuracy metrics.
- Neural Networks: Comparison of Alternating Least Squares and neural networks. Details on implementation and accuracy.
- Ensemble Techniques: Ensemble process, dataset creation, IRT model training, and performance evaluation.
- Extensions and Modifications of the Neural Network: Improvements using deep neural networks, Leaky ReLU activation, and the Adam optimizer. Discusses limitations such as sample size and hyperparameter sensitivity.
This section provides an examination of various machine learning models, such as k-Nearest Neighbor, Item Response Theory, Neural Networks, and Ensemble Techniques.
kNN performs collaborative filtering, using other students' answers to predict whether a specific student can correctly answer a given diagnostic question. Both user-based and item-based collaborative filtering were used; a minimal sketch of both strategies follows the assumptions below.
- User-based assumption: if student X has the same answers on other diagnostic questions as student Y, student X's correctness on specific diagnostic questions matches that of student Y.
- Item-based assumption: if question A is answered similarly by other students as question B, question A's difficulty for specific students matches that of question B.
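The following is a minimal sketch of the two imputation strategies using scikit-learn's KNNImputer, assuming the responses are held in a students × questions matrix with NaN for unanswered questions; the matrix contents, variable names, and the value of k are illustrative, not the project's actual data or settings.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Illustrative (num_students x num_questions) array of 0/1 responses,
# with np.nan where a student has not answered a question.
sparse_matrix = np.array([
    [1.0, np.nan, 0.0],
    [1.0, 1.0, np.nan],
    [np.nan, 1.0, 0.0],
])

k = 2  # illustrative neighbourhood size

# Impute by user: each row is a student, so missing entries are filled
# from the k most similar students (user-based assumption).
user_imputer = KNNImputer(n_neighbors=k)
filled_by_user = user_imputer.fit_transform(sparse_matrix)

# Impute by item: transpose so each row is a question, then missing entries
# are filled from the k most similar questions (item-based assumption).
item_imputer = KNNImputer(n_neighbors=k)
filled_by_item = item_imputer.fit_transform(sparse_matrix.T).T

# A predicted entry >= 0.5 is read as "the student answers correctly".
predictions_by_user = filled_by_user >= 0.5
```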
IRT assigns each student i an ability value θi and each question j a difficulty value βj, which together define a probability distribution over correct responses. After deriving the log-likelihood and its derivative for gradient descent, three questions were selected for further analysis. The trained θ and β were used to plot the probability of a correct response as a function of θ for a given question j.
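For reference, a standard one-parameter formulation consistent with the sigmoid curves described below, where c_ij indicates whether student i answered question j correctly (the exact notation in the report may differ):

```latex
p(c_{ij} = 1 \mid \theta_i, \beta_j) = \sigma(\theta_i - \beta_j)
  = \frac{e^{\theta_i - \beta_j}}{1 + e^{\theta_i - \beta_j}}

\log p(\mathbf{C} \mid \boldsymbol{\theta}, \boldsymbol{\beta})
  = \sum_{i,j} \Big[ c_{ij}\,(\theta_i - \beta_j) - \log\!\big(1 + e^{\theta_i - \beta_j}\big) \Big]

\frac{\partial}{\partial \theta_i} = \sum_{j} \big[ c_{ij} - \sigma(\theta_i - \beta_j) \big],
\qquad
\frac{\partial}{\partial \beta_j} = \sum_{i} \big[ \sigma(\theta_i - \beta_j) - c_{ij} \big]
```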
The curves take the shape of the sigmoid function, since that is the modelled probability of a student answering correctly, offset by the trained values of θ and β. These curves represent the predicted probability that a student i with ability θi can answer questions j1, j2, and j3 correctly.
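A small sketch of how such item response curves can be plotted from trained parameters; the β values and question labels below are placeholders, not the trained ones.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder difficulties for three illustrative questions; in the project
# these would come from the trained beta vector.
betas = {"j1": -1.0, "j2": 0.0, "j3": 1.5}

theta = np.linspace(-4, 4, 200)  # range of student abilities
for name, beta in betas.items():
    plt.plot(theta, sigmoid(theta - beta), label=f"question {name} (beta={beta})")

plt.xlabel("student ability theta")
plt.ylabel("p(correct response)")
plt.legend()
plt.show()
```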
The neural network model uses an AutoEncoder architecture designed for binary classification of student responses. It comprises two main linear layers, one for encoding and one for decoding, with sigmoid activation functions matching the binary (correct/incorrect) nature of the responses.
The model is trained using Stochastic Gradient Descent (SGD) with a regularized objective function to prevent overfitting. It is evaluated on both validation and test datasets, focusing on accuracy metrics.
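A minimal PyTorch sketch of such an AutoEncoder and its regularized SGD training step, assuming each input is a student's response vector over all questions; the latent size k, the regularization weight lamb, and the other names and values are illustrative assumptions rather than the project's exact implementation.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class AutoEncoder(nn.Module):
    def __init__(self, num_questions, k=50):
        super().__init__()
        self.g = nn.Linear(num_questions, k)   # encoder layer
        self.h = nn.Linear(k, num_questions)   # decoder layer

    def forward(self, x):
        # Sigmoid activations keep outputs in (0, 1), matching the binary
        # (correct / incorrect) nature of the responses.
        z = torch.sigmoid(self.g(x))
        return torch.sigmoid(self.h(z))

    def weight_norm(self):
        # Squared norms of both weight matrices, used as the L2 penalty.
        return self.g.weight.norm() ** 2 + self.h.weight.norm() ** 2

# Illustrative setup: num_questions, k, lr, and lamb are placeholder values.
model = AutoEncoder(num_questions=100, k=50)
optimizer = optim.SGD(model.parameters(), lr=0.01)
lamb = 0.01  # regularization weight

def train_step(inputs, targets, mask):
    # mask marks observed entries so unanswered questions do not affect the loss.
    optimizer.zero_grad()
    outputs = model(inputs)
    reconstruction = torch.sum(((outputs - targets) * mask) ** 2)
    loss = reconstruction + (lamb / 2) * model.weight_norm()
    loss.backward()
    optimizer.step()
    return loss.item()
```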
The ensemble was implemented by first creating three datasets from the given training data, each the same size as the original and sampled with replacement. IRT was the base model, so three IRT models were trained, one on each bootstrapped training set. Each IRT model then outputs its own θ and β, which were used to predict correctness with the given evaluate function, and the average of the three predictions was taken.
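A sketch of this bagging procedure, assuming the training data is a dictionary of parallel arrays (user_id, question_id, is_correct); irt_train stands in for the project's own IRT training routine and is an assumed interface, not the actual one.

```python
import numpy as np

def bootstrap(train_data, rng):
    # Sample the same number of interactions with replacement, keeping the
    # (user_id, question_id, is_correct) triples aligned.
    n = len(train_data["user_id"])
    idx = rng.integers(0, n, size=n)
    return {key: np.asarray(values)[idx] for key, values in train_data.items()}

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ensemble_predict(train_data, val_data, irt_train, num_models=3, seed=0):
    rng = np.random.default_rng(seed)
    users = np.asarray(val_data["user_id"])
    questions = np.asarray(val_data["question_id"])
    probabilities = []
    for _ in range(num_models):
        sample = bootstrap(train_data, rng)
        theta, beta = irt_train(sample)  # one IRT model per bootstrapped set
        probabilities.append(sigmoid(theta[users] - beta[questions]))
    # Average the models' predicted probabilities, then threshold at 0.5.
    return np.mean(probabilities, axis=0) >= 0.5
```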
This section discusses the enhancement of the algorithm with a deep neural network, addressing potential underfitting in earlier models. Key points include:
- Deep Neural Network Implementation: To better capture complex relationships in the data, hidden layers are added with Leaky ReLU activation functions (a minimal sketch combining these changes appears after this list).
- Adaptive Moment Estimation (Adam) Optimizer: Provides an adaptive learning rate and momentum, making training more robust to noisy gradients.
- Utilizing CUDA for Training Optimization: CUDA is used to leverage the GPU's parallel computing capabilities and manage the increased computational demands of this more complex model.
- Small Sample Size: The model's complexity increases the risk of overfitting due to a limited data sample.
- Overfitting Mitigation: Strategies include sampling more data, employing bagging techniques, and using an L2 regularizer.
- Hyper-Parameter Sensitivity: Numerous hyperparameters (alpha, k, learning rate, number of epochs, lambda) require careful tuning to avoid sub-optimal solutions.
- Vulnerability to Adversarial Attacks: The deep neural network's sensitivity to data changes makes it prone to such attacks. Regularizing gradients and introducing randomness in training can help mitigate this.
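A minimal sketch combining the points above: a deeper encoder/decoder with Leaky ReLU activations, Adam with weight decay acting as the L2 regularizer, and moving the model to a CUDA device when one is available. The layer widths, learning rate, and other values are illustrative assumptions, not the project's tuned settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

class DeepAutoEncoder(nn.Module):
    def __init__(self, num_questions, hidden_sizes=(256, 64)):
        super().__init__()
        h1, h2 = hidden_sizes
        # Extra hidden layers with Leaky ReLU to capture more complex structure.
        self.encoder = nn.Sequential(
            nn.Linear(num_questions, h1), nn.LeakyReLU(0.01),
            nn.Linear(h1, h2), nn.LeakyReLU(0.01),
        )
        self.decoder = nn.Sequential(
            nn.Linear(h2, h1), nn.LeakyReLU(0.01),
            nn.Linear(h1, num_questions), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train on the GPU when available, since the deeper model is more expensive.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DeepAutoEncoder(num_questions=100).to(device)  # 100 is a placeholder

# Adam provides the adaptive learning rate and momentum described above;
# weight_decay acts as the L2 regularizer used to limit overfitting.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```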