SC-DataStorm-2

This is the Stat_Collective repository for our work in Datastorm 2.0.

Approach

Load the dataset directly from CSV into a pandas DataFrame.
The training, validation, and test sets are loaded separately.
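
A minimal sketch of the loading step, assuming each split lives in its own CSV file. The file names (`train.csv`, `valid.csv`, `test.csv`), the helper `load_splits`, and the demo columns are hypothetical and not taken from this repo; the demo writes tiny synthetic files so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_splits(data_dir):
    """Read each split from its own CSV file (file names are assumptions)."""
    data_dir = Path(data_dir)
    names = {"train": "train.csv", "valid": "valid.csv", "test": "test.csv"}
    return {split: pd.read_csv(data_dir / fname) for split, fname in names.items()}


# Tiny demo with synthetic files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("train.csv", "valid.csv", "test.csv"):
        pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 0]}).to_csv(
            Path(tmp) / fname, index=False
        )
    splits = load_splits(tmp)

print({name: df.shape for name, df in splits.items()})
```

Keeping the three splits in separate DataFrames from the start makes it harder to leak validation or test rows into training by accident.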

1. Data Cleaning and EDA

Duplicate the dataset for undersampling (skip this for the first fit)
Check for missing data
Check for ordinal data masked as numerical data
Plots and charts of the data
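
The checks above could look roughly like the following. This is a sketch on a synthetic DataFrame: the column names, the `max_levels` threshold for flagging possible ordinal codes, and the "match the minority class size" undersampling rule are all assumptions. Charts are left out to keep the example short.

```python
import pandas as pd

# Synthetic stand-in for the training set; column names are hypothetical.
df = pd.DataFrame({
    "amount": [10.5, 20.0, None, 40.2, 55.1, 61.0, 12.3, 18.8],
    "rating": [1, 2, 3, 1, 2, 3, 1, 2],  # ordinal codes stored as integers
    "target": [0, 0, 0, 0, 0, 0, 1, 1],
})

# Missing values per column.
missing_counts = df.isna().sum()

# Integer columns with only a few distinct values may be ordinal codes.
max_levels = 5  # threshold is an assumption
ordinal_candidates = [
    col for col in df.columns
    if col != "target"
    and pd.api.types.is_integer_dtype(df[col])
    and df[col].nunique() <= max_levels
]

# Undersampled copy: keep as many rows per class as the minority class has.
# The original df is left untouched for the first fit.
minority_size = df["target"].value_counts().min()
undersampled = df.groupby("target", group_keys=False).sample(
    n=minority_size, random_state=0
)

print(missing_counts)
print(ordinal_candidates)
print(undersampled["target"].value_counts())
```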

2. Data Preprocessing

Dummy variables
Impute variables?
Feature Selection
Scaling
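
One possible way to chain these preprocessing steps is a scikit-learn pipeline. The sketch below is an assumption about tooling, not a record of what was used here: the columns are synthetic, median imputation and `SelectKBest` with `f_classif` are illustrative choices, and scaling is applied before selection only for convenience.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic features and labels; names are hypothetical.
X = pd.DataFrame({
    "amount": [10.5, 20.0, np.nan, 40.2, 55.1, 61.0],
    "age": [23, 35, 41, 29, 52, 47],
    "channel": ["web", "store", "web", "app", "store", "app"],
})
y = np.array([0, 0, 0, 1, 1, 1])

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: one dummy column per category.
columns = ColumnTransformer([
    ("num", numeric, ["amount", "age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["channel"]),
])

# Keep the k features with the highest ANOVA F-score.
preprocess = Pipeline([
    ("columns", columns),
    ("select", SelectKBest(score_func=f_classif, k=3)),
])

X_ready = preprocess.fit_transform(X, y)
print(X_ready.shape)
```

Fitting the pipeline on the training set only and then calling `preprocess.transform` on the validation and test sets keeps their statistics out of the imputer, scaler, and selector.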

3. Model Building

3.1 Logistic Regression

3.2 Random Forest Classification

3.3 K-Nearest Neighbours

3.4 Gradient Boosting

3.5 XGBoost

3.6 Support Vector Machine

3.7 Neural Network
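
The seven model families in section 3 can be compared with one loop. The sketch below assumes scikit-learn estimators with mostly default settings and synthetic data from `make_classification`; using `MLPClassifier` as the neural network is an assumption, and XGBoost is included only if the separate `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the real sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm": SVC(),
    "neural_network": MLPClassifier(
        hidden_layer_sizes=(32,), max_iter=500, random_state=0
    ),
}

# XGBoost is an optional third-party dependency.
try:
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    pass

# Fit every model on the training split and score it on the validation split.
valid_f1 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    valid_f1[name] = f1_score(y_valid, model.predict(X_valid))

for name, score in sorted(valid_f1.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```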

4. Model Evaluation (Hyperparameter Tuning)

F1 score
Confusion matrix
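
A sketch of tuning against the F1 score and then inspecting the confusion matrix on the validation set. `GridSearchCV`, the random forest, the parameter grid, and the synthetic imbalanced data are all illustrative assumptions; any of the models from section 3 could take the estimator's place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, mildly imbalanced data standing in for the real sets.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Small illustrative grid; real grids would be model-specific.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# Cross-validated search on the training split, optimizing F1.
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=3
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out validation split.
valid_pred = search.predict(X_valid)
valid_f1 = f1_score(y_valid, valid_pred)
valid_cm = confusion_matrix(y_valid, valid_pred)

print(search.best_params_)
print(f"validation F1: {valid_f1:.3f}")
print(valid_cm)
```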

5. Model Testing

F1 score
Confusion matrix
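
The two test metrics are closely related: the F1 score can be read straight off the confusion matrix as 2·TP / (2·TP + FP + FN). The small worked example below uses made-up labels, not results from this project, to show the connection.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Made-up test labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# F1 computed by hand from the matrix entries.
f1_manual = 2 * tp / (2 * tp + fp + fn)

print(cm)                        # [[3 1]
                                 #  [1 3]]
print(f1_manual)                 # 0.75
print(f1_score(y_true, y_pred))  # 0.75
```

True negatives do not appear in the formula, which is why F1 is often preferred over accuracy when the positive class is rare.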