This is the repository of Stat_Collective for our work in Datastorm 2.0.

Approach

  • Load the dataset directly from CSV into a pandas DataFrame.
  • Load the training, validation and test sets separately.
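The loading step could look like the following minimal sketch. The file names `train.csv`, `validation.csv` and `test.csv` and the helper `load_splits` are hypothetical, since the README does not give the actual paths. Each split is read into its own DataFrame so that nothing learned from validation or test data leaks into training.

```python
import pandas as pd

# Hypothetical file names: the README does not state the actual paths.
TRAIN_PATH = "train.csv"
VAL_PATH = "validation.csv"
TEST_PATH = "test.csv"


def load_splits(train_path=TRAIN_PATH, val_path=VAL_PATH, test_path=TEST_PATH):
    """Read each split from its own CSV file into a separate pandas DataFrame."""
    train = pd.read_csv(train_path)
    val = pd.read_csv(val_path)
    test = pd.read_csv(test_path)
    return train, val, test
```

Usage: `train_df, val_df, test_df = load_splits()`.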

1. Data Cleaning and EDA

  • Duplicate rows of the under-represented class to balance the data (skip this for the first fit)
  • Check for missing data
  • Check for ordinal data masked as numerical data
  • Plot charts to explore the data
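The checks above might be sketched with plain pandas as follows. The helper names are hypothetical, and reading "duplicate dataset for undersampled" as random oversampling of the minority class (duplicating its rows with replacement) is an interpretation, not something the README spells out. The ordinal check is only a heuristic: numeric columns with few distinct values are flagged for manual review. Oversampling, if used, should be applied to the training set only.

```python
import pandas as pd


def missing_summary(df):
    """Count and share of missing values per column, most-missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


def ordinal_candidates(df, max_unique=10):
    """Numeric columns with few distinct values: possibly ordinal codes, not quantities."""
    return [
        col
        for col in df.select_dtypes(include="number").columns
        if df[col].nunique(dropna=True) <= max_unique
    ]


def oversample_minority(df, target, random_state=42):
    """Duplicate rows of smaller classes (sampling with replacement) until all classes match the largest."""
    counts = df[target].value_counts()
    majority = counts.max()
    parts = []
    for label, n in counts.items():
        group = df[df[target] == label]
        if n < majority:
            extra = group.sample(n=majority - n, replace=True, random_state=random_state)
            group = pd.concat([group, extra])
        parts.append(group)
    # Shuffle so duplicated rows are not clustered at the end.
    return pd.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)
```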

2. Data Preprocessing

  • Dummy variables
  • Impute missing values (if required)
  • Feature Selection
  • Scaling
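One way to chain these steps is a scikit-learn pipeline, sketched below as an assumption rather than the team's actual code. `OneHotEncoder` stands in for dummy variables (the same idea as `pd.get_dummies`, but it remembers the training categories), `SimpleImputer` handles imputation, `StandardScaler` scales numeric columns, and `SelectKBest` with ANOVA F-scores is one possible feature-selection method. ANOVA F-scores do not change under per-feature scaling, so selecting after scaling picks the same features. The function name, column lists and `k` are hypothetical; columns not listed are dropped.

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols, k="all"):
    """Impute, scale / one-hot encode, then keep the k best features by ANOVA F-score."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        # Dummy variables; categories unseen during fit become all-zero rows.
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    columns = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([
        ("columns", columns),
        ("select", SelectKBest(f_classif, k=k)),
    ])
```

Fit it on the training set only (`pre.fit_transform(X_train, y_train)`), then call `pre.transform` on the validation and test sets so that imputation values, scaling statistics and selected features all come from training data.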

3. Model Building

3.1 Logistic Regression

3.2 Random Forest Classification

3.3 K-Nearest Neighbours

3.4 Gradient Boosting

3.5 XGBoost

3.6 Support Vector Machine

3.7 Neural Network
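The seven model families could be collected in one place so they share the same training and evaluation code. This is a sketch assuming scikit-learn estimators with mostly default settings; `MLPClassifier` is one possible choice for the neural network, since the README does not say which library was used. XGBoost lives in the separate `xgboost` package and is added only if it is installed. The function name and hyperparameters are illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def candidate_models(random_state=42):
    """Return untrained models keyed by name; hyperparameters are placeholders for tuning."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=random_state),
        "knn": KNeighborsClassifier(),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
        "svm": SVC(random_state=random_state),
        "neural_network": MLPClassifier(max_iter=500, random_state=random_state),
    }
    try:
        from xgboost import XGBClassifier  # optional third-party dependency
    except ImportError:
        pass
    else:
        models["xgboost"] = XGBClassifier(random_state=random_state)
    return models
```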

4. Model Evaluation (Hyperparameter Tuning)

  • F1 score
  • Confusion matrix
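Because a separate validation set is available, tuning can be done by fitting each hyperparameter combination on the training set and keeping the one with the best validation F1 score. The sketch below assumes this hold-out approach (cross-validation with `GridSearchCV(scoring="f1")` would be an alternative); `tune_on_validation` is a hypothetical helper. `average="binary"` assumes a two-class target; pass `"macro"` or `"weighted"` for more classes.

```python
from sklearn.base import clone
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import ParameterGrid


def tune_on_validation(model, param_grid, X_train, y_train, X_val, y_val, average="binary"):
    """Try every parameter combination; keep the model with the highest validation F1."""
    best = {"f1": -1.0, "params": None, "model": None}
    for params in ParameterGrid(param_grid):
        # clone() gives a fresh, unfitted copy so runs do not share state.
        candidate = clone(model).set_params(**params)
        candidate.fit(X_train, y_train)
        score = f1_score(y_val, candidate.predict(X_val), average=average)
        if score > best["f1"]:
            best = {"f1": score, "params": params, "model": candidate}
    best["confusion_matrix"] = confusion_matrix(y_val, best["model"].predict(X_val))
    return best
```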

5. Model Testing

  • F1 score
  • Confusion matrix
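For the final check, the chosen model can be scored once on the held-out test set with the same two metrics. A minimal sketch, assuming the model is already fitted and tuned; `evaluate_on_test` is a hypothetical name, and `average="binary"` again assumes a two-class target.

```python
from sklearn.metrics import confusion_matrix, f1_score


def evaluate_on_test(model, X_test, y_test, average="binary"):
    """Score an already-tuned, already-fitted model once on the held-out test set."""
    predictions = model.predict(X_test)
    return {
        "f1": f1_score(y_test, predictions, average=average),
        "confusion_matrix": confusion_matrix(y_test, predictions),
    }
```

Keeping the test set out of tuning means this score is an honest estimate of performance on unseen data.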