Repository containing the codes for the Datastorm 2.0

Gajithra/SC-DataStorm-2

 
 

This is the Stat_Collective repository for our work in Datastorm 2.0.

Approach

  • Load the dataset directly from CSV into a pandas DataFrame
  • The training, validation, and test sets are loaded separately
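A minimal sketch of the loading step, assuming each split lives in its own CSV file inside one directory. The helper name `load_splits` and the file names `train.csv`, `validation.csv`, and `test.csv` are hypothetical placeholders, not the names used in the notebooks.

```python
from pathlib import Path

import pandas as pd


def load_splits(data_dir, names=("train.csv", "validation.csv", "test.csv")):
    """Read each split from its own CSV file into a separate DataFrame.

    The default file names are placeholders; pass the real ones via `names`.
    """
    data_dir = Path(data_dir)
    return tuple(pd.read_csv(data_dir / name) for name in names)
```

Typical use would be `train, valid, test = load_splits("data")`, which keeps the three sets apart from the start so that nothing from the validation or test data leaks into fitting.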

1. Data Cleaning and EDA

  • Duplicate the dataset for undersampling (skip this for the first fit)
  • Check for missing data
  • Check for ordinal data masked as numerical data
  • Plots and charts of the data
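The checks above could be sketched with plain pandas as follows. All three helper names are hypothetical, the `max_levels` cut-off of 10 is an arbitrary assumption, and undersampling to the size of the rarest class is one common choice rather than necessarily the one used here.

```python
import pandas as pd


def missing_summary(df):
    """Count and share of missing values per column, worst first."""
    counts = df.isna().sum()
    out = pd.DataFrame({"n_missing": counts, "share": counts / len(df)})
    return out.sort_values("n_missing", ascending=False)


def low_cardinality_numeric(df, max_levels=10):
    """Numeric columns with few distinct values.

    These are candidates for ordinal or categorical codes stored as numbers
    and should be inspected by hand.
    """
    numeric = df.select_dtypes(include="number")
    return [c for c in numeric.columns if numeric[c].nunique() <= max_levels]


def undersample(df, target, random_state=0):
    """Return a new frame with every class sampled down to the rarest class size.

    The input frame is left untouched, so the original data stays available.
    """
    n_min = df[target].value_counts().min()
    return df.groupby(target).sample(n=n_min, random_state=random_state)
```

Because `undersample` returns a new DataFrame, the full dataset remains available for the first fit and the balanced copy can be swapped in later.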

2. Data Preprocessing

  • Dummy variables
  • Impute missing values?
  • Feature Selection
  • Scaling
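One way to chain these four steps is a scikit-learn pipeline, sketched below. This assumes scikit-learn is available; the function name `build_preprocessor`, the median and most-frequent imputation strategies, and the `f_classif` selection score are illustrative assumptions, not choices confirmed by the notebooks.

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols, categorical_cols, k="all"):
    """Impute, one-hot encode, scale, then keep the k best-scoring features."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("dummies", OneHotEncoder(handle_unknown="ignore")),
    ])
    columns = ColumnTransformer(
        [("num", numeric, numeric_cols), ("cat", categorical, categorical_cols)],
        sparse_threshold=0,  # always return a dense array
    )
    return Pipeline([
        ("columns", columns),
        ("select", SelectKBest(f_classif, k=k)),
    ])
```

Fitting the pipeline on the training set only and then calling `transform` on the validation and test sets keeps the imputation values, scaling statistics, and selected features from being influenced by held-out data.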

3. Model Building

3.1 Logistic Regression

3.2 Random Forest Classification

3.3 K-Nearest Neighbours

3.4 Gradient Boosting

3.5 XGBoost

3.6 Support Vector Machine

3.7 Neural Network
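The candidate models listed above can be collected in one dictionary so that they are fitted and compared in a single loop. This is a sketch that assumes scikit-learn estimators with mostly default settings; `MLPClassifier` stands in for the neural network, and XGBoost is added only if the separate `xgboost` package is installed. The function name `candidate_models` is hypothetical.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def candidate_models(random_state=0):
    """Return untrained candidate classifiers keyed by a short name."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(random_state=random_state),
        "knn": KNeighborsClassifier(),
        "gradient_boosting": GradientBoostingClassifier(random_state=random_state),
        "svm": SVC(),
        # Assumption: a small scikit-learn MLP stands in for the neural network.
        "neural_network": MLPClassifier(max_iter=500, random_state=random_state),
    }
    try:
        # Optional dependency; skipped when xgboost is not installed.
        from xgboost import XGBClassifier
        models["xgboost"] = XGBClassifier(random_state=random_state)
    except ImportError:
        pass
    return models
```

A loop such as `for name, model in candidate_models().items(): model.fit(X_train, y_train)` then trains every candidate on the same preprocessed training data.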

4. Model Evaluation (Hyperparameter Tuning)

  • F1 score
  • Confusion matrix
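Since the validation set is loaded separately, hyperparameters can be tuned by fitting on the training set and scoring on the validation set. The sketch below assumes a binary target encoded as 0/1 (the default for `f1_score`) and a scikit-learn style estimator; the function name `tune_on_validation` and the grid format are illustrative.

```python
from sklearn.base import clone
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import ParameterGrid


def tune_on_validation(model, grid, X_train, y_train, X_valid, y_valid):
    """Fit one model per parameter combination and keep the best validation F1.

    Returns the fitted model, its parameters, its F1 score, and its
    confusion matrix on the validation set.
    """
    best = None
    for params in ParameterGrid(grid):
        fitted = clone(model).set_params(**params).fit(X_train, y_train)
        score = f1_score(y_valid, fitted.predict(X_valid))
        if best is None or score > best[0]:
            best = (score, params, fitted)
    score, params, fitted = best
    matrix = confusion_matrix(y_valid, fitted.predict(X_valid))
    return fitted, params, score, matrix
```

For example, `tune_on_validation(LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, X_train, y_train, X_valid, y_valid)` would compare three regularisation strengths.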

5. Model Testing

  • F1 score
  • Confusion matrix
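The two metrics are closely linked: for a binary target, F1 can be read straight off the confusion matrix as 2·TP / (2·TP + FP + FN). The sketch below shows this on the final test predictions, assuming 0/1 labels and scikit-learn's layout of rows as true classes and columns as predicted classes. The helper name `f1_from_confusion` is hypothetical.

```python
from sklearn.metrics import confusion_matrix


def f1_from_confusion(y_true, y_pred):
    """Compute binary F1 from the confusion matrix entries.

    scikit-learn lays the 2x2 matrix out as [[TN, FP], [FN, TP]].
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Reporting the matrix alongside the score shows whether a low F1 on the test set comes from false positives or from false negatives.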

Languages

  • Jupyter Notebook 100.0%