M007-lab/Projet_ML_Pipeline

Projet3A_ML

Authors:

Maachou Marouane
Bensaid Reda
Jallouli Mouad
Taoufik Moad

To run main.py on both datasets:

Create a data folder and place the two datasets in it.

For the kidney disease dataset:

python3 main.py "data/kidney_disease.csv" "classification" "True" "0.2" "mean" "label" "Mean" "0.95" "f1_score"

For the banknote dataset:

python3 main.py "data/data_banknote_authentication.txt" "4" "False" "0.2" "mean" "label" "Mean" "0.95" "f1_score"

To modify the parameters of the pipeline:

The arguments are positional. Their meaning, in order:

  • path to the dataset
  • name of the column that holds the annotation (the target label): "classification" for the kidney disease dataset, "4" for the banknote dataset
  • whether the dataset has a header row with column names: True for the kidney disease dataset, False for the banknote dataset
  • test size
  • the missing-data strategy used in preprocessing; one of ["", "mean", "median", "radical"]
  • the encoding strategy for categorical columns; one of ["", "label", "onehot"]
  • the normalizing strategy used in preprocessing; one of ["", "Mean", "MinMax"]
  • the PCA setting: either the fraction of variance to retain (a value between 0 and 1, such as 0.95) or the number of dimensions to retain (an integer)
  • the metric used for evaluating the model (for example f1_score)
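
Because all nine arguments are passed as strings, it can help to see how they might be converted and checked before the pipeline runs. The sketch below is illustrative only and is not taken from main.py: the names PipelineArgs, parse_pca and parse_args are hypothetical, and the validation rules are assumptions based on the list above (for instance, that an integer PCA value means a number of dimensions and a non-integer value means a fraction of variance).

```python
from dataclasses import dataclass
from typing import List, Union

# Allowed values, copied from the parameter list above.
MISSING_STRATEGIES = {"", "mean", "median", "radical"}
ENCODING_STRATEGIES = {"", "label", "onehot"}
NORMALIZING_STRATEGIES = {"", "Mean", "MinMax"}


@dataclass
class PipelineArgs:
    """Hypothetical container for the nine positional arguments."""
    dataset_path: str
    label_column: str
    has_header: bool
    test_size: float
    missing_strategy: str
    encoding_strategy: str
    normalizing_strategy: str
    pca_components: Union[int, float]
    metric: str


def parse_pca(value: str) -> Union[int, float]:
    """Assumption: an integer is a dimension count, anything else a variance fraction."""
    try:
        n_dims = int(value)
    except ValueError:
        fraction = float(value)
        if not 0.0 < fraction < 1.0:
            raise ValueError(f"variance fraction must be between 0 and 1, got {value!r}")
        return fraction
    if n_dims < 1:
        raise ValueError(f"number of dimensions must be at least 1, got {value!r}")
    return n_dims


def parse_args(argv: List[str]) -> PipelineArgs:
    """Convert the nine string arguments (everything after main.py) into typed values."""
    if len(argv) != 9:
        raise ValueError(f"expected 9 arguments, got {len(argv)}")
    path, label, header, test_size, missing, encoding, normalizing, pca, metric = argv
    if header not in ("True", "False"):
        raise ValueError(f"header flag must be 'True' or 'False', got {header!r}")
    if missing not in MISSING_STRATEGIES:
        raise ValueError(f"unknown missing-data strategy: {missing!r}")
    if encoding not in ENCODING_STRATEGIES:
        raise ValueError(f"unknown encoding strategy: {encoding!r}")
    if normalizing not in NORMALIZING_STRATEGIES:
        raise ValueError(f"unknown normalizing strategy: {normalizing!r}")
    return PipelineArgs(
        dataset_path=path,
        label_column=label,
        has_header=(header == "True"),
        test_size=float(test_size),
        missing_strategy=missing,
        encoding_strategy=encoding,
        normalizing_strategy=normalizing,
        pca_components=parse_pca(pca),
        metric=metric,
    )
```

In a script, this would typically be called as parse_args(sys.argv[1:]). Failing early on an unknown strategy name or a wrong argument count gives a clearer error than a failure deep inside preprocessing.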
