Made by:
Maachou Marouane
Bensaid Reda
Jallouli Mouad
Taoufik Moad
To run main.py on both datasets:
Create a folder named data in the directory you run main.py from, and put the two datasets in it.
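Before running the commands, it can help to confirm that both files are where main.py will look for them. The snippet below is a minimal sketch of such a check. The file names are taken from the example commands in this README. The helper name missing_datasets is hypothetical and is not part of main.py.

```python
from pathlib import Path

# File names taken from the example commands in this README.
EXPECTED_FILES = ["kidney_disease.csv", "data_banknote_authentication.txt"]


def missing_datasets(data_dir="data"):
    """Return the expected dataset files that are absent from data_dir.

    Hypothetical helper for illustration; main.py does not provide it.
    """
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# An empty list means both datasets are in place.
missing = missing_datasets("data")
if missing:
    print("Missing from data/:", ", ".join(missing))
else:
    print("Both datasets are in place.")
```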
For the kidney disease dataset:
python3 main.py "data/kidney_disease.csv" "classification" "True" "0.2" "mean" "label" "Mean" "0.95" "f1_score"
For the banknote dataset:
python3 main.py "data/data_banknote_authentication.txt" "4" "False" "0.2" "mean" "label" "Mean" "0.95" "f1_score"
To modify the parameters of the pipeline, change the arguments passed to main.py.
The meaning of each argument, in order:
- path to dataset
- name of the column containing the annotation (labels); for the banknote dataset, which has no header, the example passes "4"
- whether the dataset has a header row with column names: True for the kidney disease dataset, False for the banknote dataset
- test size, e.g. "0.2"
- the missing data strategy used in preprocessing, one of ["", "mean", "median", "radical"]
- the encoding strategy for categorical columns, one of ["", "label", "onehot"]
- the normalizing strategy used in preprocessing, one of ["", "Mean", "MinMax"]
- the PCA setting: either the fraction of variance retained after PCA (a value between 0 and 1) or the number of dimensions retained after PCA (an integer)
- the metric used for evaluating the model
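The argument order above can be sketched as a small validation routine. This is only an illustration of how the nine positional values might be checked, not the parsing code used by main.py. The function names parse_pca_value and parse_args and the dictionary keys are hypothetical. Treating an empty string as "skip this step" is left uninterpreted here, and the label column is kept as a string because this README does not say how main.py converts it.

```python
# Allowed values copied from the argument list in this README.
MISSING_STRATEGIES = ["", "mean", "median", "radical"]
ENCODING_STRATEGIES = ["", "label", "onehot"]
NORMALIZING_STRATEGIES = ["", "Mean", "MinMax"]


def parse_pca_value(text):
    """Return an int (number of dimensions) or a float in (0, 1) (variance fraction)."""
    try:
        dims = int(text)
    except ValueError:
        fraction = float(text)
        if not 0.0 < fraction < 1.0:
            raise ValueError(f"PCA fraction must be between 0 and 1, got {text!r}")
        return fraction
    if dims < 1:
        raise ValueError(f"PCA dimension count must be at least 1, got {text!r}")
    return dims


def parse_args(argv):
    """Validate the nine positional arguments in the order documented above."""
    if len(argv) != 9:
        raise ValueError(f"expected 9 arguments, got {len(argv)}")
    path, label_col, header, test_size, missing, encoding, normalizing, pca, metric = argv
    if header not in ("True", "False"):
        raise ValueError(f"header flag must be 'True' or 'False', got {header!r}")
    checks = [
        (missing, MISSING_STRATEGIES, "missing data strategy"),
        (encoding, ENCODING_STRATEGIES, "encoding strategy"),
        (normalizing, NORMALIZING_STRATEGIES, "normalizing strategy"),
    ]
    for value, allowed, what in checks:
        if value not in allowed:
            raise ValueError(f"{what} must be one of {allowed}, got {value!r}")
    return {
        "path": path,
        "label_column": label_col,  # kept as a string; conversion is up to the pipeline
        "has_header": header == "True",
        "test_size": float(test_size),
        "missing_strategy": missing,
        "encoding_strategy": encoding,
        "normalizing_strategy": normalizing,
        "pca": parse_pca_value(pca),
        "metric": metric,
    }


# Example: the kidney disease command from this README.
print(parse_args([
    "data/kidney_disease.csv", "classification", "True", "0.2",
    "mean", "label", "Mean", "0.95", "f1_score",
]))
```

Distinguishing "0.95" from "3" by first trying an integer conversion keeps the two documented meanings of the PCA argument apart without needing a separate flag.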