Projet3A_ML

Authors:

Maachou Marouane
Bensaid Reda
Jallouli Mouad
Taoufik Moad

To run main.py on both datasets:

Create a data folder and put the two datasets in it.

For the kidney disease dataset:

python3 main.py "data/kidney_disease.csv" "classification" "True" "0.2" "mean" "label" "Mean" "0.95" "f1_score"

For the banknote dataset:

python3 main.py "data/data_banknote_authentication.txt" "4" "False" "0.2" "mean" "label" "Mean" "0.95" "f1_score"

To modify the parameters of the pipeline:

The arguments, in order, are:

Path to the dataset.
Name of the column containing the labels ("classification" for the kidney disease dataset, "4" for the banknote dataset, which has no header).
Whether the dataset has a header row with column names: "True" for the kidney disease dataset, "False" for the banknote dataset.
Test size (for example "0.2").
Missing-data strategy used in preprocessing: one of ["", "mean", "median", "radical"].
Encoding strategy for categorical columns: one of ["", "label", "onehot"].
Normalization strategy used in preprocessing: one of ["", "Mean", "MinMax"].
PCA setting: either the fraction of variance to retain (a value between 0 and 1) or the number of dimensions to retain (an integer).
Metric used to evaluate the model (for example "f1_score").
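The nine positional values can be easier to keep straight when they are parsed and validated in one place. The sketch below is illustrative only: it does not reproduce what main.py actually does, and the names PipelineArgs, parse_pipeline_args and parse_pca_components are hypothetical. It assumes the test size is a float and that the header flag is the literal string "True" or "False".

```python
from dataclasses import dataclass
from typing import Sequence, Union

# Allowed values, copied from the list of strategies in this README.
MISSING_STRATEGIES = ("", "mean", "median", "radical")
ENCODING_STRATEGIES = ("", "label", "onehot")
NORMALIZING_STRATEGIES = ("", "Mean", "MinMax")


@dataclass
class PipelineArgs:
    """Hypothetical container for the nine command-line values."""
    dataset_path: str
    label_column: str
    has_header: bool
    test_size: float
    missing_strategy: str
    encoding_strategy: str
    normalizing_strategy: str
    pca_components: Union[int, float]
    metric: str


def parse_pca_components(raw: str) -> Union[int, float]:
    """Return an int (dimensions to keep) or a float in (0, 1) (variance to keep)."""
    try:
        count = int(raw)
    except ValueError:
        ratio = float(raw)
        if not 0.0 < ratio < 1.0:
            raise ValueError(f"PCA ratio must be between 0 and 1, got {raw!r}")
        return ratio
    if count < 1:
        raise ValueError(f"PCA dimension count must be at least 1, got {raw!r}")
    return count


def _check_choice(name: str, value: str, allowed: Sequence[str]) -> str:
    """Reject any value that is not in the documented list of choices."""
    if value not in allowed:
        raise ValueError(f"{name} must be one of {list(allowed)}, got {value!r}")
    return value


def parse_pipeline_args(argv: Sequence[str]) -> PipelineArgs:
    """Parse the nine values in the order documented above."""
    if len(argv) != 9:
        raise ValueError(f"expected 9 arguments, got {len(argv)}")
    path, label, header, test_size, missing, encoding, norm, pca, metric = argv
    if header not in ("True", "False"):  # assumption: literal strings only
        raise ValueError(f"header flag must be 'True' or 'False', got {header!r}")
    return PipelineArgs(
        dataset_path=path,
        label_column=label,
        has_header=(header == "True"),
        test_size=float(test_size),  # assumption: test size is a float
        missing_strategy=_check_choice("missing strategy", missing, MISSING_STRATEGIES),
        encoding_strategy=_check_choice("encoding strategy", encoding, ENCODING_STRATEGIES),
        normalizing_strategy=_check_choice("normalizing strategy", norm, NORMALIZING_STRATEGIES),
        pca_components=parse_pca_components(pca),
        metric=metric,
    )


# Example: the values from the kidney disease command.
kidney_args = parse_pipeline_args([
    "data/kidney_disease.csv", "classification", "True", "0.2",
    "mean", "label", "Mean", "0.95", "f1_score",
])
print(kidney_args)
```

In a script, the same function could be called with sys.argv[1:] so that a misspelled strategy fails with a clear message before any data is loaded.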
