Dev #1

Open: wants to merge 19 commits into dev
42 changes: 42 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,42 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: Python application

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

permissions:
  contents: read

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4
    - name: Set up Python 3.10
      uses: actions/setup-python@v3
      with:
        python-version: "3.10"
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest
    - name: Run main
      run: |
        python main.py
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
__pycache__/
data/derived/test.csv
data/derived/train.csv
titanic-env/
10 changes: 10 additions & 0 deletions README.md
@@ -0,0 +1,10 @@
# Titanic survival probability

To use this project, it
is recommended to create a `config.yaml` file
with the following structure:

```yaml
jeton_api: ####
data_path: ####
```
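Once the file exists, it can be read with PyYAML (listed in `requirements.txt`). The snippet below is a minimal sketch, not the project's own `import_yaml_config`: the helper name `load_config` is hypothetical, and it assumes the file holds a single YAML mapping. The fallback value mirrors the one used in `docs/main.py`.

```python
from pathlib import Path

import yaml


def load_config(path: str = "configuration/config.yaml") -> dict:
    """Hypothetical helper: read a YAML mapping and return it as a dict.

    Returns an empty dict when the file is missing or empty.
    """
    config_file = Path(path)
    if not config_file.exists():
        return {}
    with config_file.open(encoding="utf-8") as f:
        # safe_load only builds plain Python objects (no arbitrary code execution).
        content = yaml.safe_load(f)
    return content or {}


config = load_config()
# Fall back to a default when a key is absent, as docs/main.py does with config.get().
data_path = config.get("data_path", "data.csv")
```

Returning an empty dict for a missing file lets every `config.get(key, default)` call keep working without a `config.yaml`.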
4 changes: 4 additions & 0 deletions configuration/config.yaml
@@ -0,0 +1,4 @@
jeton_api: "$trotskitueleski1917"
data_path: "https://minio.lab.sspcloud.fr/meilametayebjee/ensae-reproductibilite/data/raw/data.csv"
train_path: "data/derived/train.csv"
test_path: "data/derived/test.csv"
892 changes: 0 additions & 892 deletions data.csv

This file was deleted.

57 changes: 57 additions & 0 deletions docs/main.py
@@ -0,0 +1,57 @@
"""
Prediction of an individual's survival on the Titanic
"""

import argparse

from titanicml.data.import_data import import_yaml_config, process_data
from titanicml.pipeline.build_pipeline import split, build_pipeline
from titanicml.models.train_evaluate import evaluate

parser = argparse.ArgumentParser(description="Random forest parameters")
parser.add_argument("--n_trees", type=int, default=20, help="Number of trees")
args = parser.parse_args()

N_TREES = args.n_trees
print("Number of trees:", N_TREES)

config = import_yaml_config("configuration/config.yaml")

DATA_PATH = config.get("data_path", "data.csv")
TRAIN_PATH = config.get("train_path", "train.csv")
TEST_PATH = config.get("test_path", "test.csv")
TEST_FRACTION = config.get("test_fraction", 0.1)
MAX_DEPTH = None
MAX_FEATURES = "sqrt"


# DATA IMPORT AND EXPLORATION --------------------------------

TrainingData = process_data(DATA_PATH)

# TRAIN/TEST SPLIT --------------------------------

X_train, X_test, y_train, y_test = split(
    TrainingData,
    test_fraction=TEST_FRACTION,
    train_path=TRAIN_PATH,
    test_path=TEST_PATH,
)

# PIPELINE ----------------------------

# Feature definitions
numeric_features = ["Age", "Fare"]
categorical_features = ["Embarked", "Sex"]

pipe = build_pipeline(
    numeric_features,
    categorical_features,
    n_trees=N_TREES,
    max_depth=MAX_DEPTH,
    max_features=MAX_FEATURES,
)

# ESTIMATION AND EVALUATION ----------------------

pipe.fit(X_train, y_train)
evaluate(pipe, X_test, y_test)
File renamed without changes.
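`docs/main.py` imports `build_pipeline` from `titanicml.pipeline.build_pipeline`, whose source is not part of this diff. As an illustration only, here is one way a helper with the same arguments could be written in scikit-learn. The name `build_pipeline_sketch` and the preprocessing choices (median and most-frequent imputation, min-max scaling, one-hot encoding) are assumptions, not the project's implementation.

```python
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


def build_pipeline_sketch(
    numeric_features,
    categorical_features,
    n_trees=20,
    max_depth=None,
    max_features="sqrt",
):
    """Hypothetical stand-in for titanicml's build_pipeline (not the project's code)."""
    # Numeric columns: fill missing values with the median, then rescale to [0, 1].
    numeric_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", MinMaxScaler()),
        ]
    )
    # Categorical columns: fill with the most frequent value, then one-hot encode.
    categorical_transformer = Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", numeric_transformer, numeric_features),
            ("cat", categorical_transformer, categorical_features),
        ]
    )
    # n_trees maps onto RandomForestClassifier's n_estimators.
    classifier = RandomForestClassifier(
        n_estimators=n_trees, max_depth=max_depth, max_features=max_features
    )
    return Pipeline(steps=[("preprocessor", preprocessor), ("classifier", classifier)])
```

Bundling preprocessing and the classifier in one `Pipeline` means `pipe.fit(X_train, y_train)` learns the imputation and scaling statistics on the training split only, so nothing leaks from the test split.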
9 changes: 9 additions & 0 deletions install.sh
@@ -0,0 +1,9 @@
#!/bin/bash
# Install Python
apt-get -y update
apt-get install -y python3-pip python3-venv
# Create empty virtual environment
python3 -m venv titanic
source titanic/bin/activate
# Install project dependencies
pip install -r requirements.txt
17 changes: 17 additions & 0 deletions pyproject.toml
@@ -0,0 +1,17 @@
[tool.poetry]
name = "titanicml"
version = "0.0.1"
description = "Awesome Machine Learning project"
authors = ["Daffy Duck <[email protected]>", "Mickey Mouse"]
license = "MIT"
readme = "README.md"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.pytest.ini_options]
log_cli = true
log_cli_level = "WARNING"
log_cli_format = "%(asctime)s [%(levelname)8s] %(message)s (%(filename)s:%(lineno)s)"
log_cli_date_format = "%Y-%m-%d %H:%M:%S"
25 changes: 25 additions & 0 deletions requirements.txt
@@ -0,0 +1,25 @@
contourpy==1.3.0
cycler==0.12.1
fonttools==4.53.1
joblib==1.4.2
kiwisolver==1.4.7
matplotlib==3.9.2
numpy==2.1.1
packaging==24.1
pandas==2.2.2
pillow==10.4.0
protobuf==4.25.3
pyarrow==17.0.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.2
scikit-learn==1.5.1
scipy==1.14.1
seaborn==0.13.2
setuptools==69.5.1
six==1.16.0
threadpoolctl==3.5.0
tzdata==2024.1
wheel==0.44.0
zstandard==0.23.0
181 changes: 0 additions & 181 deletions test.csv

This file was deleted.
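With `test.csv` removed and `.gitignore` now excluding `data/derived/train.csv` and `data/derived/test.csv`, the split files appear to be regenerated at run time from the `train_path` and `test_path` entries in `configuration/config.yaml`. The `split` helper itself is not in this diff. The sketch below is hypothetical: the name `split_sketch`, the target column `Survived`, and the choice to write features and target together are assumptions.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_sketch(
    df,
    test_fraction=0.1,
    train_path="train.csv",
    test_path="test.csv",
    target="Survived",  # assumed name of the label column
    random_state=None,
):
    """Hypothetical stand-in for titanicml's split (not the project's code)."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_fraction, random_state=random_state
    )
    # Write each split (features + target) to disk, creating parent folders if needed.
    for X_part, y_part, path in (
        (X_train, y_train, train_path),
        (X_test, y_test, test_path),
    ):
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        pd.concat([X_part, y_part], axis=1).to_csv(path, index=False)
    return X_train, X_test, y_train, y_test
```

Because the CSVs are derived from the raw data on every run, keeping them out of version control avoids committing stale copies.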
