bilalltf/kerdro-mlflow

MLOps project using Kedro and MLflow


Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. A Kedro project is structured as nodes and pipelines: a node is a function that performs an operation on the data, and a pipeline is a set of nodes executed in sequence. The most common pipelines are data engineering and data science pipelines.
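
To make the node and pipeline idea concrete, here is a minimal plain-Python sketch. It deliberately does not use Kedro's API: the `Node` tuple, `run_pipeline`, the two node functions and the sample records are all hypothetical names invented for illustration, and the `trip_distance` field is only assumed to resemble a column of the taxi dataset.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Node(NamedTuple):
    """One pipeline step: a function plus the names of its input and output."""
    func: Callable[[Any], Any]
    input_name: str
    output_name: str


def run_pipeline(nodes: List[Node], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run the nodes in order, storing each result under its output name."""
    results = dict(data)
    for n in nodes:
        results[n.output_name] = n.func(results[n.input_name])
    return results


# Hypothetical node functions (not taken from this repository).
def drop_zero_distance(trips):
    """Data engineering step: keep only trips with a positive distance."""
    return [t for t in trips if t["trip_distance"] > 0]


def mean_distance(trips):
    """Data science step: average distance over the cleaned trips."""
    return sum(t["trip_distance"] for t in trips) / len(trips)


data_engineering = [Node(drop_zero_distance, "raw_trips", "clean_trips")]
data_science = [Node(mean_distance, "clean_trips", "mean_distance")]

# Made-up sample records standing in for the real dataset.
raw = {"raw_trips": [{"trip_distance": 2.0}, {"trip_distance": 0.0}, {"trip_distance": 4.0}]}
outputs = run_pipeline(data_engineering + data_science, raw)
print(outputs["mean_distance"])  # 3.0
```

In a real Kedro project the same roles are played by Kedro's own node and pipeline objects, and the named inputs and outputs refer to datasets registered in the project's data catalog.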


MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a central place to track experiments, compare results, and share models.


The dataset used is YELLOW TRIPDATA 2018-12 from the nyc.gov website.

After downloading the dataset, save it in the data folder:

├── data
│   ├── 01_raw
│   │   └── yellow_tripdata_2018-12.csv


Set up the Kedro environment:

conda create --name kedro_env python=3.7
conda activate kedro_env
pip install kedro

Install kedro-viz:

pip install kedro-viz

Install kedro-mlflow:

pip install kedro-mlflow

Install requirements:

pip install -r src/requirements.txt

Everything is now installed and ready to go. Start by creating a new Kedro project and running it:

Create a new project:

kedro new

Run the project:

kedro run

Run the project with a specific pipeline:

kedro run --pipeline data_engineering

Run the project with a specific node:

kedro run --node create_report

To visualize the project pipelines, run kedro-viz:

kedro viz

To track the project metrics with MLflow, initialize and run kedro-mlflow:

kedro mlflow init
kedro mlflow run

Run kedro-mlflow with a specific pipeline:

kedro mlflow run --pipeline data_engineering

Data engineering pipeline:

[Image: data engineering pipeline]


Data science pipeline:

[Image: data science pipeline]


The tracking UI:

[Image: MLflow tracking UI]