MLPath is an MLOps library for Python that makes tracking machine learning experiments and organizing machine learning projects easier. It currently consists of two subpackages: MLQuest for tracking and MLDir for directory structure.
Check this for documentation and this for a full version of the quick start below.
```shell
pip install mlpath
```
MLPath isn't "just another" machine learning tracking library:
- Unlike other libraries, MLPath requires minimal boilerplate for tracking and infers hyperparameter names automatically.
- It does not restrict developers to a web interface; logs can be shown in the notebook itself.
- Less abstraction: logs can be treated as Pandas tables for additional operations or visualizations.
- It comes with MLDir, which automatically generates and sets standards for directory structure so as to maximize organization and reproducibility. MLDir also makes it easier to wrap models that map files to outputs in a web interface.
This is your code without mlquest:

```python
# Preprocessing
x_data_p = Preprocessing(x_data=[1, 2, 3], alpha=1024, beta_param=7, c=12)

# Feature Extraction
x_data_f = FeatureExtraction(x_data_p, 14, 510, 4)

# Model Initialization
model = RadialBasisNet(x_data_f, 12, 2, 3)

# Model Training
accuracy = train_model(model)
```
This is your code with mlquest:

```python
# 1. Import the package
from mlpath import mlquest as mlq
l = mlq.l

# 2. Start a new quest; this simply creates a table, or loads an existing one, to log your next run
mlq.start_quest('Radial Basis Pipeline', log_defs=False)

# 3. Wrap function calls to be logged with `l()`

# Preprocessing
x_data_p = l(Preprocessing)(x_data=[1, 2, 3], alpha=1114, beta_param=2, c=925)

# Feature Extraction
x_data_f = l(FeatureExtraction)(x_data_p, 32, 50, 4)   # x_data_p is an array, so it won't be logged

# Model Initialization
model = l(RadialBasisNet)(x_data_f, 99, 19, 31)

# Model Training
accuracy = train_model(model)

# 4. Log any metrics if needed
mlq.log_metrics(accuracy)   # can also do mlq.log_metric(acc=accuracy) so it's logged as acc

# 5. End the quest to push the experiment to the table and save it as markdown at './'
mlq.end_quest('./')

# 6. View the table (only for notebooks)
mlq.show_logs(last_k=10)   # show the table for the last 10 runs
```
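To see why wrapping a call with `l()` is enough to capture hyperparameter names, here is a minimal, hypothetical sketch of how such a wrapper could infer them in plain Python. This is not MLPath's actual implementation; `log_call` and `RUN_LOG` are made-up names for illustration only.

```python
# Hypothetical sketch of an l()-style wrapper; NOT MLPath's real implementation.
import functools
import inspect

RUN_LOG = {}  # made-up store: {function name: {param name: value}}

def log_call(func):
    """Record the named arguments a function was called with."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind positional args to their parameter names via the signature,
        # which is how hyperparameter names can be inferred automatically.
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        # Keep only scalar parameters, mirroring how arrays are skipped.
        RUN_LOG[func.__name__] = {
            name: value
            for name, value in bound.arguments.items()
            if isinstance(value, (int, float, str, bool))
        }
        return func(*args, **kwargs)
    return wrapper

@log_call
def preprocessing(x_data, alpha=1024, beta_param=7, c=12):
    return [x * alpha for x in x_data]

preprocessing([1, 2, 3], alpha=1114, beta_param=2, c=925)
print(RUN_LOG["preprocessing"])  # {'alpha': 1114, 'beta_param': 2, 'c': 925}
```

Note that the list argument `x_data` is dropped from the log, matching the behavior described in the comments above.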
After three runs, this produces the following table, shown below the cell in the notebook (or in the separate markdown file):
| info | | | | Preprocessing | | | FeatureExtraction | | | RadialBasisNet | | | metrics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| time | date | duration | id | alpha | beta_param | c | x_param | y_param | z_param | p_num | k_num | l_num | accuracy |
| 16:31:16 | 02/11/23 | 1.01 min | 1 | 74 | 12 | 95 | 13 | 530 | 4 | 99 | 99 | 3 | 50 |
| 16:32:40 | 02/11/23 | 4.91 ms | 2 | 14 | 2 | 95 | 132 | 530 | 4 | 99 | 19 | 3 | 70 |
| 16:32:57 | 02/11/23 | 4.93 ms | 3 | 1114 | 2 | 925 | 32 | 50 | 4 | 99 | 19 | 31 | 70 |
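Since the logs are plain tables, they can be manipulated with Pandas. As a hedged sketch, the snippet below rebuilds a few columns of the three runs above by hand (in real use the table comes from MLQuest; this DataFrame is constructed manually purely for illustration) and applies a filter analogous to `metrics.accuracy>50`:

```python
# Manually rebuilt excerpt of the run table above, for illustration only.
import pandas as pd

runs = pd.DataFrame({
    "id": [1, 2, 3],
    "alpha": [74, 14, 1114],
    "beta_param": [12, 2, 2],
    "accuracy": [50, 70, 70],
})

# Keep only runs whose accuracy exceeds 50, like a metrics.accuracy>50 search.
best = runs.query("accuracy > 50")
print(best["id"].tolist())  # [2, 3]
```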
Editors like VSCode support viewing markdown out of the box; you may need to press CTRL/CMD+Shift+V. You can see a fuller version of this quick start in the documentation, which corresponds to the Full-Example notebook found here, which you can also run locally.
Check Example.ipynb or, equivalently, the following Colab notebook. More examples with scikit-learn and an example with PyTorch can be found by running `mldir --example`, as illustrated below.
Simply run `mlq.run_server()` after `mlq.end_quest`.
⦿ You can search for specific runs; for example, `metrics.accuracy>50` (similar syntax to MLflow).
⦿ You can customize the columns shown in the table by clicking on columns (in lieu of doing it through a JSON config file).
MLDir is a simple CLI that creates a standard directory structure for your machine learning project. It provides a folder structure that is comprehensive, highly scalable (development-wise) and apt for collaboration.
⦿ Although it integrates well with MLQuest, neither MLQuest nor MLDir require the other to function.
⦿ If your project involves very few people (or only you) and does not require trying many models with many preprocessing methods and features, then you may not really need MLDir; a notebook and MLQuest should be enough. Otherwise, use MLDir to keep your directory from becoming a spaghetti soup of Python files.
The directory structure generated by MLDir complies with the MLDir manifesto (a set of 'soft' standards), which attempts to enforce separation of concerns among the different stages of the machine learning pipeline, and between writing code and running experiments (hyperparameter tuning). We recommend that you read more about the manifesto here.
MLDir is part of MLPath, so you don't need to install it separately. To create a simple folder structure, run:

```shell
mldir --name <project_name>
```
⦿ If mldir is run without a name, it uses the name 'Project'.
This generates the following folder structure (with dummy names for features and models):
```
.
├── DataPreparation
│   ├── Ingestion.py
│   └── Preprocessing.py
├── FeatureExtraction
│   ├── BoW
│   │   └── BoW.py
│   ├── GLCM
│   │   └── GLCM.py
│   └── OneHot
│       └── OneHot.py
├── GIT-README.md
├── ModelPipelines
│   ├── GRU
│   │   └── OneHot-GRU.ipynb
│   ├── GradientBoost
│   │   ├── BoW-GB.ipynb
│   │   └── GLCM-GB.ipynb
│   └── SVM
│       └── BoW-SVM.ipynb
├── ModelScoring
│   ├── Pipeline.py
│   └── Scoring.py
├── README.md
└── Sandbox.ipynb
```
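Conceptually, what mldir generates is just a folder skeleton. The sketch below reproduces the top-level layout in plain Python, in case you want to recreate or customize it by hand (`make_skeleton` is a hypothetical helper for illustration, not part of MLDir):

```python
# Hypothetical sketch reproducing the top-level MLDir layout; not MLDir itself.
from pathlib import Path
import tempfile

def make_skeleton(root):
    """Create the main MLDir-style folders under `root` and return their names."""
    folders = [
        "DataPreparation",
        "FeatureExtraction",
        "ModelPipelines",
        "ModelScoring",
    ]
    for folder in folders:
        Path(root, folder).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())

# Demonstrate in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as demo_root:
    print(make_skeleton(demo_root))
    # ['DataPreparation', 'FeatureExtraction', 'ModelPipelines', 'ModelScoring']
```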
Each file in the structure contains instructions on how to use it; these are all grouped in the README.md for a more detailed explanation.
```shell
mldir --name <project-name> --full
```
⦿ The --full option generates an even more comprehensive folder structure, including folders such as ModelImplementations, References and, most importantly, Production.
⦿ The Production folder contains a Flask app that can be used to serve your model as an API. All you need to do is import your final model into app.py and replace the dummy model with it. The Flask app assumes that your model takes a file via its path and returns a prediction, but it can easily be extended to suit your needs.
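As a rough illustration of the kind of app described above, here is a minimal Flask sketch that accepts an uploaded file, saves it, and returns a prediction from its path. This is a hypothetical stand-in, not the actual app.py shipped by MLDir; `dummy_model` is a placeholder for your imported final model.

```python
# Hypothetical sketch of a file-in, prediction-out Flask API; not MLDir's app.py.
import os
import tempfile

from flask import Flask, jsonify, request

app = Flask(__name__)

def dummy_model(path):
    # Placeholder: replace with your real model's predict-from-file logic.
    return "positive" if path.endswith(".txt") else "unknown"

@app.route("/predict", methods=["POST"])
def predict():
    upload = request.files["file"]
    # Save the upload to a path, since the model reads files by path.
    path = os.path.join(tempfile.gettempdir(), upload.filename)
    upload.save(path)
    return jsonify(prediction=dummy_model(path))

if __name__ == "__main__":
    app.run(port=5000)
```

In this shape, swapping in a real model only requires replacing `dummy_model` with a function that maps a file path to an output.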
```shell
mldir --name <project-name> --example
```
⦿ The --example option generates a complete example on a tiny dataset (with real models) that should help you understand the folder structure and how to use it (e.g., you can use it as a template for your own project).
Thanks to Abdullah for all his stellar work on the mlweb module and for all the time he spent with me discussing and testing the library.
Thanks to Jimmy for all his help in testing the library.
- Essam Wisam
- Abdullah Adel