# First draft of Chapter 7. Observability (#5)

---
description: Explore the crucial concept of reproducibility in MLOps and learn how to achieve it using tools and practices. Understand the role of versioning, environment management, and experiment tracking in ensuring consistent and verifiable results.
---

# 7.0. Reproducibility

## What is reproducibility in MLOps?

[Reproducibility in MLOps](https://neptune.ai/blog/how-to-solve-reproducibility-in-ml) means being able to reliably recreate the results of an AI/ML experiment or workflow. This capability is crucial for validating findings, debugging models, and ensuring consistent behavior across different environments and over time. Reproducibility helps build trust and transparency in AI/ML projects, allowing for independent verification and accelerating future development by providing a stable foundation to build upon.

## Why is reproducibility important in MLOps?

Reproducibility is a cornerstone of any scientific endeavor, and machine learning is no exception. It ensures that results are not due to chance or specific environmental configurations. This rigor builds trust in the models, making them more reliable for deployment. Additionally, reproducibility is crucial for debugging and fixing issues: if a model's performance degrades unexpectedly, having a reproducible setup allows you to isolate the changes that caused the issue and quickly restore the model's effectiveness.

## How can you implement reproducibility in your MLOps projects?

Implementing reproducibility in MLOps projects requires a combination of tools and practices:

- **Code Versioning**: Utilizing tools like [Git](https://git-scm.com/) to track code changes and revert to specific versions allows you to precisely reproduce the code that generated particular results. This is essential for understanding the evolution of a model and recreating previous experiments.
- **Environment Management**: Ensuring that the environment (e.g., Python version, libraries, dependencies) in which an experiment is conducted is consistent is vital. Employing tools like [Docker](https://www.docker.com/) or [Poetry](https://python-poetry.org/) to encapsulate dependencies and manage environments promotes consistency and portability.
- **Dataset Versioning**: Tracking changes to the dataset used for training or evaluation is crucial. This could involve storing multiple versions of the dataset or logging metadata about the dataset's source with [MLflow Data](https://mlflow.org/docs/latest/python_api/mlflow.data.html).
- **Randomness Control**: AI/ML tasks often involve inherent randomness in model initialization, data shuffling, or algorithm execution. To achieve reproducibility, you must control this randomness by fixing [random seeds](https://en.wikipedia.org/wiki/Random_seed), which ensures that random number generators produce the same sequence of numbers, thereby leading to consistent results.
- **Experiment Tracking**: Employing tools like [MLflow](https://mlflow.org/) to log experiment parameters, metrics, and artifacts allows you to systematically document your experiments (see the sketch after this list). This meticulous logging enables you to review past experiments, compare results, and identify the precise configurations that led to certain outcomes.
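
For illustration, here is a minimal sketch of how experiment tracking and randomness control can work together with MLflow. The experiment name, parameter values, and metric value are placeholders rather than part of any real project:

```python
import random

import mlflow

# Fix the random seed so the run can be replayed identically.
SEED = 42
random.seed(SEED)

mlflow.set_experiment("reproducibility-demo")

with mlflow.start_run(run_name="baseline"):
    # Log the configuration that produced this run.
    mlflow.log_params({"seed": SEED, "n_estimators": 100, "max_depth": 5})
    # Log the resulting metric so runs can be compared later.
    mlflow.log_metric("rmse", 0.42)
```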

## How can you fix randomness in AI/ML frameworks?

By setting a specific seed, you ensure that a random number generator always produces the same sequence of "random" numbers, leading to consistent results across different executions of your code, even if those executions occur on different machines or at different times.

Here is how you can fix the randomness in your project for several popular machine learning frameworks.

### Python

```python
import random

random.seed(42)
```

### NumPy

```python
import numpy as np

np.random.seed(42)
```

### Scikit-learn

```python
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor(random_state=42)
```

### PyTorch

```python
import torch

torch.manual_seed(42)
```

You can also fix the randomness for CUDA operations by using:

```python
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(42)
```

For additional reproducibility with cuDNN-accelerated operations, for example on GPU or multi-GPU setups, consider setting:

```python
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
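
If you need stricter guarantees, recent PyTorch versions (1.8+) also expose a global switch; a minimal sketch, with the caveat that some operations may then raise an error when no deterministic implementation exists:

```python
import torch

# Prefer deterministic kernels and raise an error when an operation
# has no deterministic implementation available.
torch.use_deterministic_algorithms(True)
```

Depending on your CUDA version, some operations may additionally require the `CUBLAS_WORKSPACE_CONFIG` environment variable to be set.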

### TensorFlow

```python
import tensorflow as tf

tf.random.set_seed(42)
```
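
As a convenience, recent TensorFlow releases (2.7+) also provide a single helper that seeds Python's `random` module, NumPy, and TensorFlow in one call; a minimal sketch:

```python
import tensorflow as tf

# Seeds Python's random module, NumPy, and TensorFlow at once.
tf.keras.utils.set_random_seed(42)
```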

## How can you use MLflow Projects to improve the reproducibility of your project?

[MLflow Projects](https://mlflow.org/docs/latest/projects.html) is a component of MLflow that provides a standard format for packaging data science code in a reusable and reproducible way. An MLflow Project is defined by an `MLproject` file that specifies the project's dependencies, environment, and entry points. This standardized format makes it easier to share and execute projects across different environments and platforms, promoting both collaboration and consistency in project execution.

### Defining an MLflow Project

To define an [MLflow project](https://mlflow.org/docs/latest/projects.html), you can create an `MLproject` file in your project's root directory. This file uses YAML syntax to define the project structure. Below is an example of an [`MLproject`](https://github.com/fmind/mlops-python-package/blob/main/MLproject) file that specifies the project name, environment, and entry point:

```yaml
# https://mlflow.org/docs/latest/projects.html

name: bikes
python_env: python_env.yaml
entry_points:
  main:
    parameters:
      conf_file: path
    command: "PYTHONPATH=src python -m bikes {conf_file}"
```

In this example:

- `name` defines the project name as "bikes".
- `python_env` specifies the path to the [python environment file](https://github.com/fmind/mlops-python-package/blob/main/python_env.yaml), sketched below.
- `entry_points` defines entry points, which specify how to run parts of the project.
- `main` is an entry point that accepts one parameter: `conf_file`, given as a file path.
- The `command` specifies how to execute the entry point, which in this case runs the `bikes` module with the provided parameters.
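
The referenced `python_env.yaml` follows MLflow's `python_env` format; below is a minimal sketch, where the Python version and dependency list are illustrative rather than the package's actual configuration:

```yaml
# https://mlflow.org/docs/latest/projects.html

# Python interpreter version used to run the project.
python: "3.12"
# Dependencies required to build packages (installed first).
build_dependencies:
  - pip
# Dependencies of the project itself.
dependencies:
  - -r requirements.txt
```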

### Executing an MLflow Project

To run an MLflow Project:

```bash
mlflow run --experiment-name=bikes --run-name=Training -P conf_file=confs/training.yaml .
```

This command instructs MLflow to run the current directory (`.`) as a project. The `-P` flag allows you to pass parameters to the entry points defined in your `MLproject` file. In this case, it passes `confs/training.yaml` as the main configuration file.

### Benefits of Using MLflow Projects

- **Simplified Sharing**: It's easier to share and distribute projects.
- **Consistent Execution**: Ensures consistent execution across different environments.
- **Reduced Setup Time**: Minimizes the time and effort required to set up and run projects.
- **Collaboration**: Facilitates collaboration among team members.

By leveraging MLflow Projects, you can significantly enhance the reproducibility of your MLOps projects, making it easier to share, execute, and validate your experiments, contributing to the overall robustness and trustworthiness of your ML solutions.

## Reproducibility additional resources

- **[MLflow Project example from the MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/MLproject)**
- **[MLflow Project execution from the MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/tasks/projects.py)**
- [MLflow Projects](https://mlflow.org/docs/latest/projects.html)

---
description: AI/ML monitoring helps maintain the performance and reliability of models in production. Learn how to effectively track metrics, set up alerts, and gain insights into model behavior.
---

# 7.1. Monitoring

## Why do you need AI/ML Monitoring?

[AI/ML Monitoring](https://www.evidentlyai.com/ml-in-production/model-monitoring) is the continuous process of overseeing the performance, behavior, and health of machine learning models in production environments. It's an essential aspect of MLOps that extends beyond the initial training and deployment stages, ensuring that models remain effective and reliable throughout their operational lifecycle. Effective monitoring involves several key tasks, including:

- **Tracking Metrics**: Capturing and analyzing key performance indicators (KPIs) that reflect the model's accuracy, precision, recall, and other relevant metrics over time.
- **Setting up Alerts**: Establishing trigger mechanisms that alert stakeholders when specific conditions are met, such as a significant drop in model accuracy or the detection of data drift (a minimal sketch follows this list).
- **Gaining Insights**: Providing a means to understand and diagnose issues through visualizations, logs, and other diagnostic tools.
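
As a simple illustration of the alerting idea, a threshold check can be as small as the following sketch; the metric values, alert rules, and notification function are hypothetical placeholders:

```python
# Hypothetical latest metrics computed by a monitoring job.
latest_metrics = {"accuracy": 0.87, "data_drift_share": 0.35}

# Alert rules agreed with stakeholders (illustrative values).
alert_rules = {"accuracy": ("min", 0.90), "data_drift_share": ("max", 0.30)}


def notify(message: str) -> None:
    """Placeholder for a real notification channel (email, Slack, pager...)."""
    print(f"[ALERT] {message}")


for name, (kind, limit) in alert_rules.items():
    value = latest_metrics[name]
    if (kind == "min" and value < limit) or (kind == "max" and value > limit):
        notify(f"{name}={value:.2f} violates the {kind} threshold of {limit:.2f}")
```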

Effective AI/ML monitoring helps prevent model decay, which occurs when model performance deteriorates over time, by identifying issues quickly and enabling timely interventions. It is crucial for both technical teams and business stakeholders to maintain confidence in the accuracy and value of AI/ML solutions.

## How does AI/ML Monitoring differ from Traditional Software Monitoring?

While AI/ML monitoring shares similarities with traditional software monitoring, it also presents unique challenges:

- **Non-Deterministic Behavior**: Machine learning models can exhibit unpredictable behavior due to their reliance on data patterns. These patterns may change over time, leading to performance degradation or unexpected outputs.
- **Complex Dependencies**: AI/ML applications often depend on multiple external factors, including data sources, feature engineering pipelines, and serving infrastructure. Monitoring must encompass all these dependencies to identify potential sources of issues.
- **Black Box Nature**: The internal workings of some machine learning models can be opaque, making it harder to directly diagnose the reasons for incorrect predictions or behavior changes.

These unique challenges call for specialized tools and strategies, as traditional monitoring systems may not be adequately equipped to address the complexities inherent in AI/ML applications.

## What are the Benefits of AI/ML Monitoring?

1. **Early Problem Detection**: By continuously monitoring metrics and setting up alerts, teams can detect problems such as model drift, data quality issues, or biases before they significantly impact business outcomes.
2. **Improved Model Performance**: Tracking metrics helps identify opportunities for model retraining, hyperparameter tuning, or other optimizations to enhance model performance over time.
3. **Increased Reliability**: Effective monitoring safeguards against model failures and downtime, ensuring that AI/ML solutions remain stable and dependable, maintaining business continuity.
4. **Enhanced User Trust**: By ensuring model accuracy and fairness, monitoring practices build trust and confidence in AI/ML solutions among users, promoting their adoption and use.

AI/ML monitoring plays a crucial role in bridging the gap between model development and production, ensuring that these solutions remain valuable and reliable assets for businesses.

## Which Metrics Should You Track for AI/ML Monitoring?

The metrics tracked for AI/ML monitoring should align with the specific objectives and performance requirements of the model:

### Common Metrics

- **Accuracy, Precision, Recall, F1 Score**: These metrics assess the model's overall performance and its ability to correctly classify or predict outcomes.
- **Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE)**: These metrics quantify the magnitude of prediction errors, which is crucial for regression problems.
- **Area Under the Receiver Operating Characteristic Curve (AUC-ROC)**: This metric evaluates the model's ability to distinguish between different classes, particularly useful for binary classification problems (a scikit-learn sketch follows this list).
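
As an illustration, most of these metrics are available out of the box in scikit-learn; here is a minimal sketch with toy values:

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    roc_auc_score,
)

# Toy classification outputs: ground truth, predicted labels, predicted scores.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
y_score = [0.2, 0.9, 0.4, 0.1, 0.8]

print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("roc_auc:", roc_auc_score(y_true, y_score))

# Toy regression outputs: RMSE is the square root of the MSE.
y_true_reg = [3.0, 5.0, 2.5]
y_pred_reg = [2.8, 5.4, 2.0]
print("rmse:", mean_squared_error(y_true_reg, y_pred_reg) ** 0.5)
```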

### Business Metrics

Beyond technical metrics, aligning monitoring with business goals through relevant metrics is crucial:

- **Conversion Rate, Customer Churn, Revenue Impact**: These metrics measure the direct impact of the model on business outcomes, offering a clear view of its effectiveness.

### Data Quality Metrics

- **Data Drift, Missing Values, Outliers**: Monitoring data quality metrics helps detect changes in input data distribution or format that could impact model performance (a minimal sketch follows).
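
As a lightweight illustration (before reaching for a dedicated tool such as Evidently, covered below), basic data quality checks can be scripted with pandas and SciPy; the file names, the `feature` column, and the significance level are hypothetical:

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical reference (training) and current (production) datasets.
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Share of missing values per column in the current data.
print(current.isna().mean())

# Two-sample Kolmogorov-Smirnov test on a numerical feature to flag drift.
statistic, p_value = ks_2samp(reference["feature"], current["feature"])
if p_value < 0.05:
    print(f"Possible drift on 'feature' (KS statistic={statistic:.3f})")
```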

## How can you implement Monitoring in the MLOps Python Package?

The [MLOps Python Package](https://github.com/fmind/mlops-python-package) utilizes [MLflow's `evaluate` API](https://mlflow.org/docs/latest/model-evaluation/index.html) for comprehensive model evaluation, including the validation of results against user-defined thresholds. This capability allows for a standardized approach to model monitoring, ensuring that model quality is consistently assessed and monitored.

Here's how you can implement this monitoring functionality:

1. **Define Metrics**: First, define the metrics you want to track using a class from the [`bikes.core.metrics` module](https://github.com/fmind/mlops-python-package/blob/main/src/bikes/core/metrics.py). This class allows you to specify the metric name and whether a higher or lower score indicates better performance.

```python
from bikes.core import metrics

# Use a distinct variable name so the imported module is not shadowed.
metrics_list = [
    metrics.SklearnMetric(name="mean_squared_error", greater_is_better=False),
    metrics.SklearnMetric(name="r2_score", greater_is_better=True),
]
```

2. **Set Thresholds (Optional)**: If you want to establish thresholds for specific metrics, use the `Threshold` class, again from [`bikes.core.metrics`](https://github.com/fmind/mlops-python-package/blob/main/src/bikes/core/metrics.py), to define the absolute threshold value and whether a higher or lower score is desired. These thresholds serve as benchmarks for model performance, potentially triggering alerts if violated.

```python
thresholds = {
    "r2_score": metrics.Threshold(threshold=0.5, greater_is_better=True)
}
```

3. **Integrate with the `EvaluationsJob`**: The `EvaluationsJob` in the [`bikes.jobs.evaluations`](https://github.com/fmind/mlops-python-package/blob/main/src/bikes/jobs/evaluations.py) module is responsible for loading the registered model, reading the evaluation dataset, and calculating the specified metrics. You can configure this job to use the defined metrics and thresholds.

```python
from bikes import jobs
from bikes.io import datasets

evaluations_job = jobs.EvaluationsJob(
    inputs=datasets.ParquetReader(path="data/inputs_test.parquet"),
    targets=datasets.ParquetReader(path="data/targets_test.parquet"),
    metrics=metrics_list,
    thresholds=thresholds,
)
```

4. **Execute the Job**: Run the `EvaluationsJob` to compute the metrics and assess them against the thresholds. If any thresholds are violated, MLflow will raise a `ModelValidationFailedException`, which can be handled appropriately in your workflow.

```python
with evaluations_job as runner:
    runner.run()
```

## How to integrate Monitoring in your notebooks?

You can use the [Evidently](https://www.evidentlyai.com/) library to generate interactive reports for model monitoring and analysis. Evidently supports data and target drift detection, model performance monitoring, and the creation of visual reports to understand changes and issues within an ML pipeline. It simplifies the tracking and analysis of changes in model behavior, making the monitoring process more efficient and effective.

Here are the steps to integrate Evidently into your Jupyter notebooks for model monitoring:

1. **Install the Evidently library**:

```bash
pip install evidently
```

2. **Import necessary modules**:

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
```

3. **Load your reference and current data**: These datasets represent the data the model was trained on (reference) and the data the model is currently making predictions on (current).

```python
reference_data = pd.read_csv('reference.csv')
current_data = pd.read_csv('current.csv')
```

4. **Generate an Evidently report**:

```python
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_data, current_data=current_data)
report.show()  # or report.save_html('my_report.html')
```

This example will generate a report highlighting any data drift between the reference and current datasets, which can be crucial in identifying why a model's performance might be declining.

## What are the Best Practices for AI/ML Monitoring?

1. **Establish Clear Monitoring Goals**: Define the objectives of your monitoring efforts, which might include detecting drift, maintaining performance, or ensuring fairness.
2. **Choose Relevant Metrics**: Select metrics that align with the specific objectives of your model and your business goals.
3. **Set up Meaningful Alerts**: Design alerts that are actionable and relevant, avoiding alert fatigue while ensuring timely responses to critical issues.
4. **Integrate Monitoring into CI/CD Pipelines**: Incorporate monitoring steps into your continuous integration and continuous deployment (CI/CD) workflows to ensure that changes are thoroughly evaluated and monitored.
5. **Visualize Results**: Utilize visual dashboards and reports to present monitoring results in an easily understandable format for both technical and business stakeholders.
6. **Regularly Review and Update**: Periodically review your monitoring setup and adjust metrics, thresholds, and alerts as needed based on experience and changing requirements.

## Additional Monitoring Resources

- **[Example from the MLOps Python Package](https://github.com/fmind/mlops-python-package/blob/main/src/bikes/jobs/evaluations.py)**
- **[MLflow Evaluate API](https://mlflow.org/docs/latest/model-evaluation/index.html)**
- **[EvidentlyAI](https://www.evidentlyai.com/)**
- [Model Monitoring: What it is and why it's so hard](https://christophergs.com/machine%20learning/2020/03/14/how-to-monitor-machine-learning-models/)