
done 5.5
fmind committed Apr 6, 2024
1 parent 0e64780 commit 1936732
Showing 2 changed files with 194 additions and 3 deletions.
49 changes: 49 additions & 0 deletions docs/5. Refining/5.2. Pre-Commit Hooks.md
@@ -87,6 +87,55 @@ repos:
Additional hooks are available at the pre-commit website, offering a wide range of checks for different needs.

## How can you improve your commit messages?

Commit messages play a crucial role in software development, offering insights into what changes have been made and why. To enhance the quality of your commit messages and ensure consistency across contributions, Commitizen, a Python tool, can be extremely helpful. It not only formats commit messages but also helps in converting these messages into a comprehensive CHANGELOG. Here's how you can leverage Commitizen to streamline your commit messages:

To get started with Commitizen, you can install it using the following command:
```bash
poetry add -G commits commitizen
```

This command adds Commitizen to the `commits` dependency group of your project, ensuring that it is available for formatting commit messages during development.

Once installed, Commitizen offers several commands to assist with your commits:

```bash
# display a guide to help you format your commit messages
poetry run cz info
# bump your package version according to semantic versioning
poetry run cz bump
# interactively create a new commit message following best practices
poetry run cz commit
```

These commands are designed to guide you through creating structured and informative commit messages, bumping your project version appropriately, and even updating your CHANGELOG automatically.

To configure Commitizen to fit your project needs, you can set it up in your `pyproject.toml` file as shown below:

```toml
[tool.commitizen]
name = "cz_conventional_commits" # Uses the conventional commits standard
tag_format = "v$version" # Customizes the tag format
version_scheme = "pep440" # Follows the PEP 440 version scheme
version_provider = "poetry" # Uses poetry for version management
update_changelog_on_bump = true # Automatically updates the CHANGELOG when the version is bumped
```

Integrating Commitizen into your pre-commit workflow ensures that all commits adhere to a consistent format, which is crucial for collaborative projects. You can add it to your `.pre-commit-config.yaml` like this:

```yaml
- repo: https://github.com/commitizen-tools/commitizen
  rev: v3.18.3 # The version of Commitizen you're using
  hooks:
    - id: commitizen # Ensures your commit messages follow the conventional format
    - id: commitizen-branch
      stages: [push] # Optionally, enforce commit message checks on push
```
By incorporating Commitizen into your development workflow, you not only standardize commit messages across your project but also create a more readable and navigable project history. This practice is invaluable for team collaboration and can significantly improve the maintenance and understanding of your project over time.

## Is there a way to bypass a hook validation?

There are occasions when bypassing a pre-commit hook is necessary, such as when you need to make a quick fix or are confident the commit does not introduce any issues. To bypass the pre-commit hooks, you can use the following commands:
148 changes: 145 additions & 3 deletions docs/5. Refining/5.5. AI-ML Experiments.md
@@ -2,16 +2,158 @@

## What is an AI/ML experiment?

An AI/ML experiment is a structured process where data scientists and machine learning engineers systematically apply various algorithms, tune parameters, and manipulate datasets to develop predictive models. The goal is to find the most effective model that solves a given problem or enhances the performance of existing solutions. These experiments are iterative, involving trials of different configurations to evaluate their impact on model accuracy, efficiency, and other relevant metrics.

## Why do you need AI/ML experiments?

Machine learning produces artifacts that traditional software development does not, such as datasets, trained models, hyperparameters, and metrics. Tracking AI/ML experiments helps you manage these artifacts and the iterative workflow around them:

- **Improve reproducibility**: Reproducibility is crucial in AI/ML projects to ensure that experiments can be reliably repeated and verified by others. Experiment tracking helps in documenting the setup, code, data, and outcomes, making it easier to replicate results.
- **Find the best hyperparameters**: Hyperparameter tuning is a fundamental step in enhancing model performance. AI/ML experiments allow you to systematically test different hyperparameter configurations to identify the ones that yield the best results.
- **Assign tags to organize your experiments**: Tagging helps in categorizing experiments based on various criteria such as the type of model, dataset used, or the objective of the experiment. This organization aids in navigating and analyzing experiments efficiently.
- **Track the performance improvement during a run**: Continuous monitoring of model performance metrics during experiment runs helps in understanding the impact of changes and guiding further adjustments.
- **Integrate with several AI/ML frameworks**: Experiment tracking tools support integration with popular AI/ML frameworks, streamlining the experimentation process across different environments and tools.

AI/ML experimentation is distinct in MLOps due to the inherent complexity and non-deterministic nature of machine learning tasks. Leveraging experiment tracking tools equips teams with a structured approach to manage this complexity, akin to how scientists document their research findings.

## Which AI/ML experiment solution should you use?

There are various AI/ML experiment tracking solutions available, each offering unique features. Major cloud providers like Azure (Azure ML), AWS (SageMaker), and Google Cloud (Vertex AI) offer integrated MLOps platforms that include experiment tracking capabilities. There are also vendor-specific tools such as Weights & Biases and Neptune AI that specialize in experiment tracking. Among the open-source options, MLflow stands out as a robust and versatile choice for tracking experiments, integrating with a wide range of ML libraries and frameworks.

To install MLflow, execute:

```bash
poetry add mlflow
```

To verify the installation and start the MLflow server:

```bash
poetry run mlflow doctor
poetry run mlflow server
```

For Docker Compose users, the following configuration can launch an MLflow server:

```yaml
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow:v2.11.0
    ports:
      - 5000:5000
    environment:
      - MLFLOW_HOST=0.0.0.0
    command: mlflow server
```
Run `docker compose up` to start the server.
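
Once the server is up, point the MLflow client at it from your Python code. The snippet below is a minimal sketch, assuming the server from the Compose file above is reachable at `http://localhost:5000`; the experiment name, run name, and logged values are illustrative:

```python
import mlflow

# Point the client at the server started by Docker Compose (assumed at localhost:5000)
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Bike Sharing Demand Prediction")

# Log a dummy run to confirm that the client can reach the server
with mlflow.start_run(run_name="connectivity-check"):
    mlflow.log_param("tracking_backend", "docker-compose")
    mlflow.log_metric("dummy_metric", 1.0)
```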

Further deployment details for broader access can be found in MLflow's documentation.

## How should you configure MLflow in your project?

Configuring MLflow in your project enables efficient tracking of experiments. Initially, set the tracking and registry URIs, and specify the experiment name. Enabling autologging will automatically record metrics, parameters, and models without manual instrumentation.

```python
import mlflow
mlflow.set_tracking_uri("./mlruns")  # local folder used to store experiment runs
mlflow.set_registry_uri("./mlruns")  # local folder used to store registered models
mlflow.set_experiment(experiment_name="Bike Sharing Demand Prediction")
mlflow.autolog()
```

To begin tracking, wrap your code in an MLflow run context, specifying details like the run name and description:

```python
with mlflow.start_run(
    run_name="Demand Forecast Model Training",
    description="Training with enhanced feature set",
    log_system_metrics=True,
) as run:
    ...  # Your model training code here
```
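
To make these pieces concrete, here is a minimal, self-contained sketch of a tracked training run. It assumes scikit-learn is installed and uses a synthetic dataset as a stand-in for the real bike sharing data; with autologging enabled, the model's hyperparameters and training metrics are recorded without extra instrumentation:

```python
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.set_experiment("Bike Sharing Demand Prediction")
mlflow.autolog()  # automatically records hyperparameters, metrics, and the model

# Synthetic stand-in for the real bike sharing dataset
X, y = make_regression(n_samples=1_000, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run(run_name="Demand Forecast Model Training") as run:
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)  # autologging captures parameters and training metrics here
    score = model.score(X_test, y_test)  # R^2 on the held-out split
    mlflow.log_metric("test_r2", score)  # manual metric on top of autologging
    print(f"Run {run.info.run_id}: test R^2 = {score:.3f}")
```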

## Which information can you track in an AI/ML experiment?

MLflow can capture metrics, parameters, tags, artifacts, input datasets, and other metadata about your runs. Its autologging capability simplifies experiment tracking by automatically recording much of this information, and you can complement it by manually logging additional details, as illustrated in the sketch after this list:

- Parameters with `mlflow.log_param()` for individual key-value pairs, or `mlflow.log_params()` for multiple parameters.
- Metrics using `mlflow.log_metric()` for single key-value metrics, capturing the evolution of metrics over time, or `mlflow.log_metrics()` for multiple metrics.
- Input datasets and context with `mlflow.log_input()`, including tags for detailed categorization.
- Tags for the active run through `mlflow.set_tag()` for single tags or `mlflow.set_tags()` for multiple tags.
- Artifacts such as files or directories with `mlflow.log_artifact()` or `mlflow.log_artifacts()` for logging multiple files.
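
The sketch below exercises each of these APIs in a single run. The parameter names, metric values, and file names are purely illustrative, and the dataset logging assumes pandas is installed and a recent MLflow version (2.4+) that provides `mlflow.data`:

```python
import json

import mlflow
import pandas as pd

with mlflow.start_run(run_name="manual-logging-demo"):
    # Parameters: single key-value pairs or a batch
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_params({"n_estimators": 50, "max_depth": 5})

    # Metrics: the step argument captures their evolution over time
    for epoch in range(3):
        mlflow.log_metric("train_loss", 1.0 / (epoch + 1), step=epoch)
    mlflow.log_metrics({"val_rmse": 0.42, "val_mae": 0.31})

    # Input dataset with a context label
    df = pd.DataFrame({"feature": [1, 2, 3], "target": [2, 4, 6]})
    dataset = mlflow.data.from_pandas(df, name="toy-dataset")
    mlflow.log_input(dataset, context="training")

    # Tags for the active run
    mlflow.set_tag("model_family", "random_forest")
    mlflow.set_tags({"stage": "experiment", "owner": "data-science"})

    # Artifacts: any local file or directory
    with open("params.json", "w") as f:
        json.dump({"learning_rate": 0.01}, f)
    mlflow.log_artifact("params.json")
```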

## How can you compare AI/ML experiments in your project?

Comparing AI/ML experiments is crucial for identifying the most effective models and configurations. MLflow offers two primary methods for comparing experiments: through its web user interface (UI) and programmatically. Here's how you can use both methods:

### Comparing Experiments via the MLflow Web UI

1. **Launch the MLflow Tracking Server**: Start the MLflow tracking server if it isn't running already.

2. **Navigate to the Experiments Page**: Open the experiments page, where all your experiments are listed.

3. **Select Experiments to Compare**: Find the experiments you're interested in comparing and use the checkboxes to select them. You can select multiple experiments for comparison.

4. **Use the Compare Button**: After selecting the experiments, click on the "Compare" button. This will take you to a comparison view where you can see the runs side-by-side.

5. **Analyze the Results**: The comparison view will display key metrics, parameters, and other logged information for each run. Use this information to analyze the performance and characteristics of each model or configuration.

### Comparing Experiments Programmatically

Comparing experiments programmatically offers more flexibility and can be integrated into your analysis or reporting tools.

1. **Search Runs**: Use the `mlflow.search_runs()` function to query the experiments you want to compare. You can filter experiments based on experiment IDs, metrics, parameters, and tags. For example:

```python
import mlflow
# Assuming you know the experiment IDs or names
experiment_ids = ["1", "2"]
runs_df = mlflow.search_runs(experiment_ids)
```

2. **Filter and Sort**: Once you have the dataframe with runs, you can use pandas operations to filter, sort, and manipulate the data to focus on the specific metrics or parameters you're interested in comparing (see the sketch after the plot below).

3. **Visualize the Comparison**: For a more intuitive comparison, consider visualizing the results using libraries such as Matplotlib or Seaborn. For example, plotting the performance metrics of different runs can help in visually assessing which configurations performed better.


```python
import matplotlib.pyplot as plt
import mlflow

# search_runs() only returns the latest value of each metric, so fetch the full
# per-step history of the metric to plot its evolution across epochs
client = mlflow.tracking.MlflowClient()
plt.figure(figsize=(10, 6))
for run_id in runs_df["run_id"]:
    history = client.get_metric_history(run_id, "validation_accuracy")
    steps = [measurement.step for measurement in history]
    values = [measurement.value for measurement in history]
    plt.plot(steps, values, label=f"Run {run_id[:7]}")
plt.title("Comparison of Validation Accuracy Across Runs")
plt.xlabel("Epoch")
plt.ylabel("Validation Accuracy")
plt.legend()
plt.show()
```
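
In the same spirit, step 2's filtering and sorting can be done directly on the runs dataframe. The column names below (`status`, `metrics.validation_accuracy`, `params.learning_rate`) are assumptions that depend on what your runs actually logged:

```python
# Keep only finished runs and rank them by their final validation accuracy
finished = runs_df[runs_df["status"] == "FINISHED"]
best_runs = finished.sort_values("metrics.validation_accuracy", ascending=False)
print(best_runs[["run_id", "params.learning_rate", "metrics.validation_accuracy"]].head())
```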

These methods enable you to conduct thorough comparisons between different experiments, helping guide your decisions on model improvements and selections.

## What are some tips and tricks for using AI/ML experiments?

To maximize the efficacy of AI/ML experiments:

- Align logged information with relevant business metrics to ensure experiments are focused on meaningful outcomes.
- Use nested runs to structure experiments hierarchically, facilitating organized exploration of parameter spaces.
```python
with mlflow.start_run() as parent_run:
    params = [0.01, 0.02, 0.03]
    # Create a child run for each parameter setting
    for p in params:
        with mlflow.start_run(nested=True) as child_run:
            mlflow.log_param("p", p)
            ...  # train and evaluate the model here to obtain val_loss
            mlflow.log_metric("val_loss", val_loss)
```
- Employ tagging extensively to enhance the searchability and categorization of experiments.
- Track detailed progress by logging steps and timestamps, providing insights into the evolution of model performance.
```python
mlflow.log_metric(key="train_loss", value=train_loss, step=epoch, timestamp=now)
```
- Regularly log models to the model registry for versioning and to facilitate deployment processes.
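
For the last point, here is a hedged sketch of logging and registering a model in one call. It assumes scikit-learn is installed and uses a hypothetical registered model name; adapt the flavor module (`mlflow.sklearn`) to your framework:

```python
import mlflow
from sklearn.linear_model import LinearRegression

with mlflow.start_run():
    # Train a trivial model purely for illustration
    model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
    # Log the model as a run artifact and register it under a versioned name
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="bike-sharing-demand",
    )
```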
