diff --git a/docs/5. Refining/5.2. Pre-Commit Hooks.md b/docs/5. Refining/5.2. Pre-Commit Hooks.md
index d4c2302..91bfc37 100644
--- a/docs/5. Refining/5.2. Pre-Commit Hooks.md
+++ b/docs/5. Refining/5.2. Pre-Commit Hooks.md
@@ -87,6 +87,55 @@ repos:

Additional hooks are available at the pre-commit website, offering a wide range of checks for different needs.

+## How can you improve your commit messages?
+
+Commit messages play a crucial role in software development, offering insights into what changes have been made and why. To enhance the quality of your commit messages and ensure consistency across contributions, Commitizen, a Python tool, can be extremely helpful. It not only formats commit messages but also helps convert these messages into a comprehensive CHANGELOG. Here's how you can leverage Commitizen to streamline your commit messages.
+
+To get started with Commitizen, install it with the following command:
+
+```bash
+poetry add -G commits commitizen
+```
+
+This command adds Commitizen to your project in a dedicated dependency group, ensuring it is available for formatting commit messages.
+
+Once installed, Commitizen offers several commands to assist with your commits:
+
+```bash
+# display a guide to help you format your commit messages
+poetry run cz info
+# bump your package version according to semantic versioning
+poetry run cz bump
+# interactively create a new commit message following best practices
+poetry run cz commit
+```
+
+These commands guide you through creating structured and informative commit messages, bumping your project version appropriately, and even updating your CHANGELOG automatically.
+
+To configure Commitizen to fit your project needs, set it up in your `pyproject.toml` file as shown below:
+
+```toml
+[tool.commitizen]
+name = "cz_conventional_commits" # Uses the conventional commits standard
+tag_format = "v$version" # Customizes the tag format
+version_scheme = "pep440" # Follows the PEP 440 version scheme
+version_provider = "poetry" # Uses poetry for version management
+update_changelog_on_bump = true # Automatically updates the CHANGELOG when the version is bumped
+```
+
+Integrating Commitizen into your pre-commit workflow ensures that all commits adhere to a consistent format, which is crucial for collaborative projects. You can add it to your `.pre-commit-config.yaml` like this:
+
+```yaml
+  - repo: https://github.com/commitizen-tools/commitizen
+    rev: v3.18.3 # The version of Commitizen you're using
+    hooks:
+      - id: commitizen # Ensures your commit messages follow the conventional format
+      - id: commitizen-branch
+        stages: [push] # Optionally, enforce commit message checks on push
+```
+
+By incorporating Commitizen into your development workflow, you not only standardize commit messages across your project but also create a more readable and navigable project history. This practice is invaluable for team collaboration and can significantly improve the maintenance and understanding of your project over time.
+
## Is there a way to bypass a hook validation?

There are occasions when bypassing a pre-commit hook is necessary, such as when you need to make a quick fix or are confident the commit does not introduce any issues. To bypass the pre-commit hooks, you can use the following commands:
diff --git a/docs/5. Refining/5.5. AI-ML Experiments.md b/docs/5. Refining/5.5. AI-ML Experiments.md
index 946f09f..86823be 100644
--- a/docs/5. Refining/5.5. AI-ML Experiments.md
+++ b/docs/5. Refining/5.5. AI-ML Experiments.md
@@ -2,16 +2,158 @@

## What is an AI/ML experiment?

+An AI/ML experiment is a structured process where data scientists and machine learning engineers systematically apply various algorithms, tune parameters, and manipulate datasets to develop predictive models. The goal is to find the most effective model that solves a given problem or enhances the performance of existing solutions. These experiments are iterative, involving trials of different configurations to evaluate their impact on model accuracy, efficiency, and other relevant metrics.
+
## Why do you need AI/ML experiment?

-ML has special artifacts
+- **Improve reproducibility**: Reproducibility is crucial in AI/ML projects to ensure that experiments can be reliably repeated and verified by others. Experiment tracking helps in documenting the setup, code, data, and outcomes, making it easier to replicate results.
+- **Find the best hyperparameters**: Hyperparameter tuning is a fundamental step in enhancing model performance. AI/ML experiments allow you to systematically test different hyperparameter configurations to identify the ones that yield the best results.
+- **Assign tags to organize your experiments**: Tagging helps in categorizing experiments based on various criteria such as the type of model, dataset used, or the objective of the experiment. This organization aids in navigating and analyzing experiments efficiently.
+- **Track the performance improvement during a run**: Continuous monitoring of model performance metrics during experiment runs helps in understanding the impact of changes and guiding further adjustments.
+- **Integrate with several AI/ML frameworks**: Experiment tracking tools support integration with popular AI/ML frameworks, streamlining the experimentation process across different environments and tools.
+
+AI/ML experimentation is distinct in MLOps due to the inherent complexity and non-deterministic nature of machine learning tasks. Leveraging experiment tracking tools equips teams with a structured approach to manage this complexity, akin to how scientists document their research findings.

## Which AI/ML experiment solution should you use?

+There are various AI/ML experiment tracking solutions available, each offering unique features. Major cloud providers like Azure (Azure ML), AWS (SageMaker), and Google Cloud (Vertex AI) offer integrated MLOps platforms that include experiment tracking capabilities. There are also specialized vendors such as Weights & Biases and Neptune AI that focus on experiment tracking. Among the open-source options, MLflow stands out as a robust and versatile choice for tracking experiments, integrating with a wide range of ML libraries and frameworks.
+
+To add MLflow to your project, execute:
+
+```bash
+poetry add mlflow
+```
+
+To verify the installation and start the MLflow server:
+
+```bash
+poetry run mlflow doctor
+poetry run mlflow server
+```
+
+For Docker Compose users, the following configuration can launch an MLflow server:
+
+```yaml
+services:
+  mlflow:
+    image: ghcr.io/mlflow/mlflow:v2.11.0
+    ports:
+      - 5000:5000
+    environment:
+      - MLFLOW_HOST=0.0.0.0
+    command: mlflow server
+```
+
+Run `docker compose up` to start the server.
+
+Further deployment details for broader access can be found in MLflow's documentation.
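+
+If you started the tracking server above (directly or with Docker Compose), you can point the MLflow client at it instead of a local folder. The URL below is an assumption based on the default host and port:
+
+```python
+import mlflow
+
+# send runs to the tracking server started above (default host and port assumed)
+mlflow.set_tracking_uri("http://localhost:5000")
+```
+
+Alternatively, setting the `MLFLOW_TRACKING_URI` environment variable achieves the same without code changes.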
+
+## How should you configure MLflow in your project?
+
+Configuring MLflow in your project enables efficient tracking of experiments. First, set the tracking and registry URIs and specify the experiment name. Enabling autologging will automatically record metrics, parameters, and models without manual instrumentation.
+
+```python
+import mlflow
+
+mlflow.set_tracking_uri("file://./mlruns")
+mlflow.set_registry_uri("file://./mlruns")
+mlflow.set_experiment(experiment_name="Bike Sharing Demand Prediction")
+mlflow.autolog()
+```
+
+To begin tracking, wrap your code in an MLflow run context, specifying details like the run name and description:
+
+```python
+with mlflow.start_run(
+    run_name="Demand Forecast Model Training",
+    description="Training with enhanced feature set",
+    log_system_metrics=True,
+) as run:
+    ...  # Your model training code here
+```
+
## Which information can you track in an AI/ML experiment?

-Metric, tags, metadata ...
+MLflow's autologging capability simplifies experiment tracking by automatically recording a wealth of information. You can complement autologging by manually logging additional information (a sketch combining these calls follows the list):
+
+- Parameters with `mlflow.log_param()` for individual key-value pairs, or `mlflow.log_params()` for multiple parameters.
+- Metrics using `mlflow.log_metric()` for single key-value metrics, capturing the evolution of metrics over time, or `mlflow.log_metrics()` for multiple metrics.
+- Input datasets and context with `mlflow.log_input()`, including tags for detailed categorization.
+- Tags for the active run through `mlflow.set_tag()` for single tags or `mlflow.set_tags()` for multiple tags.
+- Artifacts such as files or directories with `mlflow.log_artifact()` or `mlflow.log_artifacts()` for logging multiple files.
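+
+The snippet below is a minimal sketch combining these manual logging calls in a single run. The parameter, metric, and tag names, the dataset values, and the file path are made up for illustration; `mlflow.log_input()` expects a dataset object, created here with `mlflow.data.from_pandas()`:
+
+```python
+import json
+
+import mlflow
+import pandas as pd
+
+with mlflow.start_run(run_name="manual-logging-example"):
+    # parameters: one at a time or several at once
+    mlflow.log_param("learning_rate", 0.01)
+    mlflow.log_params({"n_estimators": 200, "max_depth": 8})
+
+    # metrics: logging with a step captures their evolution over time
+    for epoch, loss in enumerate([0.9, 0.6, 0.4]):
+        mlflow.log_metric("train_loss", loss, step=epoch)
+    mlflow.log_metrics({"val_rmse": 0.35, "val_mae": 0.27})
+
+    # input dataset, with a context describing how it was used
+    data = pd.DataFrame({"temperature": [0.3, 0.5], "demand": [120, 180]})
+    mlflow.log_input(mlflow.data.from_pandas(data, name="bike-sharing-sample"), context="training")
+
+    # tags: annotate the run to ease searching and categorization
+    mlflow.set_tag("model_family", "gradient_boosting")
+    mlflow.set_tags({"stage": "experiment", "dataset": "bike-sharing"})
+
+    # artifacts: attach any file produced during the run
+    with open("feature_importance.json", "w") as file:
+        json.dump({"temperature": 0.42, "humidity": 0.17}, file)
+    mlflow.log_artifact("feature_importance.json")
+```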

## How can you compare AI/ML experiments in your project?

-## What are some tips and tricks for using AI/ML experiments?
\ No newline at end of file
+Comparing AI/ML experiments is crucial for identifying the most effective models and configurations. MLflow offers two primary methods for comparing experiments: through its web user interface (UI) and programmatically. Here's how you can use both methods:
+
+### Comparing Experiments via the MLflow Web UI
+
+1. **Launch the MLflow Tracking Server**: Start the MLflow tracking server (e.g., `poetry run mlflow server`) if it isn't running already.
+
+2. **Navigate to the Experiments Page**: Open the MLflow UI in your browser and go to the experiments page, where all your experiments are listed.
+
+3. **Select Runs to Compare**: Open the experiments you're interested in and use the checkboxes to select the runs you want to compare. You can select runs from multiple experiments.
+
+4. **Use the Compare Button**: After selecting the runs, click the "Compare" button. This takes you to a comparison view where you can see the runs side by side.
+
+5. **Analyze the Results**: The comparison view displays key metrics, parameters, and other logged information for each run. Use this information to analyze the performance and characteristics of each model or configuration.
+
+### Comparing Experiments Programmatically
+
+Comparing experiments programmatically offers more flexibility and can be integrated into your analysis or reporting tools.
+
+1. **Search Runs**: Use the `mlflow.search_runs()` function to query the runs you want to compare. You can filter them by experiment IDs, metrics, parameters, and tags. For example:
+
+```python
+import mlflow
+
+# Assuming you know the experiment IDs or names
+experiment_ids = ["1", "2"]
+runs_df = mlflow.search_runs(experiment_ids)
+```
+
+2. **Filter and Sort**: Once you have the dataframe with runs, you can use pandas operations to filter, sort, and manipulate the data to focus on the specific metrics or parameters you're interested in comparing.
+
+3. **Visualize the Comparison**: For a more intuitive comparison, consider visualizing the results using libraries such as Matplotlib or Seaborn. For example, plotting the evolution of a performance metric across runs can help you visually assess which configurations performed better.
+
+```python
+import matplotlib.pyplot as plt
+from mlflow.tracking import MlflowClient
+
+client = MlflowClient()
+
+# Example: comparing the validation accuracy of different runs across training steps
+plt.figure(figsize=(10, 6))
+for _, row in runs_df.iterrows():
+    run_id = row["run_id"]
+    # retrieve the full history of the metric (one value per logged step)
+    history = client.get_metric_history(run_id, "validation_accuracy")
+    steps = [metric.step for metric in history]
+    values = [metric.value for metric in history]
+    plt.plot(steps, values, label=f"Run {run_id[:7]}")
+
+plt.title("Comparison of Validation Accuracy Across Runs")
+plt.xlabel("Step")
+plt.ylabel("Validation Accuracy")
+plt.legend()
+plt.show()
+```
+
+These methods enable you to conduct thorough comparisons between different experiments, helping to guide your decisions on model improvements and selection.
+
+## What are some tips and tricks for using AI/ML experiments?
+
+To maximize the efficacy of AI/ML experiments:
+
+- Align logged information with relevant business metrics to ensure experiments are focused on meaningful outcomes.
+- Use nested runs to structure experiments hierarchically, facilitating organized exploration of parameter spaces.
+```python
+import mlflow
+
+with mlflow.start_run() as parent_run:
+    params = [0.01, 0.02, 0.03]
+
+    # Create a child run for each parameter setting
+    for p in params:
+        with mlflow.start_run(nested=True) as child_run:
+            mlflow.log_param("p", p)
+            ...  # train and evaluate your model here, producing `val_loss`
+            mlflow.log_metric("val_loss", val_loss)
+```
+- Employ tagging extensively to enhance the searchability and categorization of experiments.
+- Track detailed progress by logging steps and timestamps, providing insights into the evolution of model performance.
+```python
+# step: training iteration; timestamp: current Unix time in milliseconds
+mlflow.log_metric(key="train_loss", value=train_loss, step=epoch, timestamp=now)
+```
+- Regularly log models to the model registry for versioning and to facilitate deployment processes, as sketched below.
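+
+The following is a minimal sketch of that last tip, assuming a scikit-learn model and an illustrative registry name; depending on your MLflow backend, the model registry may require a database-backed store:
+
+```python
+import mlflow
+import mlflow.sklearn
+from sklearn.linear_model import LinearRegression
+
+# stand-in model for illustration; replace with your own trained estimator
+model = LinearRegression().fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
+
+with mlflow.start_run():
+    # log the model as a run artifact and register a new version in the model registry
+    mlflow.sklearn.log_model(
+        sk_model=model,
+        artifact_path="model",
+        registered_model_name="bike-sharing-demand",  # illustrative registry name
+    )
+```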