5.0 and 5.1 done

MLOps-Courses · Apr 4, 2024 · fa5a3d5 · fa5a3d5
1 parent 978826c
commit fa5a3d5
Showing 11 changed files with 357 additions and 55 deletions.
diff --git a/docs/5. Refining/5.0. Design Patterns.md b/docs/5. Refining/5.0. Design Patterns.md
@@ -2,16 +2,166 @@
 
 ## What is a software design pattern?
 
-## Why do I need software design patterns?
+Software design patterns are proven solutions to common problems encountered during software development. These patterns provide a template for how to solve a problem in a way that has been validated by other developers over time. The concept originated from architecture and was adapted to computer science to help developers design more efficient, maintainable, and reliable code. In essence, design patterns serve as blueprints for solving specific software design issues.
 
-Why needed in Python
+## Why do you need software design patterns?
 
-## What are the top design patterns to known?
+- **Freedom of choice**: In the realm of AI/ML, flexibility and adaptability are paramount. Design patterns enable solutions to remain versatile, allowing for the integration of various options and methodologies without locking into a single approach.
+- **Code robustness**: Python's dynamic nature demands discipline from developers to ensure code robustness. Design patterns provide a structured approach to coding that enhances stability and reliability.
+- **Developer productivity**: Employing the right design patterns can significantly boost developer productivity. These patterns facilitate the exploration of diverse solutions, enabling developers to achieve more in less time and enhance the overall value of their projects.
 
-## How can I define software interfaces with Python?
+While Python's flexibility is one of its strengths, it can also lead to challenges in maintaining code robustness and reliability. Design patterns help to mitigate these challenges by improving code quality and leveraging proven strategies to refine and enhance the codebase.
 
-ABC vs protocols
+## What are the top design patterns to know?
 
-## How can I better validate and instantiate my objects?
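+Design patterns are typically categorized into three types: creational, structural, and behavioral. The sections below highlight one pattern from each category that is especially useful in MLOps.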
 
-Pydantic
+### Strategy Pattern (Behavioral)
+
+The Strategy pattern is crucial in MLOps for decoupling the objectives (what to do) from the methodologies (how to do it). For example, it allows for the interchange of different algorithms or frameworks (such as TensorFlow, XGBoost, or PyTorch) for model training without altering the underlying code structure. This pattern upholds the Open/Closed Principle, providing the flexibility needed to adapt to changing requirements, such as switching models or data sources based on runtime conditions.
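
The separation of objective and methodology described above can be sketched in a few lines. This is a minimal illustration, not code from the course: `Trainer`, `MeanTrainer`, `MaxTrainer`, and `run_training` are hypothetical names, and the "training" is a toy computation standing in for a real framework call.

```python
from typing import Protocol, Sequence


class Trainer(Protocol):
    """Hypothetical strategy interface: how a model gets trained."""

    def train(self, data: Sequence[float]) -> float: ...


class MeanTrainer:
    """Toy strategy: 'learn' the mean of the data."""

    def train(self, data: Sequence[float]) -> float:
        return sum(data) / len(data)


class MaxTrainer:
    """Toy strategy: 'learn' the maximum of the data."""

    def train(self, data: Sequence[float]) -> float:
        return max(data)


def run_training(trainer: Trainer, data: Sequence[float]) -> float:
    """The objective (train something) stays fixed; the strategy varies."""
    return trainer.train(data)


data = [1.0, 2.0, 6.0]
mean_result = run_training(MeanTrainer(), data)  # uses the mean strategy
max_result = run_training(MaxTrainer(), data)    # same call, different strategy
```

Adding a third strategy means writing one new class; `run_training` stays untouched, which is the Open/Closed Principle in practice.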
+
+### Factory Pattern (Creational)
+
+After establishing common interfaces, the Factory pattern plays a vital role in enabling runtime behavior modification of programs. It controls object creation, allowing for dynamic adjustments through external configurations. In MLOps, this translates to the ability to alter AI/ML pipeline settings without code modifications. Python's dynamic features, combined with utilities like Pydantic, facilitate the implementation of the Factory pattern by simplifying user input validation and object instantiation.
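
As a rough sketch of the idea, a hand-written factory can map a configuration value to the class to instantiate. The names `REGISTRY`, `create_model`, `RandomForestConfig`, and `SVMConfig` are assumptions made for this example, and plain dataclasses stand in for real model wrappers.

```python
from dataclasses import dataclass


@dataclass
class RandomForestConfig:
    n_estimators: int = 100


@dataclass
class SVMConfig:
    kernel: str = "rbf"


# Hypothetical registry mapping a KIND string to the class to build.
REGISTRY = {"RandomForest": RandomForestConfig, "SVM": SVMConfig}


def create_model(config: dict):
    """Build the object named by config["KIND"] from the remaining keys."""
    params = dict(config)  # copy so the caller's dict is not mutated
    kind = params.pop("KIND")
    if kind not in REGISTRY:
        raise ValueError(f"Unknown model kind: {kind!r}")
    return REGISTRY[kind](**params)


# The configuration could come from a YAML or JSON file instead of a literal.
model = create_model({"KIND": "SVM", "kernel": "linear"})
```

Switching the pipeline to another model then only requires editing the configuration, not the code that calls `create_model`.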
+
+### Adapter Pattern (Structural)
+
+The Adapter pattern is indispensable in MLOps due to the diversity of standards and interfaces in the field. It provides a means to integrate various external components, such as training and inference systems across different platforms (e.g., Databricks and Kubernetes), by bridging incompatible interfaces. This ensures seamless integration and the generalization of external components, allowing for smooth communication and operation between disparate systems.
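
A minimal sketch of an adapter, under stated assumptions: `Predictor` is a hypothetical interface your pipeline expects, and `ExternalScoringClient` is a made-up stand-in for a third-party client whose method names and payload shape differ from it. Neither corresponds to a real Databricks or Kubernetes API.

```python
from typing import Protocol


class Predictor(Protocol):
    """Interface the rest of the pipeline expects (hypothetical)."""

    def predict(self, rows: list[list[float]]) -> list[float]: ...


class ExternalScoringClient:
    """Stand-in for a third-party client with an incompatible interface."""

    def score_batch(self, payload: dict) -> dict:
        # Toy behavior: the "score" of each instance is the sum of its features.
        return {"scores": [sum(instance) for instance in payload["instances"]]}


class ScoringClientAdapter:
    """Adapter: exposes ExternalScoringClient through the Predictor interface."""

    def __init__(self, client: ExternalScoringClient) -> None:
        self.client = client

    def predict(self, rows: list[list[float]]) -> list[float]:
        # Translate the call and the payload shape in both directions.
        response = self.client.score_batch({"instances": rows})
        return response["scores"]


def evaluate(predictor: Predictor, rows: list[list[float]]) -> list[float]:
    """Pipeline code that only knows about the Predictor interface."""
    return predictor.predict(rows)


predictions = evaluate(ScoringClientAdapter(ExternalScoringClient()), [[1.0, 2.0], [3.0, 4.0]])
```

The pipeline function `evaluate` never sees the external client directly, so replacing the platform only requires writing another adapter.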
+
+## How can you define software interfaces with Python?
+
+Python supports two primary methods for defining interfaces: Abstract Base Classes (ABC) and Protocols.
+
+ABCs utilize Nominal Typing to establish clear class hierarchies and relationships, such as a RandomForestModel being a subtype of a Model. This approach makes the connection between classes explicit:
+
+```python
+from abc import ABC, abstractmethod
+
+import pandas as pd
+
+class Model(ABC):
+    @abstractmethod
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        pass
+
+    @abstractmethod
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        pass
+
+class RandomForestModel(Model):
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        print("Fitting RandomForestModel...")
+
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        print("Predicting with RandomForestModel...")
+        return pd.DataFrame()
+
+class SVMModel(Model):
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        print("Fitting SVMModel...")
+
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        print("Predicting with SVMModel...")
+        return pd.DataFrame()
+```
+
+Conversely, Protocols adhere to the Structural Typing principle, embodying Python's duck typing philosophy where a class is considered compatible if it implements certain methods, regardless of its place in the class hierarchy. This means a RandomForestModel is recognized as a Model by merely implementing the expected behaviors.
+
+```python
+from typing import Protocol, runtime_checkable
+import pandas as pd
+
+@runtime_checkable
+class Model(Protocol):
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        ...
+
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        ...
+
+class RandomForestModel:
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        print("Fitting RandomForestModel...")
+
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        print("Predicting with RandomForestModel...")
+        return pd.DataFrame()
+
+class SVMModel:
+    def fit(self, X: pd.DataFrame, y: pd.DataFrame) -> None:
+        print("Fitting SVMModel...")
+
+    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
+        print("Predicting with SVMModel...")
+        return pd.DataFrame()
+```
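
To see structural typing at work, the sketch below checks objects against a `@runtime_checkable` protocol with `isinstance`. It is a simplified variant of the example above that uses plain lists instead of pandas so it runs with the standard library only; `ConstantModel` and `FitOnly` are hypothetical classes introduced for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Model(Protocol):
    def fit(self, X: list, y: list) -> None: ...

    def predict(self, X: list) -> list: ...


class ConstantModel:
    """Never inherits from Model, yet provides both methods."""

    def fit(self, X: list, y: list) -> None:
        self.value = y[0]

    def predict(self, X: list) -> list:
        return [self.value for _ in X]


class FitOnly:
    """Lacks predict(), so it does not satisfy the protocol."""

    def fit(self, X: list, y: list) -> None:
        pass


is_model = isinstance(ConstantModel(), Model)
is_partial_model = isinstance(FitOnly(), Model)
```

Note that a runtime `isinstance` check against a protocol only verifies that the methods exist; it does not compare signatures or return types. A static type checker such as mypy is still needed for that level of verification.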
+
+Choosing between ABCs and Protocols depends on your project's needs. ABCs offer a more explicit, structured approach suitable for applications, while Protocols offer flexibility and are more aligned with library development.
+
+## How can you better validate and instantiate your objects?
+
+Pydantic is a valuable tool for defining, validating, and instantiating objects according to specified requirements. It utilizes type annotations to ensure inputs meet predefined criteria, significantly reducing the risk of errors in data-driven operations, such as in MLOps processes.
+
+### Validating Objects with Pydantic
+
+Pydantic utilizes Python's type hints to validate data, ensuring that the objects you create adhere to your specifications from the get-go. This feature is particularly valuable in MLOps, where data integrity is crucial for the success of machine learning models. Here's how you can leverage Pydantic for object validation:
+
+```python
+from typing import Optional
+from pydantic import BaseModel, Field
+
+class RandomForestClassifierModel(BaseModel):
+    n_estimators: int = Field(default=100, gt=0)
+    max_depth: Optional[int] = Field(default=None, gt=0)
+    random_state: Optional[int] = Field(default=None, gt=0)
+
+# Instantiate the model with validated parameters
+model = RandomForestClassifierModel(n_estimators=120, max_depth=5, random_state=42)
+```
+
+In this example, Pydantic ensures that `n_estimators` is greater than 0, `max_depth` is either greater than 0 or `None`, and similarly for `random_state`. This kind of validation is essential for maintaining the integrity of your model training processes.
+
+### Streamlining Object Instantiation with Discriminated Union
+
+Pydantic's Discriminated Union feature further simplifies object instantiation, allowing you to dynamically select a class based on a specific attribute (e.g., `KIND`). This approach can serve as an efficient alternative to the traditional Factory pattern, reducing the need for boilerplate code:
+
+```python
+from typing import Literal, Union
+from pydantic import BaseModel, Field
+
+class Model(BaseModel):
+    KIND: str
+
+class RandomForestModel(Model):
+    KIND: Literal["RandomForest"]
+    n_estimators: int = 100
+    max_depth: int = 5
+    random_state: int = 42
+
+class SVMModel(Model):
+    KIND: Literal["SVM"]
+    C: float = 1.0
+    kernel: str = "rbf"
+    degree: int = 3
+
+# Define a Union of model configurations
+ModelKind = Union[RandomForestModel, SVMModel]
+
+class Job(BaseModel):
+    model: ModelKind = Field(..., discriminator="KIND")
+
+# Initialize a job from configuration
+config = {
+    "model": {
+        "KIND": "RandomForest",
+        "n_estimators": 100,
+        "max_depth": 5,
+        "random_state": 42,
+    }
+}
+job = Job.model_validate(config)
+```
+
+This pattern not only makes the instantiation of objects based on dynamic input straightforward but also ensures that each instantiated object is immediately validated against its respective schema, further enhancing the robustness of your application.
+
+Incorporating these practices into your MLOps projects can significantly improve the reliability and maintainability of your code, ensuring that your machine learning pipelines are both efficient and error-resistant.
diff --git a/docs/5. Refining/5.1. Task Automation.md b/docs/5. Refining/5.1. Task Automation.md
@@ -2,16 +2,168 @@
 
 ## What is task automation?
 
-## Why do I need task automation?
+Task automation refers to the process of automating repetitive and manual command-line tasks using software tools. This enables tasks to be performed with minimal human intervention, increasing efficiency and accuracy. A common example of task automation in software development is the use of `make`, a utility that automates the execution of predefined tasks like `configure`, `build`, and `install` within a project repository. By executing a simple command:
 
-Don't repeat action
+```bash
+make configure build install
+```
 
-## Which tools should I use to automate my tasks?
+developers can streamline the compilation and installation process of software projects, saving time and reducing the likelihood of errors.
 
-Makefile
+## Why do you need task automation?
 
-PyInvoke
+Task automation is essential for several reasons:
 
-## How can I configure my task automation system?
+- **Don't repeat yourself**: Automation spares you from retyping the same commands, so you can spend your time on work that requires your unique skills and insights.
+- **Share common actions**: It enables teams to share a common set of tasks, ensuring consistency and reliability across different environments and among different team members.
+- **Avoid typing mistakes**: Automation reduces the chances of errors that can occur when manually typing commands or performing repetitive tasks, leading to more reliable outcomes.
 
-## How should I organize my tasks in my project folder?
+Embracing task automation is a step towards improving efficiency for programmers. The initial effort in setting up automation pays off by saving time and reducing errors, making it a valuable practice in software development.
+
+## Which tools should you use to automate your tasks?
+
+While `Make` is a ubiquitous and powerful tool for task automation, its syntax can be challenging due to its use of unique symbols (e.g., `$*`, `$%`, `:=`) and strict formatting rules, such as the requirement for tabs instead of spaces. This complexity can make `Make` intimidating for newcomers.
+
+For those seeking a more approachable alternative, `PyInvoke` offers a simpler, Python-based syntax for defining and running tasks. Here is an example showcasing how to build a Python package (wheel file) using PyInvoke:
+
+```python
+"""Package tasks for pyinvoke."""
+from invoke.context import Context
+from invoke.tasks import task
+from . import cleans
+
+BUILD_FORMAT = "wheel"
+
+@task(pre=[cleans.dist])
+def build(ctx: Context, format: str = BUILD_FORMAT) -> None:
+    """Build a python package with the given format."""
+    ctx.run(f"poetry build --format={format}")
+
+@task(pre=[build], default=True)
+def all(_: Context) -> None:
+    """Run all package tasks."""
+```
+
+This example illustrates how tasks can be easily defined and automated using Python, making it accessible for those already familiar with the language. Developers can then execute the task from their terminal:
+
+```bash
+# execute the build task
+inv build
+```
+
+## How can you configure your task automation system?
+
+Configuring your task automation system with PyInvoke is straightforward. It can be installed as a Python dependency through:
+
+```bash
+poetry add -G dev invoke
+```
+
+Then, to configure PyInvoke for your project, create an `invoke.yaml` file in your repository:
+
+```yaml
+run:
+  echo: true
+project:
+  name: bikes
+```
+
+This configuration file allows you to define general settings under `run` and project-specific variables under `project`. Detailed documentation and more configuration options can be found on [PyInvoke's website](https://docs.pyinvoke.org/en/latest/index.html).
+
+## How should you organize your tasks in your project folder?
+
+For an MLOps project, it's advisable to organize tasks into categories and place them within a `tasks/` directory at the root of your repository. This directory can include files for different task categories such as cleaning, commits, container management, and more. Here's an example structure:
+
+- tasks
+- `tasks/__init__.py`
+- tasks/cleans.py
+- tasks/commits.py
+- tasks/containers.py
+- tasks/dags.py
+- tasks/docs.py
+- tasks/installs.py
+- tasks/mlflow.py
+- tasks/packages.py
+- tasks/checks.py
+- tasks/formats.py
+
+In the `tasks/__init__.py` file, you should import and add all task modules to a collection:
+
+```python
+"""Task collections for the project."""
+from invoke import Collection
+from . import checks, cleans, commits, containers, dags, docs, formats, installs, mlflow, packages
+
+ns = Collection()
+
+ns.add_collection(checks)
+ns.add_collection(cleans)
+ns.add_collection(commits)
+ns.add_collection(containers)
+ns.add_collection(dags, default=True)
+ns.add_collection(docs)
+ns.add_collection(formats)
+ns.add_collection(installs)
+ns.add_collection(mlflow)
+ns.add_collection(packages)
+```
+
+Each module, like `checks`, can define multiple tasks. For example:
+
+```python
+"""Check tasks for pyinvoke."""
+from invoke.context import Context
+from invoke.tasks import task
+
+@task
+def poetry(ctx: Context) -> None:
+    """Check poetry config files."""
+    ctx.run("poetry check --lock")
+
+@task
+def format(ctx: Context) -> None:
+    """Check the formats with ruff."""
+    ctx.run("poetry run ruff format --check src/ tasks/ tests/")
+
+@task
+def type(ctx: Context) -> None:
+    """Check the types with mypy."""
+    ctx.run("poetry run mypy src/ tasks/ tests/")
+
+@task
+def code(ctx: Context) -> None:
+    """Check the codes with ruff."""
+    ctx.run("poetry run ruff check src/ tasks/ tests/")
+
+@task
+def test(ctx: Context) -> None:
+    """Check the tests with pytest."""
+    ctx.run("poetry run pytest --numprocesses='auto' tests/")
+
+@task
+def security(ctx: Context) -> None:
+    """Check the security with bandit."""
+    ctx.run("poetry run bandit --recursive --configfile=pyproject.toml src/")
+
+@task
+def coverage(ctx: Context) -> None:
+    """Check the coverage with coverage."""
+    ctx.run("poetry run pytest --numprocesses='auto' --cov=src/ --cov-fail-under=80 tests/")
+
+@task(pre=[poetry, format, type, code, security, coverage], default=True)
+def all(_: Context) -> None:
+    """Run all check tasks."""
+```
+
+These tasks can then be invoked from the command line as needed, providing a structured and efficient way to manage and execute project-related tasks.
+
+```bash
+# run the code checker
+inv checks.code
+# run the code and format checker
+inv checks.code checks.format
+# run all the check tasks in the module
+inv checks
+```
+
+You can explore more tasks for your AI/ML project from the [MLOps Python Package tasks folder](https://github.com/fmind/mlops-python-package/tree/main/tasks) on GitHub.
diff --git a/docs/5. Refining/5.2. Pre-Commit Hooks.md b/docs/5. Refining/5.2. Pre-Commit Hooks.md
@@ -2,13 +2,13 @@
 
 ## What are pre-commit hooks?
 
-## Why do I need pre-commit hooks?
+## Why do you need pre-commit hooks?
 
 You should avoid committing bad code and making CI/CD do work that is not necessary.
 
 ## Which tool should you use to set up pre-commit hooks?
 
-## Which hooks should I use for an MLOps project?
+## Which hooks should you use for an MLOps project?
 
 ## Is there a way to bypass a hook validation?
 

diff --git a/docs/5. Refining/5.3 CI-CD Workflows.md b/docs/5. Refining/5.3 CI-CD Workflows.md