From fb088f2fa5dfdf2f33a90cc9a372d67bf5d87710 Mon Sep 17 00:00:00 2001 From: jasonjabbour Date: Fri, 15 Nov 2024 18:07:51 -0500 Subject: [PATCH] group fragmented topics throughout chapter --- contents/core/ops/ops.qmd | 74 +++++++++------------------------------ 1 file changed, 17 insertions(+), 57 deletions(-) diff --git a/contents/core/ops/ops.qmd b/contents/core/ops/ops.qmd index f588e8b9..14b49cce 100644 --- a/contents/core/ops/ops.qmd +++ b/contents/core/ops/ops.qmd @@ -157,55 +157,43 @@ CI/CD pipelines empower teams to iterate and deliver ML models rapidly by connec ### Model Training -In the model training phase, data scientists actively experiment with different ML architectures and algorithms to create optimized models that extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible. +Model training is a critical phase where data scientists experiment with various ML architectures and algorithms to optimize models that extract insights from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible. Modern ML frameworks like [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/), and [Keras](https://keras.io/) provide pre-built components that simplify designing neural networks and other model architectures. These tools allow data scientists to focus on creating high-performing models using built-in modules for layers, activations, and loss functions. -Modern ML frameworks like [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/) and [Keras](https://keras.io/) provide pre-built components that simplify designing neural networks and other model architectures. Data scientists leverage built-in modules for layers, activations, losses, etc., and high-level APIs like Keras to focus more on model architecture. +To make the training process efficient and reproducible, MLOps introduces best practices such as version-controlling training code using Git and hosting it in repositories like GitHub. Reproducible environments, often managed through interactive tools like [Jupyter](https://jupyter.org/) notebooks, allow teams to bundle data ingestion, preprocessing, model development, and evaluation in a single document. These notebooks are not only version-controlled but can also be integrated into automated pipelines for continuous retraining. -MLOps enables teams to package model training code into reusable, tracked scripts and notebooks. As models are developed, capabilities like [hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview), [neural architecture search](https://arxiv.org/abs/1808.05377) and [automatic feature selection](https://scikit-learn.org/stable/modules/feature_selection.html) rapidly iterate to find the best-performing configurations. +Automation plays a significant role in standardizing training workflows. Capabilities such as [hyperparameter tuning](https://cloud.google.com/ai-platform/training/docs/hyperparameter-tuning-overview), [neural architecture search](https://arxiv.org/abs/1808.05377), and [automatic feature selection](https://scikit-learn.org/stable/modules/feature_selection.html) are commonly integrated into MLOps pipelines to iterate rapidly and find optimal configurations. CI/CD pipelines orchestrate training workflows by automating tasks like data preprocessing, model training, evaluation, and registration. 
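The sketch below illustrates the shape of such a threshold-gated retraining step in Python with TensorFlow/Keras. It is a minimal sketch, not any particular platform's convention: the synthetic data, the 0.90 accuracy gate, and the saved artifact path are illustrative placeholders.

```python
import numpy as np
import tensorflow as tf

ACCURACY_THRESHOLD = 0.90  # hypothetical promotion gate


def retrain_and_validate() -> bool:
    # Stand-in data; a real pipeline step would pull the latest labeled dataset here.
    x = np.random.rand(1000, 20).astype("float32")
    y = (x.sum(axis=1) > 10.0).astype("int32")
    x_train, y_train = x[:800], y[:800]
    x_val, y_val = x[800:], y[800:]

    # Rebuild and retrain the model on the fresh data.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, verbose=0)

    # Gate promotion on the held-out metric, mirroring the pipeline's threshold check.
    _, accuracy = model.evaluate(x_val, y_val, verbose=0)
    if accuracy >= ACCURACY_THRESHOLD:
        model.save("candidate_model.keras")  # artifact picked up by a later registry step
        return True
    return False


if __name__ == "__main__":
    # A CI job can branch on the exit code to decide whether to continue the pipeline.
    raise SystemExit(0 if retrain_and_validate() else 1)
```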
For example, a Jenkins pipeline can trigger a Python script along these lines to retrain a TensorFlow model, validate its performance against pre-defined metrics, and deploy it only if those thresholds are met.

-Teams use Git to version control training code and host it in repositories like GitHub to track changes over time. This allows seamless collaboration between data scientists.
-
-Notebooks like [Jupyter](https://jupyter.org/) create an excellent interactive model development environment. The notebooks contain data ingestion, preprocessing, model declaration, training loop, evaluation, and export code in one reproducible document.
-
-Finally, teams orchestrate model training as part of a CI/CD pipeline for automation. For instance, a Jenkins pipeline can trigger a Python script to load new training data, retrain a TensorFlow classifier, evaluate model metrics, and automatically register the model if performance thresholds are met.
+Cloud-managed training services have made high-performance training hardware far more accessible. These services provide on-demand access to GPU-accelerated infrastructure, making advanced training feasible even for small teams. Depending on the provider, developers may manage the training workflow themselves or rely on fully managed options like [Vertex AI Fine Tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models), which can automatically fine-tune a base model on a labeled dataset. However, GPU demand often exceeds supply, and availability may vary by region or contractual agreement, which can create bottlenecks for teams that rely on cloud services.

In an example workflow, a data scientist uses a PyTorch notebook to develop a CNN model for image classification. The [fastai](https://www.fast.ai/) library provides high-level APIs to simplify training CNNs on image datasets. The notebook trains the model on sample data, evaluates accuracy metrics, and tunes hyperparameters like learning rate and layers to optimize performance. This reproducible notebook is version-controlled and integrated into a retraining pipeline.

-Automating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed to produce ML systems.
+By automating and standardizing model training, leveraging managed cloud services, and integrating modern frameworks, teams can accelerate experimentation and build robust, production-ready ML models.

### Model Evaluation

-Before deploying models, teams perform rigorous evaluation and testing to validate meeting performance benchmarks and readiness for release. MLOps introduces best practices around model validation, auditing, and [canary testing](https://martinfowler.com/bliki/CanaryRelease.html).
-
-Teams typically evaluate models against holdout [test datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) that are not used during training. The test data originates from the same distribution as production data. Teams calculate metrics like [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), [precision](https://en.wikipedia.org/wiki/Precision_and_recall), [recall](https://en.wikipedia.org/wiki/Precision_and_recall), and [F1 score](https://en.wikipedia.org/wiki/F1_score).
+Before deploying models, teams perform rigorous evaluation and testing to validate that models meet performance benchmarks and are ready for release. 
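Concretely, much of this validation reduces to computing standard metrics on a held-out test split. The scikit-learn sketch below is only an illustration: the logistic-regression model and synthetic data stand in for a real production candidate and its test set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Stand-ins for a trained production candidate and its held-out test split.
X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

# The same report can be recomputed on fresh production samples over time
# to watch for degradation such as data drift.
report = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_prob),
}
print(report)
```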
MLOps provides best practices for model validation, auditing, and controlled testing methods to minimize risks during deployment. -Teams also track the same metrics over time against test data samples. If evaluation data comes from live production streams, this catches [data drifts](https://www.ibm.com/cloud/learn/data-drift) that degrade model performance over time. +The evaluation process begins with testing models against holdout [test datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets) that are independent of the training data but originate from the same distribution as production data. Key metrics such as [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), [AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve), [precision](https://en.wikipedia.org/wiki/Precision_and_recall), [recall](https://en.wikipedia.org/wiki/Precision_and_recall), and [F1 score](https://en.wikipedia.org/wiki/F1_score) are calculated to quantify model performance. Tracking these metrics over time helps teams identify trends and potential degradation in model behavior, particularly when evaluation data comes from live production streams. This is vital for detecting [data drift](https://www.ibm.com/cloud/learn/data-drift), where changes in input data distributions can erode model accuracy. -Human oversight for model release remains important. Data scientists review performance across key segments and slices. Error analysis helps identify model weaknesses to guide enhancement. Teams apply [fairness](https://developers.google.com/machine-learning/fairness-overview) and [bias detection](https://developers.google.com/machine-learning/fairness-overview) techniques. +To validate real-world performance, [canary testing](https://martinfowler.com/bliki/CanaryRelease.html) deploys the model to a small subset of users. This gradual rollout allows teams to monitor metrics in a live environment and catch potential issues before full-scale deployment. By incrementally increasing traffic to the new model, teams can confidently evaluate its impact on end-user experience. For instance, a retailer might test a personalized recommendation model by comparing its accuracy and diversity metrics against historical data. During the testing phase, the team tracks live performance metrics and identifies a slight accuracy decline over two weeks. To ensure stability, the model is initially deployed to 5% of web traffic, monitored for potential issues, and only rolled out widely after proving robust in production. -Canary testing releases a model to a small subset of users to evaluate real-world performance before wide deployment. Teams incrementally route traffic to the canary release while monitoring for issues. +ML models deployed to the cloud benefit from constant internet connectivity and the ability to log every request and response. This makes it feasible to replay or generate synthetic requests for comparing different models and versions. Some providers offer tools that automate parts of the evaluation process, such as tracking hyperparameter experiments or comparing model runs. For instance, platforms like [Weights and Biases](https://wandb.ai/) streamline this process by automating experiment tracking and generating artifacts from training runs. -For example, a retailer evaluates a personalized product recommendation model against historical test data, reviewing accuracy and diversity metrics. 
Teams also calculate metrics on live customer data over time, detecting decreased accuracy over the last 2 weeks. Before full rollout, the new model is released to 5% of web traffic to ensure no degradation. - -Automating evaluation and canary releases reduces deployment risks. However, human review still needs to be more critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production. +Automating evaluation and testing processes, combined with careful canary testing, reduces deployment risks. While automated evaluation processes catch many issues, human oversight remains essential for reviewing performance across specific data segments and identifying subtle weaknesses. This combination of rigorous pre-deployment validation and real-world testing provides teams with confidence when putting models into production. ### Model Deployment Teams need to properly package, test, and track ML models to reliably deploy them to production. MLOps introduces frameworks and procedures for actively versioning, deploying, monitoring, and updating models in sustainable ways. -Teams containerize models using [Docker](https://www.docker.com/), which bundles code, libraries, and dependencies into a standardized unit. Containers enable smooth portability across environments. - -Frameworks like [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and [BentoML](https://bentoml.org/) help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring. - -Teams first deploy updated models to staging or QA environments for testing before full production rollout. Shadow or canary deployments route a sample of traffic to test model variants. Teams incrementally increase access to new models. +One common approach to deployment involves containerizing models using tools like [Docker](https://www.docker.com/), which package code, libraries, and dependencies into standardized units. Containers ensure smooth portability across environments, making deployment consistent and predictable. Frameworks like [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and [BentoML](https://bentoml.org/) help serve predictions from deployed models via performance-optimized APIs. These frameworks handle versioning, scaling, and monitoring. -Teams build robust rollback procedures in case issues emerge. Rollbacks revert to the last known good model version. Integration with CI/CD pipelines simplifies redeployment if needed. +Before full-scale rollout, teams deploy updated models to staging or QA environments to rigorously test performance. Techniques such as shadow or canary deployments are used to validate new models incrementally. For instance, canary deployments route a small percentage of traffic to the new model while closely monitoring performance. If no issues arise, traffic to the new model gradually increases. Robust rollback procedures are essential to handle unexpected issues, reverting systems to the previous stable model version to ensure minimal disruption. Integration with CI/CD pipelines further automates the deployment and rollback process, enabling efficient iteration cycles. -Teams carefully track model artifacts, such as scripts, weights, logs, and metrics, for each version with ML metadata tools like [MLflow](https://mlflow.org/). This maintains lineage and auditability. 
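As a small, hedged illustration of this kind of tracking, an MLflow run can record parameters, metrics, and the model artifact for later audit and comparison. The experiment name, stand-in model, and values below are hypothetical, not a prescribed workflow.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Stand-in data and model; a real pipeline would log its production candidate instead.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlflow.set_experiment("recommendation-model")  # hypothetical experiment name

with mlflow.start_run():
    params = {"n_estimators": 100, "max_depth": 8}
    model = RandomForestClassifier(**params, random_state=0).fit(X_train, y_train)

    # Record the configuration, evaluation metric, and model artifact so that
    # every candidate version can be audited and compared later.
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```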
+To maintain lineage and auditability, teams track model artifacts, including scripts, weights, logs, and metrics, using tools like [MLflow](https://mlflow.org/). Model registries, such as [Vertex AI's model registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction), act as centralized repositories for storing and managing trained models. These registries not only facilitate version comparisons but also often include access to base models, which may be open source, proprietary, or a hybrid (e.g., [LLAMA](https://ai.meta.com/llama/)). Deploying a model from the registry to an inference endpoint is streamlined, handling resource provisioning, model weight downloads, and hosting. -For example, a retailer containerizes a product recommendation model in [TensorFlow Serving](https://www.tensorflow.org/tfx/guide/serving) and deploys it to a [Kubernetes](https://kubernetes.io/) staging cluster. After monitoring and approving performance on sample traffic, Kubernetes shifts 10% of production traffic to the new model. If no issues are detected after a few days, the new model takes over 100% of traffic. However, teams should keep the previous version accessible for rollback if needed. +Inference endpoints typically expose the deployed model via REST APIs for real-time predictions. Depending on performance requirements, teams can configure resources, such as GPU accelerators, to meet latency and throughput targets. Some providers also offer flexible options like serverless or batch inference, eliminating the need for persistent endpoints and enabling cost-efficient, scalable deployments. For example, [AWS SageMaker Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) supports such configurations. -Model deployment processes enable teams to make ML systems resilient in production by accounting for all transition states. +By leveraging these tools and practices, teams can deploy ML models resiliently, ensuring smooth transitions between versions, maintaining production stability, and optimizing performance across diverse use cases. ### Model Serving @@ -539,7 +527,7 @@ The volume of aggregated data is much lower, often requiring techniques like fed Furthermore, the models must use simplified architectures optimized for low-power edge hardware. Given the computing limitations, high-end GPUs are inaccessible for intensive deep learning. Training leverages lower-powered edge servers and clusters with distributed approaches to spread load. -Transfer learning emerges as a crucial strategy to address data scarcity and irregularity in machine learning, particularly in edge computing scenarios. As illustrated in @fig-transfer-learning-mlops, this approach involves pre-training models on large public datasets and then fine-tuning them on limited domain-specific edge data. The figure depicts a neural network where initial layers (W_{A1} to W_{A4}), responsible for general feature extraction, are frozen (indicated by a green dashed line). These layers retain knowledge from previous tasks, accelerating learning and reducing resource requirements. The latter layers (W_{A5} to W_{A7}), beyond the blue dashed line, are fine-tuned for the specific task, focusing on task-specific feature learning. +Transfer learning emerges as a crucial strategy to address data scarcity and irregularity in machine learning, particularly in edge computing scenarios. 
As illustrated in @fig-transfer-learning-mlops, this approach involves pre-training models on large public datasets and then fine-tuning them on limited domain-specific edge data. The figure depicts a neural network where initial layers ($W_{A1}$ to $W_{A4}$), responsible for general feature extraction, are frozen (indicated by a green dashed line). These layers retain knowledge from previous tasks, accelerating learning and reducing resource requirements. The latter layers ($W_{A5}$ to $W_{A7}$), beyond the blue dashed line, are fine-tuned for the specific task, focusing on task-specific feature learning. ![Transfer learning in MLOps. Source: HarvardX.](images/png/transfer_learning.png){#fig-transfer-learning-mlops} @@ -671,35 +659,7 @@ Embedded MLOps governance must encompass privacy, security, safety, transparency So, while Embedded MLOps shares foundational MLOps principles, it faces unique constraints in tailoring workflows and infrastructure specifically for resource-constrained edge devices. -### Traditional MLOps - -Google, Microsoft, and Amazon offer their version of managed ML services. These include services that manage model training and experimentation, model hosting and scaling, and monitoring. These offerings are available via an API and client SDKs, as well as through web UIs. While it is possible to build your own end-to-end MLOps solutions using pieces from each, the greatest ease of use benefits come by staying within a single provider ecosystem to take advantage of interservice integrations. - -The following sections present a quick overview of the services that fit into each part of the MLOps life cycle described above, providing examples of offerings from different providers. It's important to note that the MLOps space is evolving rapidly; new companies and products are entering the scene at a swift pace. The examples mentioned are not meant to serve as endorsements of particular companies' offerings but rather to illustrate the types of solutions available in the market. - -#### Data Management - -Data storage and versioning are table stakes for any commercial offering, and most take advantage of existing general-purpose storage solutions such as S3. Others use more specialized options such as git-based storage (Example: [Hugging Face's Dataset Hub](https://huggingface.co/datasets)). This is an area where providers make it easy to support their competitors' data storage options, as they don't want this to be a barrier for adoptions of the rest of their MLOps services. For example, Vertex AI's training pipeline seamlessly supports datasets stored in S3, Google Cloud Buckets, or Hugging Face's Dataset Hub. - -#### Model Training - -Managed training services are where cloud providers shine, as they provide on-demand access to hardware that is out of reach for most smaller companies. They bill only for hardware during training time, putting GPU-accelerated training within reach of even the smallest developer teams. The control developers have over their training workflow can vary widely depending on their needs. Some providers have services that provide little more than access to the resources and rely on the developer to manage the training loop, logging, and model storage themselves. Other services are as simple as pointing to a base model and a labeled data set to kick off a fully managed finetuning job (example: [Vertex AI Fine Tuning](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models)). 
- -A word of warning: As of 2023, GPU hardware demand well exceeds supply, and as a result, cloud providers are rationing access to their GPUs. In some data center regions, GPUs may be unavailable or require long-term contracts. - -#### Model Evaluation - -Model evaluation tasks typically involve monitoring models' accuracy, latency, and resource usage in both the testing and production phases. Unlike embedded systems, ML models deployed to the cloud benefit from constant internet connectivity and unlimited logging capacities. As a result, it is often feasible to capture and log every request and response. This makes replaying or generating synthetic requests to compare different models and versions tractable. - -Some providers also offer services that automate the experiment tracking of modifying model hyperparameters. They track the runs and performance and generate artifacts from these model training runs. Example: [WeightsAndBiases](https://wandb.ai/) - -#### Model Deployment - -Each provider typically has a service referred to as a "model registry," where training models are stored and accessed. Often, these registries may also provide access to base models that are either open source or provided by larger technology companies (or, in some cases, like [LLAMA](https://ai.meta.com/llama/), both!). These model registries are a common place to compare all the models and their versions to allow easy decision-making on which to pick for a given use case. Example: [Vertex AI's model registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) - -From the model registry, deploying a model to an inference endpoint is quick and simple, and it handles the resource provisioning, model weight downloading, and hosting of a given model. These services typically give access to the model via a REST API where inference requests can be sent. Depending on the model type, specific resources can be configured, such as which type of GPU accelerator may be needed to hit the desired performance. Some providers may also offer serverless inference or batch inference options that do not need a persistent endpoint to access the model. Example: [AWS SageMaker Inference](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html) - -### Embedded MLOps +### Embedded MLOps Services Despite the proliferation of new MLOps tools in response to the increase in demand, the challenges described earlier have constrained the availability of such tools in embedded systems environments. More recently, new tools such as Edge Impulse [@janapa2023edge] have made the development process somewhat easier, as described below.