Commit 5f6afda: Manual linting + reference fixes
profvjreddi committed Nov 14, 2023
1 changed file with 8 additions and 19 deletions: ops.qmd

Major organizations adopt MLOps to boost productivity, increase collaboration, and accelerate machine learning outcomes. It provides the frameworks, tools, and best practices to manage ML systems throughout their lifecycle effectively. This results in better-performing models, faster time-to-value, and sustained competitive advantage. As we explore MLOps further, consider how implementing these practices can help address embedded machine learning challenges today and in the future.


## Historical Context

MLOps has its roots in DevOps, which is a set of practices that combines software development (Dev) and IT operations (Ops) to shorten the development lifecycle and provide continuous delivery of high-quality software. The parallels between MLOps and DevOps are evident in their focus on automation, collaboration, and continuous improvement. In both cases, the goal is to break down silos between different teams (developers, operations, and, in the case of MLOps, data scientists and machine learning engineers) and to create a more streamlined and efficient process. It is useful to understand the history of this evolution to better understand MLOps in the context of traditional systems.

While DevOps and MLOps share similarities in their goals and principles, they differ in their focus and challenges. DevOps focuses on improving the collaboration between development and operations teams and automating software delivery. In contrast, MLOps focuses on streamlining and automating the machine learning lifecycle and facilitating collaboration between data scientists, data engineers, and IT operations.


## Key Components of MLOps

In this chapter, we will provide an overview of the core components of MLOps, an emerging set of practices that enables robust delivery and lifecycle management of machine learning models in production. While some MLOps elements like automation and monitoring were covered in previous chapters, here we bring them together into a unified framework and expand on additional capabilities like governance. By the end, we hope that you will understand the end-to-end MLOps methodology that takes models from ideation to sustainable value creation within organizations.


### Data Management

Robust data management and data engineering actively empower successful [MLOps](https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning) implementations. Teams properly ingest, store, and prepare raw data from sensors, databases, apps, and other systems for model training and deployment.

In an industrial predictive maintenance use case, sensor data is ingested from devices into S3. A Prefect pipeline processes the sensor data, joining it with maintenance records. The enriched dataset is stored in Feast so models can easily retrieve the latest data for training and predictions.
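The feature-store pattern in this example can be sketched with a minimal in-memory stand-in. The class and entity names below are purely illustrative assumptions; a real store like Feast adds persistence, point-in-time correctness, and low-latency serving.

```python
# Minimal sketch of a feature-store lookup, assuming an in-memory store
# of timestamped feature rows keyed by entity id. Illustrative only.
from datetime import datetime

class FeatureStore:
    """Holds timestamped feature rows per entity; serves the latest row."""
    def __init__(self):
        self._rows = {}  # entity_id -> list of (timestamp, features)

    def ingest(self, entity_id, timestamp, features):
        self._rows.setdefault(entity_id, []).append((timestamp, features))
        self._rows[entity_id].sort(key=lambda row: row[0])

    def latest(self, entity_id):
        """Return the most recent feature row for an entity."""
        return self._rows[entity_id][-1][1]

store = FeatureStore()
store.ingest("pump-7", datetime(2023, 1, 1), {"vibration": 0.4})
store.ingest("pump-7", datetime(2023, 2, 1), {"vibration": 0.9})
print(store.latest("pump-7"))  # the most recent reading wins
```

Models training on or serving predictions for `pump-7` would then always see the freshest enriched features without re-running the ingestion pipeline.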


### CI/CD Pipelines

![A sample CI/CD pipeline](images/ai_ops/cicd_pipelines.png)

By connecting the disparate steps from development to deployment under continuous automation, CI/CD pipelines empower teams to iterate and deliver ML models rapidly. Integrating MLOps tools like MLflow enhances model packaging, versioning, and pipeline traceability. CI/CD is integral for progressing models beyond prototypes into sustainable business systems.
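One concrete piece of such a pipeline is the promotion gate that decides whether a retrained model may proceed to deployment. The sketch below is a hedged illustration; the accuracy margin is an assumed policy, not a standard from any particular CI tool.

```python
# Hedged sketch of a CI/CD promotion gate: deploy a candidate model only
# if it beats the current baseline by a required margin. Thresholds are
# illustrative assumptions, not from any specific pipeline.
def should_deploy(candidate_acc, baseline_acc, min_improvement=0.01):
    """Gate a pipeline stage on measured validation accuracy."""
    return candidate_acc >= baseline_acc + min_improvement

assert should_deploy(0.93, 0.91)        # clears the margin: promote
assert not should_deploy(0.915, 0.91)   # under the margin: block
```

A CI runner would call a check like this after automated evaluation, failing the stage (and notifying the team) when the gate returns false.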


### Model Training

In the model training phase, data scientists actively experiment with different machine learning architectures and algorithms to create optimized models that effectively extract insights and patterns from data. MLOps introduces best practices and automation to make this iterative process more efficient and reproducible.

Automating and standardizing model training empowers teams to accelerate experimentation and achieve the rigor needed for production-grade ML systems.
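To make the experimentation loop concrete, here is a standard-library-only sketch of a hyperparameter grid search. The objective function is a hypothetical stand-in for a real validation metric; production teams typically delegate sweeps to dedicated tools.

```python
# Minimal grid-search sketch over a hyperparameter space. The objective
# is a made-up stand-in for a real validation loss, for illustration only.
from itertools import product

def objective(lr, batch_size):
    # Hypothetical loss surface with a minimum at lr=0.01, batch_size=32.
    return (lr - 0.01) ** 2 + (batch_size - 32) ** 2 / 10_000

grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [16, 32, 64]}
best = min(
    (dict(zip(grid, combo)) for combo in product(*grid.values())),
    key=lambda cfg: objective(**cfg),
)
print(best)  # {'lr': 0.01, 'batch_size': 32}
```

Logging each configuration and its score to an experiment tracker is what turns this loop into a reproducible, auditable record.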


### Model Evaluation

Before deploying models, teams perform rigorous evaluation and testing to validate that performance benchmarks are met and the model is ready for release. MLOps introduces best practices around model validation, auditing, and [canary testing](https://martinfowler.com/bliki/CanaryRelease.html).

Automating evaluation and canary releases reduces deployment risks. But human review remains critical to assess less quantifiable dynamics of model behavior. Rigorous pre-deployment validation provides confidence in putting models into production.
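The routing side of a canary release can be sketched in a few lines: a stable hash of the request id sends a fixed fraction of traffic to the candidate model. The 5% split and the names below are illustrative assumptions.

```python
# Deterministic canary routing sketch: hash each request id to a stable
# bucket so roughly a fixed fraction of traffic hits the candidate model.
# The 5% split is an illustrative assumption.
import hashlib

def route(request_id, canary_fraction=0.05):
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = digest[0] / 255  # stable pseudo-random value in [0, 1]
    return "canary" if bucket < canary_fraction else "stable"

# The same request id always routes to the same model, which keeps the
# experience consistent for a given user during the canary period.
assert route("request-42") == route("request-42")
```

If monitored metrics on the canary slice degrade, the fraction is dialed back to zero; if they hold, it is ramped up until the candidate fully replaces the stable model.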


### Model Deployment

To reliably deploy machine learning models to production, teams need to properly package, test and track them. MLOps introduces frameworks and procedures to actively version, deploy, monitor and update models in sustainable ways.

Model deployment processes enable teams to make ML systems resilient in production by accounting for all transition states.
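Version tracking and rollback, two of those transition states, can be sketched with a toy registry. Names are illustrative; real registries such as MLflow Model Registry also track artifacts, stages, and deployment metadata.

```python
# Minimal model-registry sketch with promote and rollback. Illustrative
# only; real registries persist artifacts and stage transitions.
class ModelRegistry:
    def __init__(self):
        self._versions = []     # ordered registration history
        self.production = None  # version currently serving traffic

    def register(self, version):
        self._versions.append(version)

    def promote(self, version):
        if version not in self._versions:
            raise ValueError(f"unknown version: {version}")
        self.production = version

    def rollback(self):
        """Revert production to the previously registered version."""
        idx = self._versions.index(self.production)
        self.production = self._versions[idx - 1] if idx > 0 else None
        return self.production

registry = ModelRegistry()
registry.register("1.0.0")
registry.register("1.1.0")
registry.promote("1.1.0")
registry.rollback()
assert registry.production == "1.0.0"  # previous version restored
```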


### Infrastructure Management

MLOps teams heavily leverage [infrastructure as code (IaC)](https://www.infoworld.com/article/3271126/what-is-iac-infrastructure-as-code-explained.html) tools and robust cloud architectures to actively manage the resources needed for development, training and deployment of machine learning systems.

Carefully managing infrastructure through IaC and monitoring enables teams to prevent bottlenecks in operationalizing ML systems at scale.
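One mechanism behind elastic ML infrastructure is a simple scaling rule that sizes compute to the pending workload. The sketch below is illustrative and not tied to any particular cloud API; the per-replica capacity and bounds are assumed values.

```python
# Hedged sketch of an autoscaling rule: size the replica count to keep
# per-replica load under a target, within fixed bounds. All numbers are
# illustrative assumptions.
import math

def desired_replicas(queued_jobs, jobs_per_replica=4,
                     min_replicas=1, max_replicas=10):
    """Replica count needed for the queue, clamped to allowed bounds."""
    needed = math.ceil(queued_jobs / jobs_per_replica)
    return max(min_replicas, min(max_replicas, needed))

assert desired_replicas(0) == 1     # idle floor keeps service warm
assert desired_replicas(9) == 3     # ceil(9 / 4) replicas
assert desired_replicas(100) == 10  # capped by budget ceiling
```

In an IaC setup, the bounds themselves would live in version-controlled configuration, so capacity policy changes go through the same review process as code.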


### Monitoring

MLOps teams actively maintain robust monitoring to sustain visibility into machine learning models deployed in production. Monitoring continuously provides insights into model and system performance so teams can rapidly detect and address issues to minimize disruption.

Comprehensive monitoring enables teams to maintain confidence in model and system health after deployment. It empowers teams to catch and resolve deviations through data-driven alerts and dashboards preemptively. Active monitoring is essential for maintaining highly available, trustworthy ML systems.
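A drift check behind such an alert can be as simple as comparing a recent window of a feature against its training-time distribution. The sketch below assumes numeric features and uses a mean-shift score; production systems often use PSI or KS tests instead.

```python
# Simple data-drift check, a sketch assuming numeric feature values:
# score how far the recent mean has shifted, in units of the baseline
# standard deviation. Alert thresholds are illustrative.
import statistics

def drift_score(baseline, recent):
    """Shift of the recent mean, measured in baseline standard deviations."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(recent) - mu) / sigma

baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
assert drift_score(baseline, [10.1, 9.9, 10.0]) < 1.0   # no alert
assert drift_score(baseline, [14.8, 15.2, 15.0]) > 3.0  # alert: drift
```

A monitoring job would evaluate scores like this per feature on a schedule and page the team, or trigger retraining, when a threshold is crossed.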


### Governance

MLOps teams actively establish proper governance practices as a critical component. Governance provides oversight into machine learning models to ensure they are trustworthy, ethical, and compliant. Without governance, there are significant risks of models behaving in dangerous or prohibited ways when deployed in applications and business processes.

Platforms like [Watson OpenScale](https://www.ibm.com/cloud/watson-openscale) incorporate governance capabilities like bias monitoring and explainability directly into model building, testing and production monitoring. The key focus areas of governance are transparency, fairness, and compliance. This minimizes risks of models behaving incorrectly or dangerously when integrated into business processes. Embedding governance practices into MLOps workflows enables teams to ensure trustworthy AI.
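One fairness metric such platforms monitor is demographic parity, the gap in positive-outcome rates across groups. The sketch below is illustrative; group names, data, and the tolerance are assumed.

```python
# Sketch of a demographic-parity check: the maximum difference in
# positive-outcome rates across groups. Data and the 10% tolerance
# are illustrative assumptions.
def parity_gap(outcomes):
    """outcomes maps group name -> list of 0/1 decisions."""
    rates = [sum(v) / len(v) for v in outcomes.values()]
    return max(rates) - min(rates)

approvals = {"group_a": [1, 1, 0, 1], "group_b": [1, 0, 0, 1]}
gap = parity_gap(approvals)  # 0.75 - 0.50 = 0.25
assert gap > 0.1             # exceeds a 10% tolerance: flag for review
```

A governance workflow would compute metrics like this on every candidate model and on live traffic, blocking promotion or raising a review ticket when the gap exceeds policy.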


### Communication & Collaboration

MLOps actively breaks down silos and enables free flow of information and insights between teams through all machine learning lifecycle stages. Tools like [MLflow](https://mlflow.org/), [Weights & Biases](https://wandb.ai/), and data contexts provide traceability and visibility to improve collaboration.
![Components of a production ML system with "ML code" highlighted as a small portion of system](images/ai_ops/hidden_debt.png)
*Components of a production ML system - the ML code is only a very small portion of the system*


### Model Boundary Erosion

Unlike traditional software, ML lacks clear boundaries between components, as seen in the diagram above. This erosion of abstraction creates entanglements that exacerbate technical debt in several ways:


The [Hidden Technical Debt of Machine Learning Systems](https://papers.nips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf) paper spreads awareness of the nuances of ML system specific tech debt and encourages additional development in the broad area of maintainable ML.


## Roles and Responsibilities

Given the vastness of MLOps, successfully implementing machine learning systems requires diverse skills and close collaboration between people with different areas of expertise. While data scientists build the core ML models, it takes cross-functional teamwork to successfully deploy these models into production environments and enable them to deliver business value in a sustainable way.

Skilled project managers enable MLOps teams to work synergistically and rapidly deliver maximum business value from ML investments. Their leadership and organizational skills keep diverse teams aligned.


## Challenges in Embedded MLOps

### Limited Compute Resources

With limited monitoring data from each remote device, detecting changes in the input data over time is much harder. Drift can lead to degraded model performance. Lightweight methods are needed to identify when retraining is necessary. A model predicting power grid loads shows declining performance as usage patterns change over time. With only local device data, this trend is difficult to spot.
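One lightweight method suited to such constrained devices is Welford's online algorithm, which maintains a running mean and variance in O(1) memory, so a device can flag unusual input statistics without storing raw history. The readings below are illustrative.

```python
# Lightweight on-device statistics via Welford's online algorithm:
# constant memory, one pass, no raw-data buffer. A device can compare
# live stats against training-time values to hint at drift.
class OnlineStats:
    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

stats = OnlineStats()
for reading in [4.9, 5.1, 5.0, 5.2, 4.8]:
    stats.update(reading)
print(round(stats.mean, 2), round(stats.variance, 3))
```

Periodically transmitting just `(n, mean, variance)` to the cloud gives a fleet-wide drift signal at a tiny fraction of the bandwidth of raw telemetry.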


## Traditional MLOps vs. Embedded MLOps

In traditional MLOps, machine learning models are typically deployed in cloud-based or server environments, where resources like computing power and memory are abundant. These environments facilitate the smooth operation of complex models that require significant computational resources. For instance, a cloud-based image recognition model might be used by a social media platform to tag photos with relevant labels automatically. In this case, the model can leverage the extensive resources available in the cloud to process vast data efficiently.
* Monitoring: Techniques for monitoring model performance, data drift, and operational health.
* Governance: Implementing policies for model auditability, compliance, and ethical considerations.


### Model Lifecycle Management

![Diagram showing the components of Data and Model Management](images/ai_ops/mlops_flow.png)

Moreover, specialized OTA protocols optimized for IoT networks are often used rather than standard WiFi or Bluetooth protocols. Key factors include efficiency, reliability, security, and telemetry like progress tracking. Solutions like Mender.io provide embedded-focused OTA services handling differential updates across device fleets.
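The reliability and security factors above hinge on integrity checking: before applying an update, the device verifies the downloaded image against a digest from a signed manifest. The sketch below shows only the digest check; real OTA stacks like Mender add signing, chunked differential transfer, and rollback on failure.

```python
# Sketch of OTA update integrity checking: verify the downloaded image's
# SHA-256 digest against the manifest before applying it. The payload is
# an illustrative placeholder.
import hashlib

def verify_update(image, expected_sha256):
    """True only when the image hashes to the manifest digest."""
    return hashlib.sha256(image).hexdigest() == expected_sha256

firmware = b"model-v2-weights"
manifest_digest = hashlib.sha256(firmware).hexdigest()
assert verify_update(firmware, manifest_digest)                  # intact
assert not verify_update(firmware + b"corrupted", manifest_digest)  # reject
```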


### Development and Operations Integration

#### CI/CD Pipelines
Therefore, embedded MLOps can't leverage centralized cloud infrastructure for CI/CD. Companies cobble together custom pipelines, testing infrastructure and OTA delivery to deploy models across fragmented and disconnected edge systems.

#### Infrastructure Management

In traditional centralized MLOps, infrastructure entails provisioning cloud servers, GPUs and high-bandwidth networks for intensive workloads like model training and serving predictions at scale. However, embedded MLOps requires more heterogeneous infrastructure spanning edge devices, gateways, and cloud.

Edge devices like sensors capture and preprocess data locally before intermittent transmission to avoid overloading networks. Gateways aggregate and process data from devices before sending select subsets to the cloud for training and analysis. The cloud provides centralized management and supplemental compute.
In summary, embedded MLOps requires holistic management of distributed infrastructure spanning constrained edge, gateways, and centralized cloud. Workloads are balanced across tiers while accounting for connectivity, compute and security challenges.
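The tiered flow can be illustrated with a toy edge-side filter that forwards only readings deviating from a local baseline, so gateways and the cloud see a reduced stream. The 10% relative threshold is an assumed policy.

```python
# Sketch of tiered data flow: an edge device uploads only readings that
# deviate from its local baseline by more than a relative threshold.
# Threshold and values are illustrative assumptions.
def select_for_upload(readings, baseline, rel_threshold=0.10):
    """Keep only readings that stand out against the local baseline."""
    return [r for r in readings
            if abs(r - baseline) / baseline > rel_threshold]

readings = [100.0, 101.0, 125.0, 99.0, 70.0]
assert select_for_upload(readings, baseline=100.0) == [125.0, 70.0]
```

Here three of five readings never leave the device, which is the kind of reduction that keeps intermittent uplinks from saturating.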

#### Communication & Collaboration

In traditional MLOps, collaboration tends to be centered around data scientists, ML engineers and DevOps teams. But embedded MLOps requires tighter cross-functional coordination between additional roles to address system constraints.

Edge engineers optimize model architectures for target hardware environments. They provide feedback to data scientists during development so models fit device capabilities early on. Similarly, product teams define operational requirements informed by end-user contexts.

In essence, embedded MLOps mandates continuous coordination between data scientists, engineers, end customers and other stakeholders throughout the machine learning lifecycle. Only through close collaboration can models be tailored and optimized for targeted edge devices.


### Operational Excellence

#### Monitoring

In traditional MLOps, monitoring focuses on tracking model accuracy, performance metrics and data drift centrally. But embedded MLOps must account for decentralized monitoring across diverse edge devices and environments.

Edge devices require optimized data collection to transmit key monitoring metrics without overloading networks. Metrics help assess model performance, data patterns, resource usage and other behaviors on remote devices.
Embedded MLOps monitoring provides observability into model and system performance across decentralized edge environments. Careful data collection, analysis and collaboration delivers meaningful insights to maintain reliability.
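Optimized collection often means summarizing a window of raw measurements into a compact record before transmission, trading per-sample detail for bandwidth. The field names and window below are illustrative.

```python
# Sketch of on-device metric aggregation: compress a window of raw
# measurements into a compact summary record before transmission.
# Field names and data are illustrative assumptions.
def summarize(window):
    """One small record standing in for a window of raw samples."""
    return {
        "count": len(window),
        "min": min(window),
        "max": max(window),
        "mean": sum(window) / len(window),
    }

latencies_ms = [12.0, 15.0, 11.0, 40.0, 14.0]
report = summarize(latencies_ms)
assert report == {"count": 5, "min": 11.0, "max": 40.0, "mean": 18.4}
```

Cloud-side dashboards then aggregate these summaries across the fleet; the 40 ms outlier still surfaces via `max` even though the raw samples never left the device.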

#### Governance

In traditional MLOps, governance focuses on model explainability, fairness and compliance for centralized systems. But embedded MLOps must also address device-level governance challenges around data privacy, security and safety.

With sensors collecting personal and sensitive data, local data governance on devices is critical. Data access controls, anonymization, and encrypted caching help address privacy risks and compliance like HIPAA and GDPR. Updates must maintain security patches and settings.
In essence, embedded MLOps governance must span the dimensions of privacy, security, safety, transparency, and ethics. Specialized techniques and team collaboration are needed to help establish trust and accountability within decentralized environments.
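Local data governance can be sketched as on-device pseudonymization: sensitive fields are replaced with salted hashes before transmission, so cloud-side analytics can still join on a stable pseudonym. Field names and salt handling are illustrative, and note that hashing alone is pseudonymization rather than full anonymization, so real deployments layer on further controls.

```python
# Sketch of on-device pseudonymization before transmission: replace PII
# fields with salted, truncated hashes. Field names, salt handling, and
# the record are illustrative assumptions.
import hashlib

def anonymize(record, pii_fields, salt):
    """Return a copy of record with PII fields replaced by pseudonyms."""
    out = dict(record)
    for field in pii_fields:
        if field in out:
            digest = hashlib.sha256(salt + str(out[field]).encode())
            out[field] = digest.hexdigest()[:16]
    return out

record = {"patient_id": "P-1234", "heart_rate": 72}
safe = anonymize(record, {"patient_id"}, salt=b"device-secret")
assert safe["heart_rate"] == 72          # clinical signal preserved
assert safe["patient_id"] != "P-1234"    # identifier never leaves device
```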

### Comparison

Here is a comparison table highlighting similarities and differences between traditional MLOps and embedded MLOps, based on what we have covered thus far:

| Area | Traditional MLOps | Embedded MLOps |
So while Embedded MLOps shares foundational MLOps principles, it faces unique constraints to tailor workflows and infrastructure specifically for resource-constrained edge devices.

## Commercial Offerings

While no replacement for understanding the principles, there are an increasing number of commercial offerings that help ease the burden of building ML pipelines and integrating tools together to build, test, deploy, and monitor ML models in production.

### Traditional MLOps

Let’s take a look at MLOps in the context of medical health monitoring to better understand how MLOps “matures” in the context of a real-world deployment. Specifically, let’s consider continuous therapeutic monitoring (CTM) enabled by wearable devices and sensors, which provide the opportunity for more frequent and personalized adjustments to treatments by capturing detailed physiological data from patients.

Machine learning enabled wearable sensors allow continuous physiological and activity monitoring outside of clinics, opening up possibilities for timely, data-driven adjustments of therapies. For example, wearable insulin biosensors [@wearableinsulin] and wrist-worn ECG sensors for glucose monitoring [@glucosemonitor] can automate insulin dosing for diabetes, wrist-worn ECG and PPG sensors can adjust blood thinners based on atrial fibrillation patterns [@plasma; @afib], and accelerometers tracking gait can trigger preventative care for declining mobility in the elderly [@gaitathome]. The variety of signals that can now be captured passively and continuously allows therapy titration and optimization tailored to each patient’s changing needs. By closing the loop between physiological sensing and therapeutic response with TinyML and on-device learning, wearables are poised to transform many areas of personalized medicine.

ML holds great promise in analyzing CTM data to provide data-driven recommendations for therapy adjustments. But simply deploying AI models in silos, without integrating them properly into clinical workflows and decision making, can lead to poor adoption or suboptimal outcomes. In other words, thinking about MLOps alone is simply insufficient to make them useful in practice. What is needed are frameworks to seamlessly incorporate AI and CTM into real-world clinical practice, as this study shows.

