Commit
"Edited file ml_systems.qmd. Several images have been cropped to fit the width of the text.
hzeljko committed Jan 11, 2025
1 parent a2a5ad0 commit 224c6a2
Showing 6 changed files with 8 additions and 6 deletions.
Binary file modified contents/core/ml_systems/images/jpg/tiny_ml.jpg
Binary file modified contents/core/ml_systems/images/png/cloud-edge-tiny.png
Binary file modified contents/core/ml_systems/images/png/cloudml.png
Binary file modified contents/core/ml_systems/images/png/edgeml.png
Binary file modified contents/core/ml_systems/images/png/tinyml.png
14 changes: 8 additions & 6 deletions contents/core/ml_systems/ml_systems.qmd
@@ -14,7 +14,7 @@ Resources: [Slides](#sec-ml-systems-resource), [Videos](#sec-ml-systems-resource

_How do the diverse environments where machine learning operates shape the fundamental nature of these systems, and what drives their widespread deployment across computing platforms?_

-The deployment of machine learning systems across varied computing environments reveals essential insights into the relationship between theoretical principles and practical implementation. Each computing environment - from large-scale distributed systems to resource-constrained devices - introduces distinct requirements that influence both system architecture and algorithmic approaches. Understanding these relationships reveals core engineering principles that govern the design of machine learning systems. This understanding provides a foundation for examining how theoretical concepts translate into practical implementations, and how system designs adapt to meet diverse computational, memory, and energy constraints.
+The deployment of machine learning systems across varied computing environments reveals essential insights into the relationship between theoretical principles and practical implementation. Each computing environment, from large-scale distributed systems to resource-constrained devices, introduces distinct requirements that influence both system architecture and algorithmic approaches. Understanding these relationships reveals core engineering principles that govern the design of machine learning systems. This understanding provides a foundation for examining how theoretical concepts translate into practical implementations, and how system designs adapt to meet diverse computational, memory, and energy constraints.

:::{.callout-tip}

@@ -72,13 +72,13 @@ To better understand the dramatic differences between these ML deployment option

The evolution of machine learning systems can be seen as a progression from centralized to increasingly distributed and specialized computing paradigms:

-**Cloud ML:** Initially, ML was predominantly cloud-based. Powerful, scalable servers in data centers are used to train and run large ML models. This approach leverages vast computational resources and storage capacities, enabling the development of complex models trained on massive datasets. Cloud ML excels at tasks requiring extensive processing power, distributed training of large models, and is ideal for applications where real-time responsiveness isn't critical. Popular platforms like AWS SageMaker, Google Cloud AI, and Azure ML offer flexible, scalable solutions for model development, training, and deployment. Cloud ML can handle models with billions of parameters, training on petabytes of data, but may incur latencies of 100-500ms for online inference due to network delays.
+**Cloud ML:** Initially, ML was predominantly cloud-based. Powerful, scalable servers in data centers are used to train and run large ML models. This approach leverages vast computational resources and storage capacities, enabling the development of complex models trained on massive datasets. Cloud ML excels at tasks requiring extensive processing power, distributed training of large models, and is ideal for applications where real-time responsiveness isn't critical. Popular platforms like AWS SageMaker, Google Cloud AI, and Azure ML offer flexible, scalable solutions for model development, training, and deployment. Cloud ML can handle models with billions of parameters, training on petabytes of data, but may incur latencies of 100-500 ms for online inference due to network delays.
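
To make the 100-500 ms figure concrete, the sketch below times a single round trip to a cloud inference endpoint; the URL and payload are hypothetical placeholders rather than a real service.

```python
import time

import requests

# Hypothetical cloud inference endpoint -- substitute a real deployment.
ENDPOINT = "https://example.com/v1/models/classifier:predict"
payload = {"instances": [[0.0] * 224]}  # placeholder input vector

start = time.perf_counter()
response = requests.post(ENDPOINT, json=payload, timeout=5)
elapsed_ms = (time.perf_counter() - start) * 1000

# The measured time includes network transit in both directions plus
# server-side inference, which is why online cloud inference is hard
# to push much below ~100 ms from a remote client.
print(f"status={response.status_code}, round trip={elapsed_ms:.1f} ms")
```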

**Edge ML:** As the need for real-time, low-latency processing grew, Edge ML emerged. This paradigm brings inference capabilities closer to the data source, typically on edge devices such as industrial gateways, smart cameras, autonomous vehicles, or IoT hubs. Edge ML reduces latency (often to less than 50ms), enhances privacy by keeping data local, and can operate with intermittent cloud connectivity. It's particularly useful for applications requiring quick responses or handling sensitive data in industrial or enterprise settings. Frameworks like NVIDIA Jetson or Google's Edge TPU enable powerful ML capabilities on edge devices. Edge ML plays a crucial role in IoT ecosystems, enabling real-time decision making and reducing bandwidth usage by processing data locally.

-**Mobile ML:** Building on edge computing concepts, Mobile ML focuses on leveraging the computational capabilities of smartphones and tablets. This approach enables personalized, responsive applications while reducing reliance on constant network connectivity. Mobile ML offers a balance between the power of edge computing and the ubiquity of personal devices. It utilizes on-device sensors (e.g., cameras, GPS, accelerometers) for unique ML applications. Frameworks like TensorFlow Lite and Core ML allow developers to deploy optimized models on mobile devices, with inference times often under 30ms for common tasks. Mobile ML enhances privacy by keeping personal data on the device and can operate offline, but must balance model performance with device resource constraints (typically 4-8GB RAM, 100-200GB storage).
+**Mobile ML:** Building on edge computing concepts, Mobile ML focuses on leveraging the computational capabilities of smartphones and tablets. This approach enables personalized, responsive applications while reducing reliance on constant network connectivity. Mobile ML offers a balance between the power of edge computing and the ubiquity of personal devices. It utilizes on-device sensors (e.g., cameras, GPS, accelerometers) for unique ML applications. Frameworks like TensorFlow Lite and Core ML allow developers to deploy optimized models on mobile devices, with inference times often under 30ms for common tasks. Mobile ML enhances privacy by keeping personal data on the device and can operate offline, but must balance model performance with device resource constraints (typically 4-8 GB RAM, 100-200 GB storage).
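
To illustrate the on-device workflow, here is a minimal sketch that runs an already-converted model with the TensorFlow Lite Python interpreter; the model filename is a placeholder. On a phone, the same `.tflite` file would be loaded through the Java or Swift bindings instead.

```python
import numpy as np
import tensorflow as tf

# Load a pre-converted TensorFlow Lite model (placeholder path).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v2.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Build one dummy input that matches the model's declared shape/dtype.
shape = input_details[0]["shape"]   # e.g., [1, 224, 224, 3]
dtype = input_details[0]["dtype"]
dummy_input = np.zeros(shape, dtype=dtype)

interpreter.set_tensor(input_details[0]["index"], dummy_input)
interpreter.invoke()  # runs inference locally, no network involved
prediction = interpreter.get_tensor(output_details[0]["index"])
print(prediction.shape)
```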

-**Tiny ML:** The latest development in this progression is Tiny ML, which enables ML models to run on extremely resource-constrained microcontrollers and small embedded systems. Tiny ML allows for on-device inference without relying on connectivity to the cloud, edge, or even the processing power of mobile devices. This approach is crucial for applications where size, power consumption, and cost are critical factors. Tiny ML devices typically operate with less than 1MB of RAM and flash memory, consuming only milliwatts of power, enabling battery life of months or years. Applications include wake word detection, gesture recognition, and predictive maintenance in industrial settings. Platforms like Arduino Nano 33 BLE Sense and STM32 microcontrollers, coupled with frameworks like TensorFlow Lite for Microcontrollers, enable ML on these tiny devices. However, Tiny ML requires significant model optimization and quantization to fit within these constraints.
+**Tiny ML:** The latest development in this progression is Tiny ML, which enables ML models to run on extremely resource-constrained microcontrollers and small embedded systems. Tiny ML allows for on-device inference without relying on connectivity to the cloud, edge, or even the processing power of mobile devices. This approach is crucial for applications where size, power consumption, and cost are critical factors. Tiny ML devices typically operate with less than 1 MB of RAM and flash memory, consuming only milliwatts of power, enabling battery life of months or years. Applications include wake word detection, gesture recognition, and predictive maintenance in industrial settings. Platforms like Arduino Nano 33 BLE Sense and STM32 microcontrollers, coupled with frameworks like TensorFlow Lite for Microcontrollers, enable ML on these tiny devices. However, Tiny ML requires significant model optimization and quantization to fit within these constraints.
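
The "significant model optimization and quantization" step usually means post-training full-integer quantization. Below is a sketch using the TensorFlow Lite converter; the SavedModel path, input shape, and sample count are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf

def representative_data():
    # Yield ~100 realistic samples so the converter can calibrate
    # activation ranges (random data used here as a placeholder).
    for _ in range(100):
        yield [np.random.rand(1, 96, 96, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full int8 so the model can run on integer-only microcontrollers.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
print(f"Quantized model size: {len(tflite_model) / 1024:.1f} KB")
```

The resulting byte string is what gets compiled into the firmware, typically as a C array consumed by TensorFlow Lite for Microcontrollers.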

Each of these paradigms has its own strengths and is suited to different use cases:

@@ -351,6 +351,8 @@ Tiny ML excels in low-power and resource-constrained settings. These environment

Get ready to bring machine learning to the smallest of devices! In the embedded machine learning world, Tiny ML is where resource constraints meet ingenuity. This Colab notebook will walk you through building a gesture recognition model designed on an Arduino board. You'll learn how to train a small but effective neural network, optimize it for minimal memory usage, and deploy it to your microcontroller. If you're excited about making everyday objects smarter, this is where it begins!

+\vspace{1ex}

[![](https://colab.research.google.com/assets/colab-badge.png)](https://colab.research.google.com/github/arduino/ArduinoTensorFlowLiteTutorials/blob/master/GestureToEmoji/arduino_TinyML_workshop.ipynb)
:::

@@ -404,7 +406,7 @@ In summary, Tiny ML serves as a trailblazer in the evolution of machine learning

## Hybrid ML

-Systems architects rarely confine themselves to a single approach, instead combining various paradigms to create more nuanced solutions. These ``Hybrid ML'' approaches leverage the complementary strengths we've analyzed---from cloud's computational power to tiny's efficiency---while mitigating their individual limitations. Architects create new architectural patterns that balance competing demands for performance, privacy, and resource efficiency, opening up possibilities for more sophisticated ML applications that better meet complex real-world requirements.
+Systems architects rarely confine themselves to a single approach, instead combining various paradigms to create more nuanced solutions. These "Hybrid ML" approaches leverage the complementary strengths we've analyzed---from cloud's computational power to tiny's efficiency---while mitigating their individual limitations. Architects create new architectural patterns that balance competing demands for performance, privacy, and resource efficiency, opening up possibilities for more sophisticated ML applications that better meet complex real-world requirements.

### Design Patterns

@@ -420,7 +422,7 @@ One of the most common hybrid patterns is the train-serve split, where model tra

Hierarchical processing creates a multi-tier system where data and intelligence flow between different levels of the ML stack. In industrial IoT applications, tiny sensors might perform basic anomaly detection, edge devices aggregate and analyze data from multiple sensors, and cloud systems handle complex analytics and model updates. For instance, we might see ESP32-CAM devices performing basic image classification at the sensor level with their minimal 520KB RAM, feeding data up to Jetson AGX Orin devices for more sophisticated computer vision tasks, and ultimately connecting to cloud infrastructure for complex analytics and model updates.

-This hierarchy allows each tier to handle tasks appropriate to its capabilities---Tiny ML devices handle immediate, simple decisions; edge devices manage local coordination; and cloud systems tackle complex analytics and learning tasks. Smart city installations often use this pattern, with street-level sensors feeding data to neighborhood-level edge processors, which in turn connect to city-wide cloud analytics.
+This hierarchy allows each tier to handle tasks appropriate to its capabilities---Tiny ML devices handle immediate, simple decisions; edge devices manage local coordination; and cloud systems tackle complex analytics and learning tasks. Smart city installations often use this pattern, with street-level sensors feeding data to neighborhood-level edge processors, which in turn connect to city-wide cloud analytics.
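
As a sketch of how that tiered hand-off might look in code, the routing policy below is illustrative only; the thresholds, model objects, and method names are all invented for the example.

```python
# Hypothetical three-tier escalation policy for an IoT pipeline.
SENSOR_THRESHOLD = 0.6  # cheap on-sensor anomaly-score cutoff
EDGE_THRESHOLD = 0.85   # edge-model confidence cutoff

def handle_reading(reading, sensor_model, edge_model, cloud_client):
    # Tier 1: a tiny on-sensor model makes the immediate, simple call.
    score = sensor_model.anomaly_score(reading)
    if score < SENSOR_THRESHOLD:
        return "normal"  # nothing escalates; bandwidth is saved

    # Tier 2: the edge device runs a larger model with local context.
    label, confidence = edge_model.classify(reading)
    if confidence >= EDGE_THRESHOLD:
        return label  # handled by local coordination

    # Tier 3: ambiguous cases go to the cloud for heavy analytics;
    # the sample can also be logged there for future retraining.
    return cloud_client.analyze(reading)
```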

#### Progressive Deployment

