diff --git a/Images_4_1/cv_obj_detect.png b/Images_4_1/cv_obj_detect.png new file mode 100644 index 00000000..84e6394a Binary files /dev/null and b/Images_4_1/cv_obj_detect.png differ diff --git a/Images_4_1/data_folder.png b/Images_4_1/data_folder.png new file mode 100644 index 00000000..102667cf Binary files /dev/null and b/Images_4_1/data_folder.png differ diff --git a/Images_4_1/img_1.png b/Images_4_1/img_1.png new file mode 100644 index 00000000..3cf9f13e Binary files /dev/null and b/Images_4_1/img_1.png differ diff --git a/Images_4_1/img_10.png b/Images_4_1/img_10.png new file mode 100644 index 00000000..d3aae23e Binary files /dev/null and b/Images_4_1/img_10.png differ diff --git a/Images_4_1/img_11.png b/Images_4_1/img_11.png new file mode 100644 index 00000000..63315dfb Binary files /dev/null and b/Images_4_1/img_11.png differ diff --git a/Images_4_1/img_12.png b/Images_4_1/img_12.png new file mode 100644 index 00000000..ac4550c2 Binary files /dev/null and b/Images_4_1/img_12.png differ diff --git a/Images_4_1/img_13.png b/Images_4_1/img_13.png new file mode 100644 index 00000000..95781b6d Binary files /dev/null and b/Images_4_1/img_13.png differ diff --git a/Images_4_1/img_14.png b/Images_4_1/img_14.png new file mode 100644 index 00000000..be87da2c Binary files /dev/null and b/Images_4_1/img_14.png differ diff --git a/Images_4_1/img_15.png b/Images_4_1/img_15.png new file mode 100644 index 00000000..6b20b7f2 Binary files /dev/null and b/Images_4_1/img_15.png differ diff --git a/Images_4_1/img_16.png b/Images_4_1/img_16.png new file mode 100644 index 00000000..88e3ceb9 Binary files /dev/null and b/Images_4_1/img_16.png differ diff --git a/Images_4_1/img_17.png b/Images_4_1/img_17.png new file mode 100644 index 00000000..5c1b7669 Binary files /dev/null and b/Images_4_1/img_17.png differ diff --git a/Images_4_1/img_18.png b/Images_4_1/img_18.png new file mode 100644 index 00000000..b82d860a Binary files /dev/null and b/Images_4_1/img_18.png differ diff --git 
a/Images_4_1/img_19.png b/Images_4_1/img_19.png new file mode 100644 index 00000000..af210f25 Binary files /dev/null and b/Images_4_1/img_19.png differ diff --git a/Images_4_1/img_2.png b/Images_4_1/img_2.png new file mode 100644 index 00000000..c00e93d2 Binary files /dev/null and b/Images_4_1/img_2.png differ diff --git a/Images_4_1/img_20.png b/Images_4_1/img_20.png new file mode 100644 index 00000000..6880f101 Binary files /dev/null and b/Images_4_1/img_20.png differ diff --git a/Images_4_1/img_21.png b/Images_4_1/img_21.png new file mode 100644 index 00000000..ef3e4af4 Binary files /dev/null and b/Images_4_1/img_21.png differ diff --git a/Images_4_1/img_22.png b/Images_4_1/img_22.png new file mode 100644 index 00000000..b49d9abb Binary files /dev/null and b/Images_4_1/img_22.png differ diff --git a/Images_4_1/img_23.png b/Images_4_1/img_23.png new file mode 100644 index 00000000..ee070d80 Binary files /dev/null and b/Images_4_1/img_23.png differ diff --git a/Images_4_1/img_24.png b/Images_4_1/img_24.png new file mode 100644 index 00000000..5057db8d Binary files /dev/null and b/Images_4_1/img_24.png differ diff --git a/Images_4_1/img_25.png b/Images_4_1/img_25.png new file mode 100644 index 00000000..e3ad0add Binary files /dev/null and b/Images_4_1/img_25.png differ diff --git a/Images_4_1/img_26.png b/Images_4_1/img_26.png new file mode 100644 index 00000000..9802e642 Binary files /dev/null and b/Images_4_1/img_26.png differ diff --git a/Images_4_1/img_27.png b/Images_4_1/img_27.png new file mode 100644 index 00000000..b0231a8c Binary files /dev/null and b/Images_4_1/img_27.png differ diff --git a/Images_4_1/img_28.png b/Images_4_1/img_28.png new file mode 100644 index 00000000..fbeb1f5e Binary files /dev/null and b/Images_4_1/img_28.png differ diff --git a/Images_4_1/img_3.png b/Images_4_1/img_3.png new file mode 100644 index 00000000..0d854e0e Binary files /dev/null and b/Images_4_1/img_3.png differ diff --git a/Images_4_1/img_4.png b/Images_4_1/img_4.png new 
file mode 100644 index 00000000..4654de3f Binary files /dev/null and b/Images_4_1/img_4.png differ diff --git a/Images_4_1/img_5.png b/Images_4_1/img_5.png new file mode 100644 index 00000000..c88636e5 Binary files /dev/null and b/Images_4_1/img_5.png differ diff --git a/Images_4_1/img_6.png b/Images_4_1/img_6.png new file mode 100644 index 00000000..6771c762 Binary files /dev/null and b/Images_4_1/img_6.png differ diff --git a/Images_4_1/img_7.png b/Images_4_1/img_7.png new file mode 100644 index 00000000..fac11fd1 Binary files /dev/null and b/Images_4_1/img_7.png differ diff --git a/Images_4_1/img_8.png b/Images_4_1/img_8.png new file mode 100644 index 00000000..08efe60e Binary files /dev/null and b/Images_4_1/img_8.png differ diff --git a/Images_4_1/img_9.png b/Images_4_1/img_9.png new file mode 100644 index 00000000..68aedc11 Binary files /dev/null and b/Images_4_1/img_9.png differ diff --git a/Images_4_1/proj_goal.png b/Images_4_1/proj_goal.png new file mode 100644 index 00000000..dbfc0393 Binary files /dev/null and b/Images_4_1/proj_goal.png differ diff --git a/Images_4_1/samples.png b/Images_4_1/samples.png new file mode 100644 index 00000000..aff2036c Binary files /dev/null and b/Images_4_1/samples.png differ diff --git a/_quarto.yml b/_quarto.yml index 4b23c5ed..504aef14 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -78,6 +78,7 @@ book: chapters: - embedded_sys_exercise.qmd - embedded_ml_exercise.qmd + - kws_feature_eng.qmd references: references.qmd diff --git a/embedded_ml_exercise.qmd b/embedded_ml_exercise.qmd index dec56a5b..44f50242 100644 --- a/embedded_ml_exercise.qmd +++ b/embedded_ml_exercise.qmd @@ -1,300 +1,164 @@ # CV on Nicla Vision {.unnumbered} -As we initiate our studies into embedded machine learning or tinyML, -it\'s impossible to overlook the transformative impact of Computer -Vision (CV) and Artificial Intelligence (AI) in our lives. 
These two -intertwined disciplines redefine what machines can perceive and -accomplish, from autonomous vehicles and robotics to healthcare and -surveillance. - -More and more, we are facing an artificial intelligence (AI) revolution -where, as stated by Gartner, **Edge AI** has a very high impact -potential, and **it is for now**! - -![](images_4/media/image2.jpg){width="4.729166666666667in" -height="4.895833333333333in"} - -In the \"bull-eye\" of emerging technologies, radar is the *Edge -Computer Vision*, and when we talk about Machine Learning (ML) applied -to vision, the first thing that comes to mind is **Image -Classification**, a kind of ML \"Hello World\"! - -This exercise will explore a computer vision project utilizing -Convolutional Neural Networks (CNNs) for real-time image classification. -Leveraging TensorFlow\'s robust ecosystem, we\'ll implement a -pre-trained MobileNet model and adapt it for edge deployment. The focus -will be optimizing the model to run efficiently on resource-constrained -hardware without sacrificing accuracy. - -We\'ll employ techniques like quantization and pruning to reduce the -computational load. By the end of this tutorial, you\'ll have a working -prototype capable of classifying images in real time, all running on a -low-power embedded system based on the Arduino Nicla Vision board. +## Introduction + +As we initiate our studies into embedded machine learning or tinyML, it's impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) in our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance. + +More and more, we are facing an artificial intelligence (AI) revolution where, as stated by Gartner, **Edge AI** has a very high impact potential, and **it is happening now**!
+ +![](images_4/media/image2.jpg){fig-align="center" width="4.729166666666667in"} + +At the "bullseye" of the radar is *Edge Computer Vision*, and when we talk about Machine Learning (ML) applied to vision, the first thing that comes to mind is **Image Classification**, a kind of ML "Hello World"! + +This exercise will explore a computer vision project utilizing Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we'll implement a pre-trained MobileNet model and adapt it for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy. + +We'll employ techniques like quantization and pruning to reduce the computational load. By the end of this tutorial, you'll have a working prototype capable of classifying images in real time, all running on a low-power embedded system based on the Arduino Nicla Vision board. ## Computer Vision -At its core, computer vision aims to enable machines to interpret and -make decisions based on visual data from the world---essentially -mimicking the capability of the human optical system. Conversely, AI is -a broader field encompassing machine learning, natural language -processing, and robotics, among other technologies. When you bring AI -algorithms into computer vision projects, you supercharge the system\'s -ability to understand, interpret, and react to visual stimuli. +At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human visual system. AI, by contrast, is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli.
-When discussing Computer Vision projects applied to embedded devices, -the most common applications that come to mind are *Image -Classification* and *Object Detection*. +When discussing Computer Vision projects applied to embedded devices, the most common applications that come to mind are *Image Classification* and *Object Detection*. -![](images_4/media/image15.jpg){width="6.5in" -height="2.8333333333333335in"} +![](images_4/media/image15.jpg){fig-align="center" width="6.5in"} -Both models can be implemented on tiny devices like the Arduino Nicla -Vision and used on real projects. Let\'s start with the first one. +Both models can be implemented on tiny devices like the Arduino Nicla Vision and used on real projects. In this chapter, we will cover Image Classification. -## Image Classification Project +## Image Classification Project Goal -The first step in any ML project is to define our goal. In this case, it -is to detect and classify two specific objects present in one image. For -this project, we will use two small toys: a *robot* and a small -Brazilian parrot (named *Periquito*). Also, we will collect images of a -*background* where those two objects are absent. +The first step in any ML project is to define the goal. In this case, it is to detect and classify two specific objects present in one image. For this project, we will use two small toys: a *robot* and a small Brazilian parrot (named *Periquito*). Also, we will collect images of a *background* where those two objects are absent. -![](images_4/media/image36.jpg){width="6.5in" -height="3.638888888888889in"} +![](images_4/media/image36.jpg){fig-align="center" width="6.5in"} ## Data Collection -Once you have defined your Machine Learning project goal, the next and -most crucial step is the dataset collection. You can use the Edge -Impulse Studio, the OpenMV IDE we installed, or even your phone for the -image capture. Here, we will use the OpenMV IDE for that. 
+Once you have defined your Machine Learning project goal, the next and most crucial step is the dataset collection. You can use the Edge Impulse Studio, the OpenMV IDE we installed, or even your phone for the image capture. Here, we will use the OpenMV IDE for that. -**Collecting Dataset with OpenMV IDE** +### Collecting Dataset with OpenMV IDE -First, create in your computer a folder where your data will be saved, -for example, \"data.\" Next, on the OpenMV IDE, go to Tools \> Dataset -Editor and select New Dataset to start the dataset collection: +First, create a folder on your computer where your data will be saved, for example, "data". Next, in the OpenMV IDE, go to `Tools > Dataset Editor` and select `New Dataset` to start the dataset collection: -![](images_4/media/image29.png){width="6.291666666666667in" -height="4.010416666666667in"} +![](images_4/media/image29.png){fig-align="center" width="6.291666666666667in"} -The IDE will ask you to open the file where your data will be saved and -choose the \"data\" folder that was created. Note that new icons will -appear on the Left panel. +The IDE will ask you to select the folder where your data will be saved; choose the "data" folder you just created. Note that new icons will appear on the left panel.
-![](images_4/media/image46.png){width="0.9583333333333334in" -height="1.5208333333333333in"} +![](images_4/media/image46.png){fig-align="center" width="0.9583333333333334in"} -Using the upper icon (1), enter with the first class name, for example, -\"periquito\": +Using the upper icon (1), enter the first class name, for example, "periquito": -![](images_4/media/image22.png){width="3.25in" -height="2.65625in"} +![](images_4/media/image22.png){fig-align="center" width="3.25in"} -Run the dataset_capture_script.py, and clicking on the bottom icon (2), -will start capturing images: +Run `dataset_capture_script.py` and click the camera icon (2) to start capturing images: -![](images_4/media/image43.png){width="6.5in" -height="4.041666666666667in"} +![](images_4/media/image43.png){fig-align="center" width="6.5in"} Repeat the same procedure with the other classes -![](images_4/media/image6.jpg){width="6.5in" -height="3.0972222222222223in"} +![](images_4/media/image6.jpg){fig-align="center" width="6.5in"} -> *We suggest around 60 images from each category. Try to capture -> different angles, backgrounds, and light conditions.* +> We suggest around 60 images from each category. Try to capture different angles, backgrounds, and lighting conditions. -The stored images use a QVGA frame size 320x240 and RGB565 (color pixel -format). +The stored images use a QVGA frame size of 320x240 and the RGB565 color pixel format. -After capturing your dataset, close the Dataset Editor Tool on the Tools -\> Dataset Editor. +After capturing your dataset, close the Dataset Editor Tool via `Tools > Dataset Editor`. -On your computer, you will end with a dataset that contains three -classes: periquito, robot, and background. +On your computer, you will end up with a dataset that contains three classes: *periquito*, *robot*, and *background*.
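Before uploading, it can be useful to confirm that each class folder holds roughly the suggested 60 images. The following is a minimal Python sketch that assumes the dataset folder contains one sub-folder per class and that images are saved with common image extensions. The helper names `count_images_per_class` and `report` are hypothetical, and the optional `.class` suffix handling is an assumption about how class folders may be named.

```python
from pathlib import Path

# Image extensions to count; an assumption, adjust to match your files.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_images_per_class(data_dir, extensions=IMAGE_EXTENSIONS):
    """Return {class_name: image_count} for each sub-folder of data_dir."""
    counts = {}
    for class_dir in sorted(Path(data_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files at the top level
        name = class_dir.name
        # Assumption: class folders may carry a ".class" suffix.
        if name.endswith(".class"):
            name = name[: -len(".class")]
        counts[name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in extensions
        )
    return counts


def report(counts, target=60):
    """Print one line per class, flagging classes below the target count."""
    for name, n in counts.items():
        note = "" if n >= target else f"  <- fewer than ~{target}"
        print(f"{name:<12} {n:>4}{note}")
```

For example, `report(count_images_per_class("data"))` prints one line per class and marks any class that still needs more captures.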
-![](images_4/media/image20.png){width="6.5in" -height="2.2083333333333335in"} +![](images_4/media/image20.png){fig-align="center" width="6.5in"} -You should return to Edge Impulse Studio and upload the dataset to your -project. +You should return to *Edge Impulse Studio* and upload the dataset to your project. ## Training the model with Edge Impulse Studio -We will use the Edge Impulse Studio for training our model. Enter your -account credentials at Edge Impulse and create a new project: +We will use the Edge Impulse Studio for training our model. Enter your account credentials and create a new project: -![](images_4/media/image45.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image45.png){fig-align="center" width="6.5in"} -> *Here, you can clone a similar project:* -> *[NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).* +> Here, you can clone a similar project: [NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest). ## Dataset -Using the EI Studio (or *Studio*), we will pass over four main steps to -have our model ready for use on the Nicla Vision board: Dataset, -Impulse, Tests, and Deploy (on the Edge Device, in this case, the -NiclaV). +Using the EI Studio (or *Studio*), we will go over four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the Edge Device, in this case, the NiclaV). -![](images_4/media/image41.jpg){width="6.5in" -height="4.194444444444445in"} +![](images_4/media/image41.jpg){fig-align="center" width="6.5in"} -Regarding the Dataset, it is essential to point out that our Original -Dataset, captured with the OpenMV IDE, will be split into three parts: -Training, Validation, and Test. The Test Set will be divided from the -beginning and left a part to be used only in the Test phase after -training. The Validation Set will be used during training. 
+Regarding the Dataset, it is essential to point out that our Original Dataset, captured with the OpenMV IDE, will be split into *Training*, *Validation*, and *Test* sets. The Test Set is separated from the beginning and reserved for use only in the Test phase, after training. The Validation Set is used during training. -![](images_4/media/image7.jpg){width="6.5in" -height="4.763888888888889in"} +![](images_4/media/image7.jpg){fig-align="center" width="6.5in"} -On Studio, go to the Data acquisition tab, and on the UPLOAD DATA -section, upload from your computer the files from chosen categories: +In the Studio, go to the Data acquisition tab and, in the UPLOAD DATA section, upload the files of the chosen categories from your computer: -![](images_4/media/image39.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image39.png){fig-align="center" width="6.5in"} -Left to the Studio to automatically split the original dataset into -training and test and choose the label related to that specific data: +Let the Studio automatically split the original dataset into *train and test* sets, and choose the label for that specific data: -![](images_4/media/image30.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image30.png){fig-align="center" width="6.5in"} -Repeat the procedure for all three classes. At the end, you should see -your \"raw data in the Studio: +Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio: -![](images_4/media/image11.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image11.png){fig-align="center" width="6.5in"} -The Studio allows you to explore your data, showing a complete view of -all the data in your project. You can clear, inspect, or change labels -by clicking on individual data items. In our case, a simple project, the -data seems OK. +The Studio allows you to explore your data, showing a complete view of all the data in your project.
You can clear, inspect, or change labels by clicking on individual data items. In our case, a very simple project, the data seems OK. -![](images_4/media/image44.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image44.png){fig-align="center" width="6.5in"} ## The Impulse Design In this phase, we should define how to: -- Pre-process our data, which consists of resizing the individual - > images and determining the color depth to use (RGB or Grayscale) - > and +- Pre-process our data, which consists of resizing the individual images and determining the `color depth` to use (RGB or Grayscale), and -- Design a Model that will be \"Transfer Learning (Images)\" to - > fine-tune a pre-trained MobileNet V2 image classification model on - > our data. This method performs well even with relatively small - > image datasets (around 150 images in our case). +- Specify a Model. In this case, it will be `Transfer Learning (Images)`, which fine-tunes a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case). -![](images_4/media/image23.jpg){width="6.5in" -height="4.0in"} +![](images_4/media/image23.jpg){fig-align="center" width="6.5in"} -Transfer Learning with MobileNet offers a streamlined approach to model -training, which is especially beneficial for resource-constrained -environments and projects with limited labeled data. MobileNet, known -for its lightweight architecture, is a pre-trained model that has -already learned valuable features from a large dataset (ImageNet). +Transfer Learning with MobileNet offers a streamlined approach to model training, which is especially beneficial for resource-constrained environments and projects with limited labeled data. MobileNet, known for its lightweight architecture, is a pre-trained model that has already learned valuable features from a large dataset (ImageNet).
-![](images_4/media/image9.jpg){width="6.5in" -height="1.9305555555555556in"} +![](images_4/media/image9.jpg){fig-align="center" width="6.5in"} -By leveraging these learned features, you can train a new model for your -specific task with fewer data and computational resources yet achieve -competitive accuracy. +By leveraging these learned features, you can train a new model for your specific task with less data and fewer computational resources and still achieve competitive accuracy. -![](images_4/media/image32.jpg){width="6.5in" -height="2.3055555555555554in"} +![](images_4/media/image32.jpg){fig-align="center" width="6.5in"} -This approach significantly reduces training time and computational -cost, making it ideal for quick prototyping and deployment on embedded -devices where efficiency is paramount. +This approach significantly reduces training time and computational cost, making it ideal for quick prototyping and deployment on embedded devices where efficiency is paramount. -Go to the Impulse Design Tab and create the *impulse*, defining an image -size of 96x96 and squashing them (squared form, without crop). Select -Image and Transfer Learning blocks. Save the Impulse. +Go to the Impulse Design Tab and create the *impulse*, defining an image size of 96x96 and squashing the images into that square shape (without cropping). Select the Image and Transfer Learning blocks. Save the Impulse. -![](images_4/media/image16.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image16.png){fig-align="center" width="6.5in"} -### **Image Pre-Processing** +### Image Pre-Processing -All input QVGA/RGB565 images will be converted to 27,640 features -(96x96x3). +All the input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).
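To make the pre-processing step more concrete, here is a minimal pure-Python sketch of what turning a QVGA RGB565 frame into 96x96x3 features involves. It is an illustration, not Edge Impulse's implementation. It assumes the 16-bit pixel values have already been read into a flat, row-major list. It uses nearest-neighbour sampling for the squash resize, although the Studio's actual interpolation may differ. It also scales features to [0, 1], which is an assumption. All function names are hypothetical.

```python
def rgb565_to_rgb888(pixel):
    """Expand one 16-bit RGB565 value to an (r, g, b) tuple of 8-bit values."""
    r5 = (pixel >> 11) & 0x1F  # top 5 bits: red
    g6 = (pixel >> 5) & 0x3F   # middle 6 bits: green
    b5 = pixel & 0x1F          # bottom 5 bits: blue
    # Scale each channel to the 0..255 range.
    return (r5 * 255 // 31, g6 * 255 // 63, b5 * 255 // 31)


def squash_resize(pixels, src_w, src_h, dst_w=96, dst_h=96):
    """Nearest-neighbour resize that ignores aspect ratio (no cropping).

    `pixels` is a flat, row-major list of src_w * src_h RGB tuples.
    """
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h  # source row for this output row
        for x in range(dst_w):
            sx = x * src_w // dst_w  # source column for this output column
            out.append(pixels[sy * src_w + sx])
    return out


def to_features(rgb_pixels):
    """Flatten RGB tuples into one feature list scaled to [0, 1]."""
    return [c / 255.0 for px in rgb_pixels for c in px]


# Example: a synthetic QVGA frame filled with pure red (0xF800 in RGB565).
frame = [rgb565_to_rgb888(0xF800)] * (320 * 240)
features = to_features(squash_resize(frame, 320, 240))
print(len(features))  # 96 * 96 * 3 = 27648 values
```

Because the resize ignores the 4:3 aspect ratio of the 320x240 frame, objects end up slightly compressed horizontally. That is what "squash" means in the impulse settings. The trade-off is that nothing near the image edges is cropped away.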
-![](images_4/media/image17.png){width="6.5in" -height="4.319444444444445in"} +![](images_4/media/image17.png){fig-align="center" width="6.5in"} Press \[Save parameters\] and Generate all features: -![](images_4/media/image5.png){width="6.5in" -height="4.263888888888889in"} - -## Model Design - -In 2007, Google introduced -[[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), -a family of general-purpose computer vision neural networks designed -with mobile devices in mind to support classification, detection, and -more. MobileNets are small, low-latency, low-power models parameterized -to meet the resource constraints of various use cases. in 2018, Google -launched [MobileNetV2: Inverted Residuals and Linear -Bottlenecks](https://arxiv.org/abs/1801.04381). - -MobileNet V1 and MobileNet V2 aim for mobile efficiency and embedded -vision applications but differ in architectural complexity and -performance. While both use depthwise separable convolutions to reduce -the computational cost, MobileNet V2 introduces Inverted Residual Blocks -and Linear Bottlenecks to enhance performance. These new features allow -V2 to capture more complex features using fewer parameters, making it -computationally more efficient and generally more accurate than its -predecessor. Additionally, V2 employs a non-linear activation in the -intermediate expansion layer. Still, it uses a linear activation for the -bottleneck layer, a design choice found to preserve important -information through the network better. MobileNet V2 offers a more -optimized architecture for higher accuracy and efficiency and will be -used in this project. - -Although the base MobileNet architecture is already tiny and has low -latency, many times, a specific use case or application may require the -model to be smaller and faster. 
MobileNets introduces a straightforward -parameter α (alpha) called width multiplier to construct these smaller, -less computationally expensive models. The role of the width multiplier -α is to thin a network uniformly at each layer. - -Edge Impulse Studio has available MobileNetV1 (96x96 images) and V2 -(96x96 and 160x160 images), with several different **α** values (from -0.05 to 1.0). For example, you will get the highest accuracy with V2, -160x160 images, and α=1.0. Of course, there is a trade-off. The higher -the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be -needed to run the model, implying more latency. The smaller footprint -will be obtained at another extreme with MobileNetV1 and α=0.10 (around -53.2K RAM and 101K ROM). - -![](images_4/media/image27.jpg){width="6.5in" -height="3.5277777777777777in"} - -For this project, we will use **MobileNetV2 96x96 0.1**, which estimates -a memory cost of 265.3 KB in RAM. This model should be OK for the Nicla -Vision with 1MB of SRAM. On the Transfer Learning Tab, select this -model: - -![](images_4/media/image24.png){width="6.5in" -height="4.263888888888889in"} - -Another necessary technique to be used with Deep Learning is **Data -Augmentation**. Data augmentation is a method that can help improve the -accuracy of machine learning models, creating additional artificial -data. A data augmentation system makes small, random changes to your -training data during the training process (such as flipping, cropping, -or rotating the images). 
- -Under the rood, here you can see how Edge Impulse implements a data -Augmentation policy on your data: - -```python +![](images_4/media/image5.png){fig-align="center" width="6.5in"} + +### Model Design + +In 2017, Google introduced [[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381). + +MobileNet V1 and MobileNet V2 aim at mobile efficiency and embedded vision applications but differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to enhance performance. These new features allow V2 to capture more complex features using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but uses a linear activation for the bottleneck layer, a design choice found to better preserve important information through the network. MobileNet V2 offers an optimized architecture for higher accuracy and efficiency and will be used in this project. + +Although the base MobileNet architecture is already tiny and has low latency, a specific use case or application often requires the model to be even smaller and faster. MobileNets introduce a straightforward parameter α (alpha), called the width multiplier, to construct these smaller, less computationally expensive models. The role of the width multiplier α is to thin a network uniformly at each layer.
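A small worked example helps show why depthwise separable convolutions and the width multiplier α reduce cost. A standard k×k convolution needs k·k·C_in·C_out weights. A depthwise k×k step plus a 1×1 pointwise step needs only k·k·C_in + C_in·C_out. The sketch below counts weights only, with no biases or batch-norm parameters. The channel counts 32 and 64 are illustrative and not taken from a specific MobileNet layer. `thin` is a simplified, hypothetical width-multiplier helper; real implementations may also round channel counts to a hardware-friendly multiple.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) + pointwise (1 x 1) pair."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing channels
    return depthwise + pointwise


def thin(channels, alpha):
    """Apply a width multiplier alpha to a channel count; keep at least one channel."""
    return max(1, int(round(channels * alpha)))


# Compare both layer types for a few width multipliers (k = 3, 32 -> 64 channels).
for alpha in (1.0, 0.5, 0.1):
    c_in, c_out = thin(32, alpha), thin(64, alpha)
    print(
        f"alpha={alpha}: standard={standard_conv_params(3, c_in, c_out)}, "
        f"separable={separable_conv_params(3, c_in, c_out)}"
    )
```

With k=3, C_in=32, and C_out=64, the separable pair needs 2,336 weights versus 18,432 for the standard convolution, about 1/C_out + 1/k² of the cost. Because α scales both C_in and C_out, the dominant pointwise term shrinks roughly with α². So α=0.5 brings the separable layer down to 656 weights.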
+ +Edge Impulse Studio can use both MobileNetV1 (96x96 images) and V2 (96x96 or 160x160 images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3MB RAM and 2.6MB ROM) will be needed to run the model, implying more latency. The smallest footprint will be obtained at the other extreme with MobileNetV1 and α=0.10 (around 53.2KB RAM and 101KB ROM). + +![](images_4/media/image27.jpg){fig-align="center" width="6.5in"} + +We will use **MobileNetV2 96x96 0.1** for this project, with an estimated memory cost of 265.3 KB in RAM. This model should fit within the Nicla Vision's 1MB of SRAM. On the Transfer Learning Tab, select this model: + +![](images_4/media/image24.png){fig-align="center" width="6.5in"} + +## Model Training + +Another valuable technique to be used with Deep Learning is **Data Augmentation**. Data augmentation is a method to improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images). + +Looking under the hood, here you can see how Edge Impulse implements a data augmentation policy on your data: + +``` python # Implements the data augmentation policy def augment_image(image, label): # Flips the image randomly @@ -312,140 +176,99 @@ def augment_image(image, label): image = tf.image.random_brightness(image, max_delta=0.2) return image, label - ``` -Exposure to these variations during training can help prevent your model -from taking shortcuts by \"memorizing\" superficial clues in your -training data, meaning it may better reflect the deep underlying -patterns in your dataset. -The final layer of our model will have 12 neurons with a 15% dropout for
Here is the Training result: +Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset. + +The final layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. Here is the Training result: -![](images_4/media/image31.jpg){width="6.5in" -height="3.5in"} +![](images_4/media/image31.jpg){fig-align="center" width="6.5in"} -The result is excellent, with 77ms of latency, which should result in -13fps (frames per second) during inference. +The result is excellent, with 77ms of latency, which should result in 13fps (frames per second) during inference. ## Model Testing -![](images_4/media/image10.jpg){width="6.5in" -height="3.8472222222222223in"} +![](images_4/media/image10.jpg){fig-align="center" width="6.5in"} -Now, you should take the data put apart at the start of the project and -run the trained model having them as input: +Now, you should take the data set aside at the start of the project and run the trained model using it as input: -![](images_4/media/image34.png){width="3.1041666666666665in" -height="1.7083333333333333in"} +![](images_4/media/image34.png){fig-align="center" width="3.1041666666666665in"} -The result was, again, excellent. +The result is, again, excellent. -![](images_4/media/image12.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image12.png){fig-align="center" width="6.5in"} ## Deploying the model -At this point, we can deploy the trained model as.tflite and use the -OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ -or an Arduino library. +At this point, we can deploy the trained model as.tflite and use the OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ or an Arduino library. 
-![](images_4/media/image28.jpg){width="6.5in" -height="3.763888888888889in"} +![](images_4/media/image28.jpg){fig-align="center" width="6.5in"} -**Arduino Library** +### Arduino Library -First, Let\'s deploy it as an Arduino Library: +First, let's deploy it as an Arduino Library: -![](images_4/media/image48.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image48.png){fig-align="center" width="6.5in"} -You should install the library as.zip on the Arduino IDE and run the -sketch nicla_vision_camera.ino available in Examples under your library -name. +Install the library's *.zip* file in the Arduino IDE and run the sketch *nicla_vision_camera.ino*, available in Examples under your library name. -> *Note that Arduino Nicla Vision has, by default, 512KB of RAM -> allocated for the M7 core and an additional 244KB on the M4 address -> space. In the code, this allocation was changed to 288 kB to guarantee -> that the model will run on the device -> (malloc_addblock((void\*)0x30000000, 288 \* 1024);).* +> Note that the Arduino Nicla Vision has, by default, 512KB of RAM allocated for the M7 core and an additional 244KB in the M4 address space. In the code, this allocation was changed to 288KB to guarantee that the model will run on the device (`malloc_addblock((void*)0x30000000, 288 * 1024);`). -The result was good, with 86ms of measured latency. +The result is good, with 86ms of measured latency. -![](images_4/media/image25.jpg){width="6.5in" -height="3.4444444444444446in"} +![](images_4/media/image25.jpg){fig-align="center" width="6.5in"} -Here is a short video showing the inference results: -[[https://youtu.be/bZPZZJblU-o]{.underline}](https://youtu.be/bZPZZJblU-o) +Here is a short video showing the inference results: {{< video https://youtu.be/bZPZZJblU-o width="480" height="270" center >}} -**OpenMV** +### OpenMV
+It is possible to deploy the trained model for use with OpenMV in two ways: as a library and as firmware. -Three files are generated as a library: the.tflite model, a list with -the labels, and a simple MicroPython script that can make inferences -using the model. +Three files are generated as a library: the trained .tflite model, a list with the labels, and a simple MicroPython script that can make inferences using the model. -![](images_4/media/image26.png){width="6.5in" -height="1.0in"} +![](images_4/media/image26.png){fig-align="center" width="6.5in"} -Running this model as a.tflite directly in the Nicla was impossible. So, -we can sacrifice the accuracy using a smaller model or deploy the model -as an OpenMV Firmware (FW). As an FW, the Edge Impulse Studio generates -optimized models, libraries, and frameworks needed to make the -inference. Let\'s explore this last one. +Running this model as a *.tflite* directly on the Nicla was impossible. So, we can either sacrifice accuracy by using a smaller model or deploy the model as OpenMV Firmware (FW). When FW is chosen, the Edge Impulse Studio generates the optimized models, libraries, and frameworks needed to run the inference. Let's explore this option. -Select OpenMV Firmware on the Deploy Tab and press \[Build\]. +Select `OpenMV Firmware` on the `Deploy` tab and press `[Build]`. -![](images_4/media/image3.png){width="6.5in" -height="4.263888888888889in"} +![](images_4/media/image3.png){fig-align="center" width="6.5in"} On your computer, you will find a ZIP file.
Open it: -![](images_4/media/image33.png){width="6.5in" height="2.625in"} +![](images_4/media/image33.png){fig-align="center" width="6.5in"} Use the Bootloader tool on the OpenMV IDE to load the FW on your board: -![](images_4/media/image35.jpg){width="6.5in" height="3.625in"} +![](images_4/media/image35.jpg){fig-align="center" width="6.5in"} Select the appropriate file (.bin for Nicla-Vision): -![](images_4/media/image8.png){width="6.5in" height="1.9722222222222223in"} +![](images_4/media/image8.png){fig-align="center" width="6.5in"} After the download is finished, press OK: -![DFU firmware update complete!.png](images_4/media/image40.png){width="3.875in" height="5.708333333333333in"} +![](images_4/media/image40.png){fig-align="center" width="3.875in"} -If a message says that the FW is outdated, DO NOT UPGRADE. Select -\[NO\]. +If a message says that the FW is outdated, DO NOT UPGRADE. Select `[NO]`. -![](images_4/media/image42.png){width="4.572916666666667in" -height="2.875in"} +![](images_4/media/image42.png){fig-align="center" width="4.572916666666667in"} -Now, open the script **ei_image_classification.py** that was downloaded -from the Studio and the.bin file for the Nicla. +Now, open the script **ei_image_classification.py**, which was downloaded from the Studio together with the .bin file for the Nicla. -![](images_4/media/image14.png){width="6.5in" -height="4.0in"} +![](images_4/media/image14.png){fig-align="center" width="6.5in"} -And run it. Pointing the camera to the objects we want to classify, the -inference result will be displayed on the Serial Terminal. +Run it. Point the camera at the objects you want to classify, and the inference result will be displayed on the Serial Terminal.
-![](images_4/media/image37.png){width="6.5in" -height="3.736111111111111in"} +![](images_4/media/image37.png){fig-align="center" width="6.5in"} -**Changing Code to add labels:** +#### Changing the Code to Add Labels -The code provided by Edge Impulse can be modified so that we can see, -for test reasons, the inference result directly on the image displayed -on the OpenMV IDE. +The code provided by Edge Impulse can be modified so that, for testing purposes, we can see the inference result directly on the image displayed in the OpenMV IDE. -[[Upload the code from -GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) -or modify it as below: +[Upload the code from GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) or modify it as below: -```python +``` python # Marcelo Rovai - NICLA Vision - Image Classification # Adapted from Edge Impulse - OpenMV Image Classification Example # @24Aug23 @@ -510,45 +333,31 @@ while(True): mono_space = False, scale=2 ) - ``` Here you can see the result: -![](images_4/media/image47.jpg){width="6.5in" -height="2.9444444444444446in"} +![](images_4/media/image47.jpg){fig-align="center" width="6.5in"} -Note that the latency (136 ms) is almost double what we got directly -with the Arduino IDE. This is because we are using the IDE as an -interface and the time to wait for the camera to be ready. If we start -the clock just before the inference: +Note that the latency (136 ms) is almost double what we got directly with the Arduino IDE. This is because we are using the IDE as an interface, and the measurement also includes the time spent waiting for the camera to be ready. If we start the clock just before the inference: -![](images_4/media/image13.jpg){width="6.5in" -height="2.0972222222222223in"} +![](images_4/media/image13.jpg){fig-align="center" width="6.5in"} The latency will drop to only 71 ms.
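The gap between the 136 ms and 71 ms figures comes down to where the clock starts. The plain-Python illustration below is not the author's script. `capture_frame()` and `run_inference()` are hypothetical stand-ins that only sleep, and `time.perf_counter()` replaces the board's timing calls (in MicroPython, `time.ticks_ms()` and `time.ticks_diff()` fill that role). It times one loop iteration two ways: from before the frame capture, and from just before the inference.

```python
import time


def capture_frame():
    """Hypothetical stand-in for the camera capture (e.g. a snapshot call)."""
    time.sleep(0.010)  # simulate waiting for the camera
    return [[0] * 96 for _ in range(96)]


def run_inference(frame):
    """Hypothetical stand-in for the classifier call; labels are placeholders."""
    time.sleep(0.005)  # simulate model execution
    return {"class_a": 0.9, "class_b": 0.1}


def timed_iteration():
    """Run one loop iteration and return (loop_ms, inference_ms)."""
    loop_start = time.perf_counter()   # clock started before the capture
    frame = capture_frame()
    infer_start = time.perf_counter()  # clock started just before inference
    run_inference(frame)
    end = time.perf_counter()
    loop_ms = (end - loop_start) * 1000.0
    inference_ms = (end - infer_start) * 1000.0
    return loop_ms, inference_ms


loop_ms, inference_ms = timed_iteration()
print(f"whole iteration: {loop_ms:.1f} ms, inference only: {inference_ms:.1f} ms")
```

The inference-only number is always the smaller of the two. Which one to report depends on whether you care about model cost alone or about end-to-end frame rate.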
-![](images_4/media/image1.jpg){width="3.5520833333333335in" -height="1.53125in"} +![](images_4/media/image1.jpg){fig-align="center" width="3.5520833333333335in"} -> *The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected.* +> The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected. -### **Post-Processing with LEDs** +#### Post-Processing with LEDs -When working with embedded machine learning, we are looking for devices -that can continually proceed with the inference and result, taking some -action directly on the physical world and not displaying the result on a -connected computer. To simulate this, we will define one LED to light up -for each one of the possible inference results. +When working with embedded machine learning, we are looking for devices that can continuously run inference and act on the result directly in the physical world, rather than displaying it on a connected computer. To simulate this, we will light up a different LED for each possible inference result.
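The LED rule used in this tutorial is simple. A class's LED lights only when that class scores above 0.8, and red signals an uncertain result. The decision step can be sketched in plain Python as a minimal illustration, not the author's MicroPython code. `select_led` is a hypothetical helper, and the label-to-LED mapping is an assumption. Only `periquito` to green comes from the text; `other_class` to blue is a placeholder.

```python
THRESHOLD = 0.8  # confidence threshold used in this tutorial's LED code


def select_led(scores, led_for_label, threshold=THRESHOLD, uncertain_led="red"):
    """Return the name of the LED to light for one inference result.

    scores: dict mapping class label -> confidence (0..1).
    led_for_label: dict mapping class label -> LED name.
    If the best-scoring class exceeds the threshold and has an LED assigned,
    that LED is returned; otherwise the result is treated as uncertain.
    """
    best_label = max(scores, key=scores.get)
    if scores[best_label] > threshold and best_label in led_for_label:
        return led_for_label[best_label]
    return uncertain_led


# Hypothetical mapping: only "periquito" -> "green" comes from the text.
LEDS = {"periquito": "green", "other_class": "blue"}

print(select_led({"periquito": 0.93, "other_class": 0.07}, LEDS))  # green
print(select_led({"periquito": 0.55, "other_class": 0.45}, LEDS))  # red
```

When the scores come from a softmax and sum to 1, at most one class can exceed 0.8. Checking only the best-scoring class is therefore enough.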
-![](images_4/media/image38.jpg){width="6.5in" -height="3.236111111111111in"} +![](images_4/media/image38.jpg){fig-align="center" width="6.5in"} -For that, we should [[upload the code from -GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) -or change the last code to include the LEDs: +To accomplish that, we should [upload the code from GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) or change the last code to include the LEDs: -```python +``` python # Marcelo Rovai - NICLA Vision - Image Classification with LEDs # Adapted from Edge Impulse - OpenMV Image Classification Example # @24Aug23 @@ -648,13 +457,11 @@ while(True): mono_space = False, scale=2 ) - ``` -Now, each time that a class gets a result superior of 0.8, the -correspondent LED will be light on as below: +Now, each time a class scores a result greater than 0.8, the corresponding LED will be lit: -- Led Red 0n: Uncertain (no one class is over 0.8) +- Led Red On: Uncertain (no class is over 0.8) - Led Green 0n: Periquito \> 0.8 @@ -664,64 +471,42 @@ correspondent LED will be light on as below: Here is the result: -![](images_4/media/image18.jpg){width="6.5in" -height="3.6527777777777777in"} +![](images_4/media/image18.jpg){fig-align="center" width="6.5in"} In more detail -![](images_4/media/image21.jpg){width="6.5in" -height="2.0972222222222223in"} +![](images_4/media/image21.jpg){fig-align="center" width="6.5in"} -### **Image Classification (non-official) Benchmark** +## Image Classification (non-official) Benchmark -Several development boards can be used for embedded machine learning -(tinyML), and the most common ones for Computer Vision applications -(with low energy), are the ESP32 CAM, the Seeed XIAO ESP32S3 Sense, the -Arduinos Nicla Vison, and Portenta.
+Several development boards can be used for embedded machine learning (tinyML), and the most common ones for low-power Computer Vision applications are the ESP32 CAM, the Seeed XIAO ESP32S3 Sense, the Arduino Nicla Vision, and the Arduino Portenta. -![](images_4/media/image19.jpg){width="6.5in" -height="4.194444444444445in"} +![](images_4/media/image19.jpg){fig-align="center" width="6.5in"} -Using the opportunity, the same trained model was deployed on the -ESP-CAM, the XIAO, and Portenta (in this one, the model was trained -again, using grayscaled images to be compatible with its camera. Here is -the result, deploying the models as Arduino\'s Library: +Taking this opportunity, the same trained model was deployed on the ESP-CAM, the XIAO, and the Portenta (for the Portenta, the model was retrained using grayscale images to be compatible with its camera). Here is the result, deploying the models as Arduino libraries: -![](images_4/media/image4.jpg){width="6.5in" -height="3.4444444444444446in"} +![](images_4/media/image4.jpg){fig-align="center" width="6.5in"} ## Conclusion -Before we finish, consider that Computer Vision is more than just image -classification. For example, you can develop Edge Machine Learning -projects around vision in several areas, such as: +Before we finish, consider that Computer Vision is more than just image classification. For example, you can develop Edge Machine Learning projects around vision in several areas, such as: -- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer - > vision algorithms to navigate and make decisions. +- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer vision algorithms to navigate and make decisions.
-- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, - > and CT scan image analysis +- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, and CT scan image analysis. -- **Retail**: Automated checkout systems that identify products as - > they pass through a scanner. +- **Retail**: Automated checkout systems that identify products as they pass through a scanner. -- **Security and Surveillance**: Facial recognition, anomaly - > detection, and object tracking in real-time video feeds. +- **Security and Surveillance**: Facial recognition, anomaly detection, and object tracking in real-time video feeds. -- **Augmented Reality**: Object detection and classification to - > overlay digital information in the real world. +- **Augmented Reality**: Object detection and classification to overlay digital information onto the real world. -- **Industrial Automation**: Visual inspection of products, predictive - > maintenance, and robot and drone guidance. +- **Industrial Automation**: Visual inspection of products, predictive maintenance, and robot and drone guidance. -- **Agriculture**: Drone-based crop monitoring and automated - > harvesting. +- **Agriculture**: Drone-based crop monitoring and automated harvesting. -- **Natural Language Processing**: Image captioning and visual - > question answering. +- **Natural Language Processing**: Image captioning and visual question answering. -- **Gesture Recognition**: For gaming, sign language translation, and - > human-machine interaction. +- **Gesture Recognition**: For gaming, sign language translation, and human-machine interaction. -- **Content Recommendation**: Image-based recommendation systems in - > e-commerce. +- **Content Recommendation**: Image-based recommendation systems in e-commerce.
diff --git a/embedded_sys_exercise.qmd b/embedded_sys_exercise.qmd index 2fc19cab..ff6efd47 100644 --- a/embedded_sys_exercise.qmd +++ b/embedded_sys_exercise.qmd @@ -1,34 +1,16 @@ # Setup Nicla Vision {.unnumbered} -The [Arduino Nicla -Vision](https://docs.arduino.cc/hardware/nicla-vision) (sometimes called -*NiclaV*) is a development board that includes two processors that can -run tasks in parallel. It is part of a family of development boards with -the same form factor but designed for specific tasks, such as the [Nicla -Sense -ME](https://www.bosch-sensortec.com/software-tools/tools/arduino-nicla-sense-me/) -and the [Nicla -Voice](https://store-usa.arduino.cc/products/nicla-voice?_gl=1*l3abc6*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY5NjM0Mzk1My4xMDIuMS4xNjk2MzQ0MjQ1LjAuMC4w). -The *Niclas* can efficiently run processes created with TensorFlow™ -Lite. For example, one of the cores of the NiclaV computing a computer -vision algorithm on the fly (inference), while the other leads with -low-level operations like controlling a motor and communicating or -acting as a user interface. - -> *The onboard wireless module allows the management of WiFi and -> Bluetooth Low Energy (BLE) connectivity simultaneously.* - -![](images_2/media/image29.jpg){width="6.5in" -height="3.861111111111111in"} - -## Two Parallel Cores - -The central processor is the dual-core -[STM32H747,](https://content.arduino.cc/assets/Arduino-Portenta-H7_Datasheet_stm32h747xi.pdf?_gl=1*6quciu*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDYzMzkuMA..) -including a Cortex® M7 at 480 MHz and a Cortex® M4 at 240 MHz. The two -cores communicate via a Remote Procedure Call mechanism that seamlessly -allows calling functions on the other processor. 
Both processors share -all the on-chip peripherals and can run: +## Introduction + +The [Arduino Nicla Vision](https://docs.arduino.cc/hardware/nicla-vision) (sometimes called *NiclaV*) is a development board that includes two processors that can run tasks in parallel. It is part of a family of development boards with the same form factor but designed for specific tasks, such as the [Nicla Sense ME](https://www.bosch-sensortec.com/software-tools/tools/arduino-nicla-sense-me/) and the [Nicla Voice](https://store-usa.arduino.cc/products/nicla-voice?_gl=1*l3abc6*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY5NjM0Mzk1My4xMDIuMS4xNjk2MzQ0MjQ1LjAuMC4w). The *Niclas* can efficiently run processes created with TensorFlow™ Lite. For example, one of the cores of the NiclaV runs a computer vision algorithm on the fly (inference), while the other executes low-level operations such as controlling a motor, handling communication, or acting as a user interface. The onboard wireless module can manage WiFi and Bluetooth Low Energy (BLE) connectivity simultaneously. + +![](images_2/media/image29.jpg){fig-align="center" width="6.5in"} + +## Hardware + +### Two Parallel Cores + +The central processor is the dual-core [STM32H747](https://content.arduino.cc/assets/Arduino-Portenta-H7_Datasheet_stm32h747xi.pdf?_gl=1*6quciu*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDYzMzkuMA..), including a Cortex® M7 at 480 MHz and a Cortex® M4 at 240 MHz. The two cores communicate via a Remote Procedure Call mechanism that seamlessly allows calling functions on the other processor.
Both processors share all the on-chip peripherals and can run: - Arduino sketches on top of the Arm® Mbed™ OS @@ -38,224 +20,133 @@ all the on-chip peripherals and can run: - TensorFlow™ Lite -![](images_2/media/image22.jpg){width="5.78125in" -height="5.78125in"} +![](images_2/media/image22.jpg){fig-align="center" width="6.5in"} -## Memory +### Memory -Memory is crucial for embedded machine learning projects. The NiclaV -board can host up to 16 MB of QSPI Flash for storage. However, it is -essential to consider that the MCU SRAM is the one to be used with -machine learning inferences; the STM32H747 is only 1MB, shared by both -processors. This MCU also has incorporated 2MB of FLASH, mainly for code -storage. +Memory is crucial for embedded machine learning projects. The NiclaV board can host up to 16 MB of QSPI Flash for storage. However, keep in mind that machine learning inference runs in the MCU's SRAM, and the STM32H747 has only 1 MB of it, shared by both processors. This MCU also incorporates 2 MB of Flash, mainly for code storage. -## Sensors +### Sensors - **Camera**: A GC2145 2 MP Color CMOS Camera. -- **Microphone**: A - > [MP34DT05,](https://content.arduino.cc/assets/Nano_BLE_Sense_mp34dt05-a.pdf?_gl=1*12fxus9*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDc3NzMuMA..) - > an ultra-compact, low-power, omnidirectional, digital MEMS - > microphone built with a capacitive sensing element and an IC - > interface. +- **Microphone**: The `MP34DT05` is an ultra-compact, low-power, omnidirectional, digital MEMS microphone built with a capacitive sensing element and an IC interface. -- **6-Axis IMU**: 3D gyroscope and 3D accelerometer data from the - > LSM6DSOX 6-axis IMU. +- **6-Axis IMU**: 3D gyroscope and 3D accelerometer data from the `LSM6DSOX` 6-axis IMU. -- **Time of Flight Sensor**: The VL53L1CBV0FY Time-of-Flight sensor - > adds accurate and low power-ranging capabilities to the Nicla - > Vision.
The invisible near-infrared VCSEL laser (including the - > analog driver) is encapsulated with receiving optics in an - > all-in-one small module below the camera. +- **Time of Flight Sensor**: The `VL53L1CBV0FY` Time-of-Flight sensor adds accurate, low-power ranging capabilities to the Nicla Vision. The invisible near-infrared VCSEL laser (including the analog driver) is encapsulated with receiving optics in an all-in-one small module below the camera. -### **HW Installation (Arduino IDE)** +## Arduino IDE Installation -Start connecting the board (USB-C) to your computer : +Start by connecting the board (*microUSB*) to your computer: -![](images_2/media/image14.jpg){width="6.5in" -height="3.0833333333333335in"} +![](images_2/media/image14.jpg){fig-align="center" width="6.5in"} -Install the Mbed OS core for Nicla boards in the Arduino IDE. Having the -IDE open, navigate to Tools \> Board \> Board Manager, look for Arduino -Nicla Vision on the search window, and install the board. +Install the Mbed OS core for Nicla boards in the Arduino IDE. With the IDE open, navigate to `Tools > Board > Board Manager`, look for Arduino Nicla Vision in the search window, and install the board. -![](images_2/media/image2.jpg){width="6.5in" -height="2.7083333333333335in"} +![](images_2/media/image2.jpg){fig-align="center" width="6.5in"} -Next, go to Tools \> Board \> Arduino Mbed OS Nicla Boards and select -Arduino Nicla Vision. Having your board connected to the USB, you should -see the Nicla on Port and select it. +Next, go to `Tools > Board > Arduino Mbed OS Nicla Boards` and select `Arduino Nicla Vision`. With your board connected via USB, you should see the Nicla under `Port`; select it. -> *Open the Blink sketch on Examples/Basic and run it using the IDE -> Upload button.
You should see the Built-in LED (green RGB) blinking, -> which means the Nicla board is correctly installed and functional!* +> Open the Blink sketch on Examples/Basic and run it using the IDE Upload button. You should see the built-in LED (green RGB) blinking, which means the Nicla board is correctly installed and functional! -## Testing the Microphone +### Testing the Microphone -On Arduino IDE, go to Examples \> PDM \> PDMSerialPlotter, open and run -the sketch. Open the Plotter and see the audio representation from the -microphone: +In the Arduino IDE, go to `Examples > PDM > PDMSerialPlotter`, then open and run the sketch. Open the Plotter to see the audio representation from the microphone: -![](images_2/media/image9.png){width="6.5in" -height="4.361111111111111in"} +![](images_2/media/image9.png){fig-align="center" width="6.5in"} -> *Vary the frequency of the sound you generate and confirm that the mic -> is working correctly.* +> Vary the frequency of the sound you generate and confirm that the mic is working correctly. -## Testing the IMU +### Testing the IMU -Before testing the IMU, it will be necessary to install the LSM6DSOX -library. For that, go to Library Manager and look for LSM6DSOX. Install -the library provided by Arduino: +Before testing the IMU, you need to install the LSM6DSOX library. For that, go to Library Manager and look for LSM6DSOX.
Install the library provided by Arduino: -![](images_2/media/image19.jpg){width="6.5in" -height="2.4027777777777777in"} +![](images_2/media/image19.jpg){fig-align="center" width="6.5in"} -Next, go to Examples \> Arduino_LSM6DSOX \> SimpleAccelerometer and run -the accelerometer test (you can also run Gyro and board temperature): +Next, go to `Examples > Arduino_LSM6DSOX > SimpleAccelerometer` and run the accelerometer test (you can also run the gyroscope and board temperature examples): -![](images_2/media/image28.png){width="6.5in" -height="4.361111111111111in"} +![](images_2/media/image28.png){fig-align="center" width="6.5in"} -### **Testing the ToF (Time of Flight) Sensor** +### Testing the ToF (Time of Flight) Sensor -As we did with IMU, installing the ToF library, the VL53L1X is -necessary. For that, go to Library Manager and look for VL53L1X. Install -the library provided by Pololu: +As we did with the IMU, it is necessary to install the VL53L1X ToF library. For that, go to Library Manager and look for VL53L1X. Install the library provided by Pololu: -![](images_2/media/image15.jpg){width="6.5in" -height="2.4583333333333335in"} +![](images_2/media/image15.jpg){fig-align="center" width="6.5in"} -Next, run the sketch -[proximity_detection.ino](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/distance_image_meter.py): +Next, run the sketch [proximity_detection.ino](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/distance_image_meter.py): -![](images_2/media/image12.png){width="4.947916666666667in" -height="4.635416666666667in"} +![](images_2/media/image12.png){fig-align="center" width="6.5in"} -On the Serial Monitor, you will see the distance from the camera and an -object in front of it (max of 4m). +On the Serial Monitor, you will see the distance from the camera to an object in front of it (max of 4 m).
-![](images_2/media/image13.jpg){width="6.5in" -height="4.847222222222222in"} +![](images_2/media/image13.jpg){fig-align="center" width="6.5in"} -## Testing the Camera +### Testing the Camera -We can also test the camera using, for example, the code provided on -Examples \> Camera \> CameraCaptureRawBytes. We can not see the image -directly, but it is possible to get the raw image data generated by the -camera. +We can also test the camera using, for example, the code provided in `Examples > Camera > CameraCaptureRawBytes`. We cannot see the image directly, but it is possible to get the raw image data generated by the camera. -Anyway, the best test with the camera is to see a live image. For that, -we will use another IDE, the OpenMV. +However, the best camera test is to see a live image. For that, we will use another IDE, the OpenMV IDE. ## Installing the OpenMV IDE -OpenMV IDE is the premier integrated development environment for use -with OpenMV Cameras and the one on the Portenta. It features a powerful -text editor, debug terminal, and frame buffer viewer with a histogram -display. We will use MicroPython to program the camera. +OpenMV IDE is the premier integrated development environment for use with OpenMV cameras, such as the one on the Nicla Vision. It features a powerful text editor, debug terminal, and frame buffer viewer with a histogram display. We will use MicroPython to program the camera. -Go to the [OpenMV IDE page](https://openmv.io/pages/download), download -the correct version for your Operating System, and follow the -instructions for its installation on your computer. +Go to the [OpenMV IDE page](https://openmv.io/pages/download), download the correct version for your Operating System, and follow the instructions for its installation on your computer.
-![](images_2/media/image21.png){width="6.5in" -height="4.791666666666667in"} +![](images_2/media/image21.png){fig-align="center" width="6.5in"} -The IDE should open, defaulting the helloworld_1.py code on its Code -Area. If not, you can open it from Files \> Examples \> HelloWord \> -helloword.py +The IDE should open, showing the *helloworld_1.py* code in its Code Area by default. If not, you can open it from `File > Examples > HelloWorld > helloworld.py`. -![](images_2/media/image7.png){width="6.5in" -height="4.444444444444445in"} +![](images_2/media/image7.png){fig-align="center" width="6.5in"} -Any messages sent through a serial connection (using print() or error -messages) will be displayed on the **Serial Terminal** during run time. -The image captured by a camera will be displayed in the **Camera -Viewer** Area (or Frame Buffer) and in the Histogram area, immediately -below the Camera Viewer. +Any messages sent through a serial connection (using print() or error messages) will be displayed on the **Serial Terminal** during run time. The image captured by the camera will be displayed in the **Camera Viewer** Area (or Frame Buffer) and in the Histogram area, immediately below the Camera Viewer. -OpenMV IDE is the premier integrated development environment with OpenMV -Cameras and the Arduino Pro boards. It features a powerful text editor, -debug terminal, and frame buffer viewer with a histogram display. We -will use MicroPython to program the Nicla Vision. +OpenMV IDE is the premier integrated development environment for use with OpenMV Cameras and the Arduino Pro boards. It features a powerful text editor, debug terminal, and frame buffer viewer with a histogram display. We will use MicroPython to program the Nicla Vision. -> *Before connecting the Nicla to the OpenMV IDE, ensure you have the -> latest bootloader version. To that, go to your Arduino IDE, select the -> Nicla board, and open the sketch on Examples \> STM_32H747_System -> STM_32H747_updateBootloader.
Upload the code to your board. The Serial -> Monitor will guide you.* +> Before connecting the Nicla to the OpenMV IDE, ensure you have the latest bootloader version. To do so, open the Arduino IDE, select the Nicla board, and open the sketch at `Examples > STM_32H747_System > STM_32H747_updateBootloader`. Upload the code to your board. The Serial Monitor will guide you. -After updating the bootloader, put the Nicla Vision in bootloader mode -by double-pressing the reset button on the board. The built-in green LED -will start fading in and out. Now return to the OpenMV IDE and click on -the connect icon (Left ToolBar): +After updating the bootloader, put the Nicla Vision in bootloader mode by double-pressing the reset button on the board. The built-in green LED will start fading in and out. Now return to the OpenMV IDE and click on the connect icon (Left ToolBar): -![](images_2/media/image23.jpg){width="4.010416666666667in" -height="1.0520833333333333in"} +![](images_2/media/image23.jpg){fig-align="center" width="4.010416666666667in"} -A pop-up will tell you that a board in DFU mode was detected and ask you -how you would like to proceed. First, select \"Install the latest -release firmware.\" This action will install the latest OpenMV firmware -on the Nicla Vision. +A pop-up will tell you that a board in DFU mode was detected and ask how you would like to proceed. First, select `Install the latest release firmware (vX.Y.Z)`. This action will install the latest OpenMV firmware on the Nicla Vision. -![](images_2/media/image10.png){width="6.5in" -height="2.6805555555555554in"} +![](images_2/media/image10.png){fig-align="center" width="6.5in"} -You can leave the option of erasing the internal file system unselected -and click \[OK\]. +You can leave the option `Erase internal file system` unselected and click `[OK]`. -Nicla\'s green LED will start flashing while the OpenMV firmware is -uploaded to the board, and a terminal window will then open, showing the -flashing progress.
+Nicla's green LED will start flashing while the OpenMV firmware is uploaded to the board, and a terminal window will then open, showing the flashing progress. -![](images_2/media/image5.png){width="4.854166666666667in" -height="3.5416666666666665in"} +![](images_2/media/image5.png){fig-align="center" width="4.854166666666667in"} -Wait until the green LED stops flashing and fading. When the process -ends, you will see a message saying, \"DFU firmware update complete!\". -Press \[OK\]. +Wait until the green LED stops flashing and fading. When the process ends, you will see the message "DFU firmware update complete!" Press `[OK]`. -![](images_2/media/image1.png){width="3.875in" -height="5.708333333333333in"} +![](images_2/media/image1.png){fig-align="center" width="3.875in"} -A green play button appears when the Nicla Vison connects to the Tool -Bar. +When the Nicla Vision connects, a green play button appears on the Tool Bar. -![](images_2/media/image18.jpg){width="4.791666666666667in" -height="1.4791666666666667in"} +![](images_2/media/image18.jpg){fig-align="center" width="4.791666666666667in"} Also, note that a drive named "NO NAME" will appear on your computer.: -![](images_2/media/image3.png){width="6.447916666666667in" -height="2.4166666666666665in"} +![](images_2/media/image3.png){fig-align="center" width="6.447916666666667in"} -Every time you press the \[RESET\] button on the board, it automatically -executes the main.py script stored on it. You can load the -[main.py](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/main.py) -code on the IDE (File \> Open File\...). +Every time you press the `[RESET]` button on the board, it automatically executes the *main.py* script stored on it. You can load the [main.py](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/main.py) code in the IDE (`File > Open File...`).
-![](images_2/media/image16.png){width="4.239583333333333in" -height="3.8229166666666665in"} +![](images_2/media/image16.png){fig-align="center" width="4.239583333333333in"} -> *This code is the \"Blink\" code, confirming that the HW is OK.* +> This code is the "Blink" code, confirming that the HW is OK. -For testing the camera, let\'s run helloword_1.py. For that, select the -script on File \> Examples \> HelloWorld \> helloword.py, +To test the camera, let's run *helloworld_1.py*. Select the script at `File > Examples > HelloWorld > helloworld.py`. -When clicking the green play button, the MicroPython script -(hellowolrd.py) on the Code Area will be uploaded and run on the Nicla -Vision. On-Camera Viewer, you will start to see the video streaming. The -Serial Monitor will show us the FPS (Frames per second), which should be -around 14fps. +When you click the green play button, the MicroPython script (*helloworld.py*) in the Code Area will be uploaded and run on the Nicla Vision. In the Camera Viewer, you will start to see the video stream. The Serial Terminal will show the FPS (frames per second), which should be around 14 fps. -![](images_2/media/image6.png){width="6.5in" -height="3.9722222222222223in"} +![](images_2/media/image6.png){fig-align="center" width="6.5in"} -Let\'s go through the [helloworld.py](http://helloworld.py/) script: +Here is the *helloworld.py* script: -```python +``` python # Hello World Example 2 # # Welcome to the OpenMV IDE! Click on the green run arrow button below to run the script! @@ -274,123 +165,77 @@ while(True): print(clock.fps()) ``` - -In GitHub, you can find the Python scripts used here. +In [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision), you can find the Python scripts used here. The code can be split into two parts: -- **Setup**: Where the libraries are imported and initialized, and the - > variables are defined and initiated.
+- **Setup**: Where the libraries are imported and initialized, and the variables are defined and initialized. -- **Loop**: (while loop) part of the code that runs continually. The - > image (img variable) is captured (a frame). Each of those frames - > can be used for inference in Machine Learning Applications. +- **Loop**: (while loop) The part of the code that runs continually. Each iteration captures an image (one frame, stored in the *img* variable). Each of those frames can be used for inference in Machine Learning Applications. -To interrupt the program execution, press the red \[X\] button. +To interrupt the program execution, press the red `[X]` button. -> *Note: OpenMV Cam runs about half as fast when connected to the IDE. -> The FPS should increase once disconnected.* +> Note: OpenMV Cam runs about half as fast when connected to the IDE. The FPS should increase once disconnected. -In [[the GitHub, You can find other Python -scripts]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython). -Try to test the onboard sensors. +In [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython), you can find other Python scripts. Try them out to test the onboard sensors. ## Connecting the Nicla Vision to Edge Impulse Studio -We will use the Edge Impulse Studio later in other exercises. [Edge -Impulse I](https://www.edgeimpulse.com/)s a leading development platform -for machine learning on edge devices. +We will need the Edge Impulse Studio later in other exercises. [Edge Impulse](https://www.edgeimpulse.com/) is a leading development platform for machine learning on edge devices. -Edge Impulse officially supports the Nicla Vision. So, to start, please create a new project in the Studio and connect the Nicla to it.
To do so, follow these steps: -- Download the [last EI - > Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) - > and unzip it. +- Download the latest [EI Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it. -- Open the zip file on your computer and select the uploader related - > to your OS: +- Open the zip file on your computer and select the uploader corresponding to your OS: -![](images_2/media/image17.png){width="4.416666666666667in" -height="1.5520833333333333in"} +![](images_2/media/image17.png){fig-align="center" width="4.416666666666667in"} - Put the Nicla-Vision on Boot Mode, pressing the reset button twice. -- Execute the specific batch code for your OS for uploading the binary - > (arduino-nicla-vision.bin) to your board. +- Run the batch script for your OS to upload the binary *arduino-nicla-vision.bin* to your board. -Go to your project on the Studio, and on the Data Acquisition tab, -select WebUSB (1). A window will appear; choose the option that shows -that the Nicla is pared (2) and press \[Connect\] (3). +Go to your project in the Studio, and on the `Data Acquisition` tab, select `WebUSB` (1). A window will pop up; choose the option showing that the `Nicla is paired` (2) and press `[Connect]` (3). -![](images_2/media/image27.png){width="6.5in" -height="4.319444444444445in"} +![](images_2/media/image27.png){fig-align="center" width="6.5in"} -In the Collect Data section on the Data Acquisition tab, you can choose -what sensor data you will pick. +In the *Collect Data* section of the `Data Acquisition` tab, you can choose which sensor data to collect. -![](images_2/media/image25.png){width="6.5in" -height="4.319444444444445in"} +![](images_2/media/image25.png){fig-align="center" width="6.5in"} -For example. IMU data: +For example, `IMU data`: -![](images_2/media/image8.png){width="6.5in" -height="4.319444444444445in"} +![](images_2/media/image8.png){fig-align="center" width="6.5in"} -Or Image: +Or Image (`Camera`): -![](images_2/media/image4.png){width="6.5in" -height="4.319444444444445in"} +![](images_2/media/image4.png){fig-align="center" width="6.5in"} -And so on. 
You can also test an external sensor connected to the Nicla -ADC (pin 0) and the other onboard sensors, such as the microphone and -the ToF. +And so on. You can also test an external sensor connected to the `ADC` (Nicla pin 0) and the other onboard sensors, such as the microphone and the ToF. -### **Expanding the Nicla Vision Board (optional)** +## Expanding the Nicla Vision Board (optional) -A last item to be explored is that sometimes, during prototyping, it is -essential to experiment with external sensors and devices, and an -excellent expansion to the Nicla is the [Arduino MKR Connector Carrier -(Grove -compatible)](https://store-usa.arduino.cc/products/arduino-mkr-connector-carrier-grove-compatible). +A final item to explore: during prototyping, it is often essential to experiment with external sensors and devices, and an excellent expansion for the Nicla is the [Arduino MKR Connector Carrier (Grove compatible)](https://store-usa.arduino.cc/products/arduino-mkr-connector-carrier-grove-compatible). -The shield has 14 Grove connectors: five single analog inputs, one -single analog input, five single digital I/Os, one double digital I/O, -one I2C, and one UART. All connectors are 5V compatible. +The shield has 14 Grove connectors: five single analog inputs (A0-A4), one double analog input (A5/A6), five single digital I/Os (D0-D4), one double digital I/O (D5/D6), one I2C (TWI), and one UART (Serial). All connectors are 5V compatible.
-> *Note that besides all 17 Nicla Vision pins that will be connected to -> the Shield Groves, some Grove connections are disconnected.* +> Note that all 17 Nicla Vision pins are connected to the Shield's Grove connectors, but some Grove connections remain disconnected. -![](images_2/media/image20.jpg){width="6.5in" -height="4.875in"} +![](images_2/media/image20.jpg){fig-align="center" width="6.5in"} -This shield is MKR compatible and can be used with the Nicla Vision and -the Portenta. +This shield is MKR compatible and can be used with both the Nicla Vision and the Portenta. -![](images_2/media/image26.jpg){width="4.34375in" -height="5.78125in"} +![](images_2/media/image26.jpg){fig-align="center" width="4.34375in"} -For example, suppose that on a TinyML project, you want to send -inference results using a LoRaWan device and add information about local -luminosity. Besides, with offline operations, a local low-power display -as an OLED display is advised. This setup can be seen here: +For example, suppose that in a TinyML project you want to send inference results using a LoRaWAN device and add information about local luminosity. For offline operation, a local low-power display, such as an OLED, is also advisable. This setup can be seen here: -![](images_2/media/image11.jpg){width="6.5in" -height="4.708333333333333in"} +![](images_2/media/image11.jpg){fig-align="center" width="6.5in"} -The [Grove Light -Sensor](https://wiki.seeedstudio.com/Grove-Light_Sensor/) would be -connected to one of the single Analog pins (A0/PC4), the [LoRaWan -device](https://wiki.seeedstudio.com/Grove_LoRa_E5_New_Version/) to the -UART, and the [OLED](https://arduino.cl/producto/display-oled-grove/) to -the I2C connector.
+The [Grove Light Sensor](https://wiki.seeedstudio.com/Grove-Light_Sensor/) would be connected to one of the single Analog pins (A0/PC4), the [LoRaWAN device](https://wiki.seeedstudio.com/Grove_LoRa_E5_New_Version/) to the UART, and the [OLED](https://arduino.cl/producto/display-oled-grove/) to the I2C connector. -The Nicla Pins 3 (Tx) and 4 (Rx) are connected with the Shield Serial -connector. The UART communication is used with the LoRaWan device. Here -is a simple code to use the UART.: +The Nicla pins 3 (Tx) and 4 (Rx) are connected to the Shield's Serial connector. UART communication is used with the LoRaWAN device. Here is a simple script to test the UART: -```python +``` python # UART Test - By: marcelo_rovai - Sat Sep 23 2023 import time @@ -407,25 +252,15 @@ while(True): uart.write("Hello World!\r\n") redLED.toggle() time.sleep_ms(1000) - ``` -To verify if the UART is working, you should, for example, connect -another device as an [Arduino -UNO](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Arduino-IDE/teste_uart_UNO/teste_uart_UNO.ino), -displaying the Hello Word. +To verify that the UART is working, you can, for example, connect another device, such as an Arduino UNO, that displays "Hello World!" on its Serial Monitor. Here is the [code](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Arduino-IDE/teste_uart_UNO/teste_uart_UNO.ino). -![](images_2/media/image24.gif){width="2.8125in" -height="3.75in"} +![](images_2/media/image24.jpg){fig-align="center" width="2.8125in"} -Here is a Hello World code to be used with the I2C OLED. The MicroPython -SSD1306 OLED driver (ssd1306.py), created by Adafruit, should also be -uploaded to the Nicla (the -[[ssd1306.py]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/ssd1306.py) -can be found in GitHub). +Below is a *Hello World* script to be used with the I2C OLED.
The MicroPython SSD1306 OLED driver (ssd1306.py), created by Adafruit, should also be uploaded to the Nicla (the ssd1306.py script can be found on [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/ssd1306.py)). - -```python +``` python # Nicla_OLED_Hello_World - By: marcelo_rovai - Sat Sep 30 2023 #Save on device: MicroPython SSD1306 OLED driver, I2C and SPI interfaces created by Adafruit @@ -442,9 +277,9 @@ oled.text('Hello, World', 10, 10) oled.show() ``` -Finally, here is a simple script to read the ADC value on pin \"PC4\" -(Nicla pin A0): -```python +Finally, here is a simple script to read the ADC value on pin "PC4" (Nicla pin A0): + +``` python # Light Sensor (A0) - By: marcelo_rovai - Wed Oct 4 2023 @@ -461,22 +296,12 @@ while (True): sleep (1) ``` -The ADC can be used for other valuable sensors, such as -[Temperature](https://wiki.seeedstudio.com/Grove-Temperature_Sensor_V1.2/). +The ADC can also read other analog sensors, such as a [temperature sensor](https://wiki.seeedstudio.com/Grove-Temperature_Sensor_V1.2/). -> *Note that the above scripts ([[downloaded from -> Github]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython)) -> only introduce how to connect external devices with the Nicla Vision -> board using MicroPython.* +> Note that the above scripts (which can be [downloaded from GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython)) only show how to connect external devices to the Nicla Vision board using MicroPython. ## Conclusion -The Arduino Nicla Vision is an excellent *tiny device* for industrial -and professional uses! However, it is powerful, trustworthy, low power, -and has suitable sensors for the most common embedded machine learning -applications such as vision, movement, sensor fusion, and sound. +The Arduino Nicla Vision is an excellent *tiny device* for industrial and professional uses!
It is powerful, reliable, and low-power, and it has suitable sensors for the most common embedded machine learning applications, such as vision, movement, sensor fusion, and sound. -> *On the* *[GitHub -> repository,](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main) -> you will find the last version of all the codes used or commented on -> in this exercise.* +> On the [GitHub repository](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main), you will find the latest version of all the code used or discussed in this hands-on exercise. diff --git a/images_2/media/image24.jpg b/images_2/media/image24.jpg new file mode 100644 index 00000000..084cff5d Binary files /dev/null and b/images_2/media/image24.jpg differ diff --git a/images_4_2/frame_to_fft.png b/images_4_2/frame_to_fft.png new file mode 100644 index 00000000..360582da Binary files /dev/null and b/images_4_2/frame_to_fft.png differ diff --git a/images_4_2/frame_wind.png b/images_4_2/frame_wind.png new file mode 100644 index 00000000..c34e0bc5 Binary files /dev/null and b/images_4_2/frame_wind.png differ diff --git a/images_4_2/kws_diagram.png b/images_4_2/kws_diagram.png new file mode 100644 index 00000000..d441997d Binary files /dev/null and b/images_4_2/kws_diagram.png differ diff --git a/images_4_2/logo_original_tinyML4D.png b/images_4_2/logo_original_tinyML4D.png new file mode 100644 index 00000000..2ae8013d Binary files /dev/null and b/images_4_2/logo_original_tinyML4D.png differ diff --git a/images_4_2/melbank-1_00.hires.png b/images_4_2/melbank-1_00.hires.png new file mode 100644 index 00000000..9f4b5277 Binary files /dev/null and b/images_4_2/melbank-1_00.hires.png differ diff --git a/images_4_2/mfcc_final.png b/images_4_2/mfcc_final.png new file mode 100644 index 00000000..22c969a8 Binary files /dev/null and b/images_4_2/mfcc_final.png differ diff --git a/images_4_2/time_vs_freq.png b/images_4_2/time_vs_freq.png new file mode 100644 index 00000000..a91e1707 Binary files /dev/null
and b/images_4_2/time_vs_freq.png differ diff --git a/images_4_2/yes_no_mfcc.png b/images_4_2/yes_no_mfcc.png new file mode 100644 index 00000000..4d10d794 Binary files /dev/null and b/images_4_2/yes_no_mfcc.png differ diff --git a/kws_feature_eng.qmd b/kws_feature_eng.qmd new file mode 100644 index 00000000..9fe8108c --- /dev/null +++ b/kws_feature_eng.qmd @@ -0,0 +1,141 @@ +# Feature Engineering for Audio Classification {.unnumbered} + +## Introduction + +In this hands-on tutorial, the emphasis is on the critical role that feature engineering plays in optimizing the performance of machine learning models applied to audio classification tasks, such as speech recognition. Because the performance of any machine learning model relies heavily on the quality of the features used, we will look at the "under-the-hood" mechanics of feature extraction, focusing mainly on Mel-frequency Cepstral Coefficients (MFCCs), a cornerstone of audio signal processing. + +Machine learning models, especially traditional algorithms, don't understand audio waves. They understand numbers arranged in some meaningful way, i.e., features. These features encapsulate the characteristics of the audio signal, making it easier for models to distinguish between different sounds. + +> This tutorial deals with generating features specifically for audio classification. This can be particularly useful for applying machine learning to a variety of audio data, whether for speech recognition, music categorization, insect classification based on wingbeat sounds, or other sound analysis tasks. + +## Keyword Spotting (KWS) + +One of the most common TinyML applications is Keyword Spotting (KWS), a subset of the broader field of speech recognition. While general speech recognition aims to transcribe all spoken words into text, Keyword Spotting focuses on detecting specific "keywords" or "wake words" in a continuous audio stream.
The system is trained to recognize these keywords, which are predefined words or short phrases such as *yes* or *no*. In short, KWS is a specialized form of speech recognition with its own set of challenges and requirements. + +Here is a typical KWS process using an MFCC feature converter: ![](images_4_2/kws_diagram.png){fig-align="center" width="6.5in"} + +#### Applications of KWS: + +- **Voice Assistants**: In devices like Amazon's *Alexa* or *Google Home*, KWS is used to detect the wake word ("Alexa" or "Hey Google") to activate the device. +- **Voice-Activated Controls**: In automotive or industrial settings, KWS can be used to initiate specific commands like "Start engine" or "Turn off lights." +- **Security Systems**: Voice-activated security systems may use KWS to authenticate users based on a spoken passphrase. +- **Telecommunication Services**: Customer service lines may use KWS to route calls based on spoken keywords. + +#### Differences from General Speech Recognition: + +- **Computational Efficiency**: KWS is usually designed to be less computationally intensive than full speech recognition, as it only needs to recognize a small set of phrases. +- **Real-time Processing**: KWS often operates in real-time and is optimized for low-latency detection of keywords. +- **Resource Constraints**: KWS models are often designed to be lightweight, so they can run on devices with limited computational resources, like microcontrollers or mobile phones. +- **Focused Task**: While general speech recognition models are trained to handle a broad range of vocabulary and accents, KWS models are fine-tuned to accurately recognize specific keywords, often in noisy environments. + +## Introduction to Audio Signals + +Understanding the basic properties of audio signals is crucial for effective feature extraction and, ultimately, for successfully applying machine learning algorithms in audio classification tasks.
An audio signal is a complex waveform that captures fluctuations in air pressure over time. These signals can be characterized by several fundamental attributes, such as sampling rate, frequency, and amplitude. + +- **Frequency and Amplitude**: [Frequency](https://en.wikipedia.org/wiki/Audio_frequency) refers to the number of oscillations a waveform undergoes per unit time and is measured in Hertz (Hz). In the context of audio signals, different frequencies correspond to different pitches. [Amplitude](https://en.wikipedia.org/wiki/Amplitude), on the other hand, measures the magnitude of the oscillations and correlates with the loudness of the sound. Both frequency and amplitude are essential features that capture audio signals' tonal and rhythmic qualities. + +- **Sampling Rate**: The [sampling rate](https://en.wikipedia.org/wiki/Sampling_(signal_processing)), also expressed in Hertz (Hz), defines the number of samples taken per second when digitizing an analog signal. A higher sampling rate allows for a more accurate digital representation of the signal but also demands more computational resources for processing. Typical sampling rates include 44.1 kHz for CD-quality audio and 16 kHz or 8 kHz for speech recognition tasks. Understanding the trade-offs in selecting an appropriate sampling rate is essential for balancing accuracy and computational efficiency. In TinyML projects, we generally work with 16 kHz. Although humans can hear tones up to about 20 kHz, most of the useful information in the human voice lies below 8 kHz, which a 16 kHz sampling rate can capture. Traditional telephone systems use an 8 kHz sampling frequency. + +> For an accurate representation of the signal, the sampling rate must be more than twice the highest frequency present in the signal (the Nyquist-Shannon sampling theorem). + +- **Time Domain vs. Frequency Domain**: Audio signals can be analyzed in the time and frequency domains. In the time domain, a signal is represented as a waveform where the amplitude is plotted against time.
This representation makes temporal features such as onset and duration easy to observe, but it does not reveal the signal's tonal characteristics well. Conversely, a frequency domain representation provides a view of the signal's constituent frequencies and their respective amplitudes, typically obtained via a Fourier Transform. This is invaluable for tasks that require understanding the signal's spectral content, such as identifying musical notes or speech phonemes (our case). + +The image below shows the words `YES` and `NO` with typical representations in the Time (Raw Audio) and Frequency domains: + +![](images_4_2/time_vs_freq.png){fig-align="center" width="6.5in"} + +### Why Not Raw Audio? + +While using raw audio data directly for machine learning tasks may seem tempting, this approach presents several challenges that make it less suitable for building robust and efficient models. + +Using raw audio data for Keyword Spotting (KWS) on TinyML devices, for example, poses challenges due to its high dimensionality (at a 16 kHz sampling rate), the computational complexity of capturing temporal features, its susceptibility to noise, and its lack of semantically meaningful features. These challenges make feature extraction techniques like MFCCs a more practical choice for resource-constrained applications. + +Here are some additional details on the critical issues associated with using raw audio: + +- **High Dimensionality**: Audio signals, especially those sampled at high rates, result in large amounts of data. For example, a 1-second audio clip sampled at 16 kHz will have 16,000 individual data points. High-dimensional data increases computational complexity, leading to longer training times and higher computational costs, making it impractical for resource-constrained environments. Furthermore, the wide dynamic range of audio signals requires many bits per sample, while much of that detail conveys little useful information.
+ +- **Temporal Dependencies**: Raw audio signals have temporal structures that simple machine learning models may find hard to capture. While recurrent neural networks like [LSTMs](https://annals-csis.org/Volume_18/drp/pdf/185.pdf) can model such dependencies, they are computationally intensive and difficult to deploy on tiny devices. + +- **Noise and Variability**: Raw audio signals often contain background noise and other non-essential elements that affect model performance. Additionally, the same sound can have different characteristics depending on factors such as the distance from the microphone, the orientation of the sound source, and the acoustic properties of the environment, adding to the complexity of the data. + +- **Lack of Semantic Meaning**: Raw audio doesn't inherently contain semantically meaningful features for classification tasks. Features like pitch, tempo, and spectral characteristics, which can be crucial for speech recognition, are not directly accessible from raw waveform data. + +- **Signal Redundancy**: Audio signals often contain redundant information, with certain portions of the signal contributing little to no value to the task at hand. This redundancy can make learning inefficient and potentially lead to overfitting. + +For these reasons, feature extraction techniques such as Mel-frequency Cepstral Coefficients (MFCCs), Mel-Frequency Energies (MFEs), and simple Spectrograms are commonly used to transform raw audio data into a more manageable and informative format. These features capture the essential characteristics of the audio signal while reducing dimensionality and noise, facilitating more effective machine learning. + +## Introduction to MFCCs + +### What are MFCCs? + +[Mel-frequency Cepstral Coefficients (MFCCs)](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) are a set of features derived from the spectral content of an audio signal.
They are based on human auditory perception and are commonly used to capture the phonetic characteristics of an audio signal. The MFCCs are computed through a multi-step process that includes pre-emphasis, framing, windowing, applying the Fast Fourier Transform (FFT) to convert the signal to the frequency domain, mapping the resulting power spectrum onto Mel filter banks and taking the logarithm, and finally, applying the Discrete Cosine Transform (DCT). The result is a compact representation of the original audio signal's spectral characteristics. + +The image below shows the words `YES` and `NO` in their MFCC representation: + +![](images_4_2/yes_no_mfcc.png){fig-align="center" width="6.5in"} + +> This [video](https://youtu.be/SJo7vPgRlBQ?si=KSgzmDg8DtSVqzXp) explains the Mel Frequency Cepstral Coefficients (MFCC) and how to compute them. + +### Why are MFCCs important? + +MFCCs are crucial for several reasons, particularly in the context of Keyword Spotting (KWS) and TinyML: + +- **Dimensionality Reduction**: MFCCs capture essential spectral characteristics of the audio signal while significantly reducing the dimensionality of the data, making them ideal for resource-constrained TinyML applications. +- **Robustness**: MFCCs are less susceptible to noise and variations in pitch and amplitude, providing a more stable and robust feature set for audio classification tasks. +- **Human Auditory System Modeling**: The Mel scale in MFCCs approximates the human ear's response to different frequencies, making them practical for speech recognition where human-like perception is desired. +- **Computational Efficiency**: The process of calculating MFCCs is computationally efficient, making it well-suited for real-time applications on hardware with limited computational resources. + +In summary, MFCCs offer a balance of information richness and computational efficiency, making them popular for audio classification tasks, particularly in constrained environments like TinyML.
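To put the dimensionality-reduction argument in numbers, here is a small back-of-the-envelope sketch comparing a 1-second raw clip with its MFCC matrix. The parameter values are assumptions taken from figures quoted in this tutorial (16 kHz sampling, 20 ms frames with a 20 ms stride, and 13 coefficients per frame), the helper names are hypothetical, and partial frames at the end of the clip are ignored.

```python
# Back-of-the-envelope comparison: raw 1-second clip vs. its MFCC matrix.
# Assumed parameters: 16 kHz sampling, 20 ms frames, 20 ms stride, 13 coefficients.
SAMPLE_RATE = 16_000     # samples per second
DURATION_S = 1.0         # clip length in seconds
FRAME_LENGTH_S = 0.020   # frame length (20 ms)
FRAME_STRIDE_S = 0.020   # frame stride (20 ms, i.e. no overlap)
NUM_COEFFS = 13          # MFCCs kept per frame


def raw_size(sample_rate: int, duration_s: float) -> int:
    """Number of raw samples in the clip."""
    return int(sample_rate * duration_s)


def mfcc_size(sample_rate: int, duration_s: float, frame_length_s: float,
              frame_stride_s: float, num_coeffs: int) -> int:
    """Number of values in the MFCC matrix (complete frames x coefficients)."""
    n_samples = raw_size(sample_rate, duration_s)
    frame_len = int(round(sample_rate * frame_length_s))
    step = int(round(sample_rate * frame_stride_s))
    num_frames = 1 + (n_samples - frame_len) // step  # only complete frames
    return num_frames * num_coeffs


raw = raw_size(SAMPLE_RATE, DURATION_S)
mfcc = mfcc_size(SAMPLE_RATE, DURATION_S, FRAME_LENGTH_S, FRAME_STRIDE_S, NUM_COEFFS)
print(raw, mfcc, round(raw / mfcc, 1))  # 16000 650 24.6
```

With overlapping frames, for example 25 ms frames every 10 ms, `mfcc_size(16_000, 1.0, 0.025, 0.010, 13)` gives 98 frames, or 1,274 values, which is still more than 12 times smaller than the raw clip.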
+ +### Computing MFCCs + +The computation of Mel-frequency Cepstral Coefficients (MFCCs) involves several key steps. Let's walk through these steps, which are particularly important for Keyword Spotting (KWS) tasks on TinyML devices. + +- **Pre-emphasis**: The first step is pre-emphasis, which is applied to accentuate the high-frequency components of the audio signal and balance the frequency spectrum. This is achieved by applying a filter that amplifies the difference between consecutive samples. The formula for pre-emphasis is $y(t) = x(t) - \alpha\, x(t-1)$, where $\alpha$ is the pre-emphasis factor, typically around 0.97. + +- **Framing**: Audio signals are divided into short frames (the *frame length*), usually 20 to 40 milliseconds long. This is based on the assumption that the frequency content of a signal is stationary over such a short period, so framing lets us analyze the signal in small time slots. The *frame stride* (or step) is the offset between the start of one frame and the start of the next; frames can be contiguous (stride equal to the frame length) or overlapping (stride shorter than the frame length). + +- **Windowing**: Each frame is then windowed to minimize the discontinuities at the frame boundaries. A commonly used window function is the Hamming window. Windowing prepares the signal for a Fourier transform by minimizing the edge effects. The image below shows three frames (10, 20, and 30) and the time samples after windowing (note that the frame length and frame stride are 20 ms): + +![](images_4_2/frame_wind.png){fig-align="center" width="6.5in"} + +- **Fast Fourier Transform (FFT)**: The FFT is applied to each windowed frame to convert it from the time domain to the frequency domain. The FFT gives us a complex-valued representation that includes both magnitude and phase information. However, for MFCCs, only the magnitude is used to calculate the Power Spectrum. 
The power spectrum is the square of the magnitude spectrum and measures the energy present at each frequency component.
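The first four steps (pre-emphasis, framing, Hamming windowing, and the FFT power spectrum) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the code used in the accompanying notebook: it assumes a mono floating-point signal sampled at 16 kHz, $\alpha = 0.97$, 20 ms frames with a 20 ms stride (as in the windowing figure), and a 512-point FFT (an assumed size; any FFT length at least as long as the frame works). The function names are hypothetical, and trailing samples that do not fill a complete frame are dropped rather than zero-padded.

```python
import numpy as np


def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """y(t) = x(t) - alpha * x(t-1); the first sample is passed through unchanged."""
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x: np.ndarray, sample_rate: int,
                 frame_length_s: float = 0.020,
                 frame_stride_s: float = 0.020) -> np.ndarray:
    """Slice x into frames of frame_length_s seconds, one every frame_stride_s seconds.
    Assumes len(x) covers at least one frame; leftover samples are dropped."""
    frame_len = int(round(frame_length_s * sample_rate))
    step = int(round(frame_stride_s * sample_rate))
    num_frames = 1 + (len(x) - frame_len) // step
    # Row i holds the sample indices [i*step, i*step + frame_len).
    idx = step * np.arange(num_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]


def power_spectrum(frames: np.ndarray, n_fft: int = 512) -> np.ndarray:
    """Hamming-window each frame, take the real FFT, and return |X(f)|^2."""
    windowed = frames * np.hamming(frames.shape[1])
    spectrum = np.fft.rfft(windowed, n=n_fft)  # n_fft // 2 + 1 bins per frame
    return np.abs(spectrum) ** 2


# Usage: a 1-second, 1 kHz test tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 1_000 * t)

frames = frame_signal(pre_emphasis(x), sr)
P = power_spectrum(frames)
peak_hz = np.argmax(P[10]) * sr / 512
print(frames.shape, P.shape, peak_hz)  # (50, 320) (50, 257) 1000.0
```

Because a 1 kHz tone falls exactly on FFT bin 32 (32 × 16000 / 512 = 1000 Hz), the power spectrum peaks there. Setting the stride shorter than the frame length (for example, `frame_stride_s=0.010`) produces overlapping frames.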
+ +> The power spectrum $P(f)$ of a signal $x(t)$ is defined as $P(f) = |X(f)|^2$, where $X(f)$ is the Fourier Transform of $x(t)$. By squaring the magnitude of the Fourier Transform, we emphasize *stronger* frequencies over *weaker* ones, thereby capturing more relevant spectral characteristics of the audio signal. This is important in applications like audio classification, speech recognition, and Keyword Spotting (KWS), where the focus is on identifying distinct frequency patterns that characterize different classes of audio or phonemes in speech. + +![](images_4_2/frame_to_fft.png){fig-align="center" width="6.5in"} + +- **Mel Filter Banks**: The frequency domain is then mapped to the [Mel scale](https://en.wikipedia.org/wiki/Mel_scale), which approximates the human ear's response to different frequencies. The idea is to use more filters (and thus extract more features) at lower frequencies and fewer at higher frequencies, so the representation performs well on the sounds the human ear distinguishes best. Typically, 20 to 40 triangular filters extract the Mel-frequency energies. These energies are then log-transformed to convert multiplicative factors into additive ones, making them more suitable for further processing. + +![](images_4_2/melbank-1_00.hires.png){fig-align="center" width="6.5in"} + +- **Discrete Cosine Transform (DCT)**: The last step is to apply the [Discrete Cosine Transform (DCT)](https://en.wikipedia.org/wiki/Discrete_cosine_transform) to the log Mel energies. The DCT helps to decorrelate the energies, effectively compressing the data and retaining only the most discriminative features. Usually, the first 12 to 13 DCT coefficients are retained, forming the final MFCC feature vector. + +![](images_4_2/mfcc_final.png){fig-align="center" width="6.5in"} + +## Hands-On using Python + +Let's apply what we discussed while working on an actual audio sample.
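The remaining MFCC steps described above (Mel filter bank, log compression, and DCT) can be sketched in NumPy as below. This is a minimal illustration, not the notebook's implementation: it assumes a power spectrum computed with a 512-point FFT at 16 kHz (257 bins per frame), uses the common Mel formula $m = 2595 \log_{10}(1 + f/700)$, builds 26 triangular filters (within the 20 to 40 range mentioned above), and keeps 13 coefficients with an orthonormal DCT-II. The function names are hypothetical, and the usage example feeds random placeholder values instead of a real recording. Libraries such as librosa make different choices (Mel formula variant, filter normalization, liftering), so their outputs will not match this sketch exactly.

```python
import numpy as np


def hz_to_mel(f):
    """Common Mel-scale formula: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(num_filters: int = 26, n_fft: int = 512,
                   sample_rate: int = 16_000) -> np.ndarray:
    """Triangular filters equally spaced on the Mel scale, from 0 Hz to Nyquist.
    Returns an array of shape (num_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((num_filters, n_fft // 2 + 1))
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):    # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x: np.ndarray, num_coeffs: int) -> np.ndarray:
    """Orthonormal DCT-II along the last axis, keeping only the first num_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(num_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # (num_coeffs, n)
    scale = np.full((num_coeffs, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)
    return x @ (scale * basis).T


def mfcc_from_power(power: np.ndarray, sample_rate: int = 16_000, n_fft: int = 512,
                    num_filters: int = 26, num_coeffs: int = 13) -> np.ndarray:
    """power: (frames, n_fft // 2 + 1) power spectrum -> (frames, num_coeffs) MFCCs."""
    fbank = mel_filterbank(num_filters, n_fft, sample_rate)
    mel_energies = power @ fbank.T                     # (frames, num_filters)
    log_mel = np.log(np.maximum(mel_energies, 1e-10))  # avoid log(0)
    return dct_ii(log_mel, num_coeffs)


# Usage with a placeholder power spectrum: 50 frames x 257 bins of random values.
rng = np.random.default_rng(0)
power = rng.random((50, 257))
mfcc = mfcc_from_power(power)
print(mfcc.shape)  # (50, 13)
```

Writing the DCT as an explicit cosine basis keeps the sketch dependent on NumPy alone; in practice, `scipy.fft.dct(..., type=2, norm="ortho")` computes the same transform.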
Open the notebook on Google Colab and extract the MFCC features from your audio samples: [\[Open In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/Audio_Data_Analysis.ipynb) + +## Conclusion + +### What Feature Extraction technique should we use? + +Mel-frequency Cepstral Coefficients (MFCCs), Mel-Frequency Energies (MFEs), and Spectrograms are all techniques for representing audio data, and each is helpful in different contexts. + +In general, MFCCs are more focused on capturing the envelope of the power spectrum, which makes them less sensitive to fine-grained spectral details but more robust to noise. This is often desirable for speech-related tasks. On the other hand, spectrograms or MFEs preserve more detailed frequency information, which can be advantageous in tasks that require discrimination based on fine-grained spectral content. + +#### MFCCs are particularly strong for: + +1. **Speech Recognition**: MFCCs are excellent for identifying phonetic content in speech signals. +2. **Speaker Identification**: They can be used to distinguish between different speakers based on voice characteristics. +3. **Emotion Recognition**: MFCCs can capture the nuanced variations in speech indicative of emotional states. +4. **Keyword Spotting**: Especially in TinyML, where low computational complexity and small feature size are crucial. + +#### Spectrograms or MFEs are often more suitable for: + +1. **Music Analysis**: Spectrograms can capture harmonic and timbral structures in music, which is essential for tasks like genre classification, instrument recognition, or music transcription. +2. **Environmental Sound Classification**: In recognizing non-speech, environmental sounds (e.g., rain, wind, traffic), the full spectrogram can provide more discriminative features. +3. **Birdsong Identification**: The intricate details of bird calls are often better captured using spectrograms. +4.
**Bioacoustic Signal Processing**: In applications like dolphin or bat call analysis, the fine-grained frequency information in a spectrogram can be essential. +5. **Audio Quality Assurance**: Spectrograms are often used in professional audio analysis to identify unwanted noises, clicks, or other artifacts. diff --git a/object_detection_fomo.qmd b/object_detection_fomo.qmd new file mode 100644 index 00000000..b42d302f --- /dev/null +++ b/object_detection_fomo.qmd @@ -0,0 +1,309 @@ +--- +title: "Object Detection" +format: + html: + self-contained: true +editor: visual +--- + +## Introduction + +This is a continuation of **CV on Nicla Vision**, now exploring **Object Detection** on microcontrollers. + +![](Images_4_1/cv_obj_detect.png){fig-align="center" width="6.5in"} + +### Object Detection versus Image Classification + +The main task of Image Classification models is to produce a list of the most probable object categories present in an image, for example, to identify a tabby cat just after his dinner: + +![](Images_4_1/img_1.png){fig-align="center"} + +But what happens when the cat jumps near the wine glass? The model still only recognizes the predominant category in the image, the tabby cat: + +![](Images_4_1/img_2.png){fig-align="center"} + +And what happens if there is no dominant category in the image? + +![](Images_4_1/img_3.png){fig-align="center"} + +The model completely misclassifies the above image as an "ashcan," possibly due to its color tones. + +> The model used in all previous examples is *MobileNet*, trained with a large dataset, *ImageNet*. + +To solve this issue, we need another type of model, one that can find not only **multiple categories** (or labels) but also **where** the objects are located in a given image.
+ +As we can imagine, such models are much more complicated and bigger, for example, the **MobileNetV2 SSD FPN-Lite 320x320, trained with the COCO dataset.** This pre-trained object detection model is designed to locate up to 10 objects within an image, outputting a bounding box for each object detected. The image below shows the result of such a model running on a Raspberry Pi: + +![](Images_4_1/img_4.png){fig-align="center"} + +Object detection models such as MobileNet SSD or YOLO are usually several MB in size, which is fine on a Raspberry Pi but unsuitable for embedded devices, where RAM is usually less than 1 MB. + +### An innovative solution for Object Detection: FOMO + +In 2022, Edge Impulse launched [**FOMO** (Faster Objects, More Objects)](https://docs.edgeimpulse.com/docs/edge-impulse-studio/learning-blocks/object-detection/fomo-object-detection-for-constrained-devices), a novel solution for performing object detection on embedded devices, not only on the Nicla Vision (Cortex M7) but also on Cortex M4F CPUs (Arduino Nano33 and OpenMV M4 series), as well as on Espressif ESP32 devices (ESP-CAM and XIAO ESP32S3 Sense). + +In this hands-on exercise, we will explore Object Detection using FOMO, without going into much detail about the model itself. To learn more about how the model works, see the [official FOMO announcement](https://www.edgeimpulse.com/blog/announcing-fomo-faster-objects-more-objects) by Edge Impulse, where Louis Moreau and Mat Kelcey explain in detail how it works. + +## The Object Detection Project Goal + +All Machine Learning projects need to start with a detailed goal. Let's assume we are in an industrial facility and must sort and count **wheels** and special **boxes**.

![](Images_4_1/proj_goal.png){fig-align="center" width="752"}

In other words, we should perform a multi-label classification, where each image can contain three classes:

- Background (no objects)

- Box

- Wheel

Here are some unlabeled image samples that we will use to detect the objects (wheels and boxes):

![](Images_4_1/samples.png){fig-align="center"}

We are interested in which objects are in the image, their locations (centroids), and how many of each we can find. Unlike MobileNet SSD or YOLO, where the bounding box is one of the model outputs, FOMO does not detect the object's size.

We will use the Nicla Vision for image capture and model inference, and develop the ML project in Edge Impulse Studio. But before starting the object detection project in the Studio, let's create a *raw dataset* (unlabeled) with images containing the objects to be detected.

## Data Collection

We can use Edge Impulse Studio, the OpenMV IDE, a phone, or other devices for image capture. Here, we will again use the OpenMV IDE.

### Collecting Dataset with OpenMV IDE

First, create a folder on your computer where your data will be saved, for example, "data." Next, in the OpenMV IDE, go to Tools \> Dataset Editor and select New Dataset to start the dataset collection:

![](Images_4_1/data_folder.png){fig-align="center"}

Edge Impulse suggests that, for better performance, the objects should be of similar size and should not overlap. This is acceptable in an industrial facility, where the camera would be fixed at the same distance from the objects to be detected. Even so, we will also try mixed sizes and positions to see the result.

> We will not create separate folders for our images because each contains multiple labels.

Connect the Nicla Vision to the OpenMV IDE and run the `dataset_capture_script.py`.
Clicking the Capture Image button will start capturing images:

![](Images_4_1/img_5.png)

We suggest capturing around 50 images, mixing the objects and varying how many of each appear in the scene. Try to capture different angles, backgrounds, and lighting conditions.

> The stored images use the QVGA frame size (320x240) and the RGB565 color pixel format.

After capturing your dataset, close the Dataset Editor via `Tools > Dataset Editor`.

## Edge Impulse Studio

### Set up the project

Go to [Edge Impulse Studio](https://www.edgeimpulse.com/), enter your credentials at **Login** (or create an account), and start a new project.

![](Images_4_1/img_6.png)

> Here, you can clone the project developed for this hands-on: [NICLA_Vision_Object_Detection](https://studio.edgeimpulse.com/public/292737/latest).

On your Project Dashboard, scroll down to **Project info** and select **Bounding boxes (object detection)** and Nicla Vision as your Target Device:

![](Images_4_1/img_7.png)

### Uploading the unlabeled data

In the Studio, go to the `Data acquisition` tab and, in the `UPLOAD DATA` section, upload the captured files from your computer.

![](Images_4_1/img_8.png)

> You can let the Studio split your data automatically between Train and Test, or do it manually.

![](Images_4_1/img_9.png)

All 51 unlabeled images were uploaded, but they still need to be labeled appropriately before they can be used as a dataset in the project. The Studio has a tool for that purpose, which you can find under the `Labeling queue (51)` link.

There are two ways to perform AI-assisted labeling in Edge Impulse Studio (free version):

- Using YOLOv5
- Tracking objects between frames

> Edge Impulse launched an [auto-labeling feature](https://docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition/auto-labeler) for Enterprise customers, easing labeling tasks in object detection projects.
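If you choose the manual train/test split mentioned in the upload step, the only requirement is a split that is random but reproducible. Here is a minimal sketch in desktop Python (not MicroPython on the Nicla); the file names, the 80/20 ratio, and the seed are illustrative assumptions, not Edge Impulse settings.

```python
import random


def split_train_test(filenames, test_ratio=0.2, seed=42):
    """Return (train, test) lists from a reproducible shuffle of `filenames`.

    test_ratio=0.2 (an 80/20 split) and seed=42 are illustrative assumptions.
    """
    names = sorted(filenames)        # sort first: result does not depend on listing order
    rng = random.Random(seed)        # private RNG, so the same seed gives the same split
    rng.shuffle(names)
    n_test = round(len(names) * test_ratio)
    return names[n_test:], names[:n_test]


# Hypothetical file names standing in for the 51 captured images.
files = [f"img_{i:03d}.jpg" for i in range(51)]
train, test = split_train_test(files)
print(f"train: {len(train)} images, test: {len(test)} images")  # train: 41 images, test: 10 images
```

Sorting before shuffling makes the split independent of the order in which the operating system lists the files, and the fixed seed keeps it stable across runs. You would then upload the two lists separately into the Training and Test categories.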

Ordinary objects can be quickly identified and labeled using an existing library of pre-trained YOLOv5 object detection models (trained on the COCO dataset). But since our objects are not part of the COCO dataset, we should select the `tracking objects` option. With this option, once you draw bounding boxes and label the objects in one frame, they will be tracked automatically from frame to frame, *partially* labeling the new frames (not all objects will be labeled correctly).

> You can use the [EI uploader](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader#bounding-boxes) to import your data if you already have a labeled dataset containing bounding boxes.

### Labeling the Dataset

Starting with the first image of your unlabeled data, use your mouse to drag a box around an object to add a label. Then click **Save labels** to advance to the next item.

![](Images_4_1/img_10.png)

Continue this process until the queue is empty. At the end, all images should have their objects labeled, as in the samples below:

![](Images_4_1/img_11.png)

Next, review the labeled samples on the `Data acquisition` tab. If one of the labels is wrong, you can edit it using the *`three dots`* menu after the sample name:

![](Images_4_1/img_12.png)

You will be guided to replace the wrong label, correcting the dataset.

![](Images_4_1/img_13.png)

## The Impulse Design

In this phase, you should define:

- **Pre-processing**, which consists of resizing the individual images from `320 x 240` to `96 x 96` and squashing them (square form, without cropping). Afterward, the images are converted from RGB to grayscale.

- **The model**, in this case, "Object Detection."

![](Images_4_1/img_14.png)

### Preprocessing the dataset

In this section, set **Color depth** to `Grayscale`, which is suitable for FOMO models, and click `Save parameters`.
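The pre-processing configured above (squash each 320x240 frame to 96x96, then convert it to grayscale) can be made concrete with a short plain-Python sketch. It is not Edge Impulse's implementation: the nearest-neighbour resampling, the BT.601 luma weights, and scaling the features to [0, 1] are assumptions chosen for clarity.

```python
def squash_resize(image, out_w=96, out_h=96):
    """Nearest-neighbour resize of a row-major image to out_w x out_h.

    Width and height are scaled independently, so the frame is squashed
    (aspect ratio not preserved) instead of cropped.
    """
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def to_grayscale(rgb_image):
    """Map rows of (r, g, b) tuples in 0..255 to rows of 0..255 luma values.

    BT.601 weights are an assumption, not necessarily the Studio's formula.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def preprocess(rgb_frame):
    """Squash to 96x96, convert to grayscale, then flatten to features in [0, 1]."""
    gray = to_grayscale(squash_resize(rgb_frame))
    return [v / 255.0 for row in gray for v in row]


# Synthetic 320x240 (QVGA) frame with a simple colour gradient.
frame = [[(x * 255 // 319, y * 255 // 239, 128) for x in range(320)] for y in range(240)]
features = preprocess(frame)
print(len(features))  # 9216
```

The output has 96 x 96 x 1 = 9,216 values. Because width is scaled by 96/320 and height by 96/240, objects end up slightly compressed horizontally; that is acceptable as long as training and inference apply the same transform.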

![](Images_4_1/img_15.png)

The Studio automatically moves to the next section, `Generate features`, where all samples are pre-processed, resulting in a dataset of individual 96x96x1 images, or 9,216 features per image.

![](Images_4_1/img_16.png)

The feature explorer shows that all samples are well separated after feature generation.

> One of the samples (46) appears to be in the wrong region, but clicking on it confirms that its labeling is correct.

## Model Design, Training, and Test

We will use FOMO, an object detection model based on MobileNetV2 (alpha 0.35) designed to coarsely segment an image into a grid of **background** vs **objects of interest** (here, *boxes* and *wheels*).

FOMO is an innovative machine learning model for object detection, which can use up to 30 times less energy and memory than traditional models like MobileNet SSD and YOLOv5. FOMO can operate on microcontrollers with less than 200 KB of RAM. The main reason this is possible is that, while other models calculate the object's size by drawing a box around it (bounding box), FOMO ignores the object's size and provides only where the object is located in the image, by means of its centroid coordinates.

**How does FOMO work?**

FOMO takes the grayscale image and divides it into blocks of pixels using a factor of 8. For a 96x96 input, the grid is 12x12 (96/8 = 12). Next, FOMO runs a classifier on each pixel block to calculate the probability that it contains a box or a wheel and then determines the regions with the highest probability of containing an object (if a pixel block has no objects, it is classified as *background*). From the overlap of the final region, FOMO provides the coordinates (relative to the image dimensions) of this region's centroid.

![](Images_4_1/img_17.png)
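The grid-and-centroid idea can be illustrated with a small post-processing sketch. It assumes a hypothetical model output: a 12x12 grid where each cell holds one probability per class, with index 0 being *background*. Cells whose best object class clears the threshold (0.8, the confidence value this tutorial recommends) are merged with same-class neighbours, and each merged region yields one centroid in 96x96 input pixels. This is one simple way to do it, not necessarily how Edge Impulse implements FOMO's post-processing.

```python
from collections import Counter

CELL = 8              # FOMO reduces each spatial dimension by a factor of 8
GRID = 96 // CELL     # 96x96 input -> 12x12 grid


def fomo_centroids(heatmap, labels, threshold=0.8):
    """Merge confident same-class neighbouring cells and return (label, x, y) centroids.

    heatmap[row][col] is a list of per-class probabilities (hypothetical layout,
    index 0 = background). Coordinates are in input pixels, origin at the upper-left.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    best = [[None] * cols for _ in range(rows)]   # winning object class per cell, or None
    for r in range(rows):
        for c in range(cols):
            probs = heatmap[r][c]
            k = max(range(1, len(probs)), key=lambda i: probs[i])  # skip background
            if probs[k] >= threshold:
                best[r][c] = k

    seen = [[False] * cols for _ in range(rows)]
    detections = []
    for r in range(rows):
        for c in range(cols):
            if best[r][c] is None or seen[r][c]:
                continue
            k, stack, cells = best[r][c], [(r, c)], []
            seen[r][c] = True
            while stack:  # flood fill over 4-connected cells of the same class
                y, x = stack.pop()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and best[ny][nx] == k):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Centroid = mean of the cell centres (each cell spans CELL x CELL pixels).
            cx = sum(x for _, x in cells) / len(cells) * CELL + CELL / 2
            cy = sum(y for y, _ in cells) / len(cells) * CELL + CELL / 2
            detections.append((labels[k], cx, cy))
    return detections


# Hypothetical output: a 2x2 "box" region and a single "wheel" cell.
labels = ["background", "box", "wheel"]
heatmap = [[[1.0, 0.0, 0.0] for _ in range(GRID)] for _ in range(GRID)]
for r in (2, 3):
    for c in (2, 3):
        heatmap[r][c] = [0.1, 0.9, 0.0]
heatmap[8][9] = [0.05, 0.05, 0.9]

detections = fomo_centroids(heatmap, labels)
print(detections)  # [('box', 24.0, 24.0), ('wheel', 76.0, 68.0)]
print(Counter(label for label, _, _ in detections))  # objects counted per class
```

Counting objects per class then reduces to counting centroids, which is what the project goal needs. In this sketch, two touching objects of the same class would merge into a single region, which illustrates why the objects should not overlap. To report positions in the 240x240 window used on the Nicla, you could multiply the coordinates by 240/96 = 2.5, assuming a plain linear scaling.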

Let's use the pre-trained **`FOMO (Faster Objects, More Objects) MobileNetV2 0.35`** model. It uses around 250 KB of RAM and 80 KB of ROM (flash), which suits our board well, since the Nicla Vision has 1 MB of RAM and 2 MB of flash.

![](Images_4_1/img_18.png)

Regarding the training hyperparameters, the model will be trained with:

- Epochs: 60
- Batch size: 32
- Learning rate: 0.001

For validation during training, 20% of the dataset (*validation_dataset*) is held out. To the remaining 80% (*train_dataset*), we apply data augmentation, which randomly flips, resizes, crops, and changes the brightness of the images, artificially increasing the number of training samples.

As a result, the model ends with an F1 score of practically 1.00, with a similar result on the Test data.

> Note that FOMO automatically added a third label, *background*, to the two previously defined ones (*box* and *wheel*).

![](Images_4_1/img_19.png)

> In object detection tasks, accuracy is generally not the primary [evaluation metric](https://learnopencv.com/mean-average-precision-map-object-detection-model-evaluation-metric/). Object detection involves classifying objects and providing bounding boxes around them, making it a more complex problem than simple classification. The issue here is that we do not have bounding boxes, only centroids. In short, using accuracy as a metric could be misleading and may not give a complete picture of how well the model is performing. Because of that, we will use the F1 score.

### Testing the model with "Live Classification"

Since Edge Impulse officially supports the Nicla Vision, let's connect it to the Studio. To do so, follow these steps:

- Download the [latest EI firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it.

- Open the unzipped folder on your computer and select the uploader for your OS:

![](images_2/media/image17.png){fig-align="center" width="4.416666666666667in"}

- Put the Nicla Vision in boot mode by pressing the reset button twice.

- Run the upload script for your OS to flash the binary (`arduino-nicla-vision.bin`) to your board.

Go to the `Live classification` section in EI Studio and connect your Nicla Vision using *WebUSB*:

![](Images_4_1/img_20.png)

Once connected, you can use the Nicla to capture real images to be tested by the trained model in Edge Impulse Studio.

![](Images_4_1/img_21.png)

Note that the model can produce false positives and negatives. These can be minimized by setting a proper `Confidence Threshold` (use the `Three dots` menu to set it). Try 0.8 or higher.

## Deploying the Model

Select OpenMV Firmware on the Deploy tab and press \[Build\].

![](Images_4_1/img_22.png)

When you connect the Nicla to the OpenMV IDE again, the IDE will try to update its firmware. Choose the `Load a specific firmware` option instead.

![](Images_4_1/img_24.png){fig-align="center"}

The Studio will have downloaded a ZIP file to your computer. Open it:

![](Images_4_1/img_23.png)

Load the .bin file to your board:

![](Images_4_1/img_25.png)

After the download finishes, a pop-up message will be displayed. Press `OK` and open the script **ei_object_detection.py** downloaded from the Studio.

Before running the script, let's change a few lines. Note that you can leave the window definition as 240 x 240 and the camera capturing images as QVGA/RGB. The captured image will be pre-processed by the firmware deployed from Edge Impulse.

``` python
# Edge Impulse - OpenMV Object Detection Example

import sensor, image, time, os, tf, math, uos, gc

sensor.reset() # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240)) # Set 240x240 window.
sensor.skip_frames(time=2000) # Let the camera adjust.

net = None
labels = None
```

Redefine the minimum confidence, for example, to 0.8, to reduce false positives (a higher threshold may also cause some objects to be missed).

``` python
min_confidence = 0.8
```

If necessary, change the colors of the circles used to mark the detected objects' centroids for better contrast.

``` python
try:
    # Load the built-in model (labels and network) bundled with the firmware
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

colors = [ # Add more colors if you are detecting more than 7 types of classes at once.
    (255, 255,   0), # background: yellow (not used)
    (  0, 255,   0), # box: green
    (255,   0,   0), # wheel: red
    (  0,   0, 255), # not used
    (255,   0, 255), # not used
    (  0, 255, 255), # not used
    (255, 255, 255), # not used
]
```

Keep the remaining code as it is and press the `green Play button` to run the code:

![](Images_4_1/img_26.png)

In the camera view, we can see the objects with their centroids marked by fixed 12-pixel circles (each circle has a distinct color, depending on its class). On the Serial Terminal, the model shows the detected labels and their positions within the image window (240x240).

> Be aware that the coordinate origin is in the upper-left corner.

![](Images_4_1/img_27.png)

Note that the frame rate is around 8 fps (similar to what we got in the Image Classification project). This is because FOMO is cleverly built on top of a plain CNN model rather than as a full object detection model like SSD MobileNet.
For example, when running a MobileNetV2 SSD FPN-Lite 320x320 model on a Raspberry Pi 4, the latency is around 5 times higher (around 1.5 fps).

Here is a short video showing the inference results:

{{< video https://youtu.be/JbpoqRp3BbM width="480" height="270" center >}}

## Conclusion

FOMO is a significant leap in the image processing space, as Louis Moreau and Mat Kelcey put it during its launch in 2022:

> FOMO is a ground-breaking algorithm that brings real-time object detection, tracking, and counting to microcontrollers for the first time.

Multiple possibilities exist for exploring object detection (and, more precisely, object counting) on embedded devices, for example, using the Nicla for sensor fusion (camera + microphone) together with object detection. This can be very useful in projects involving bees.

![](Images_4_1/img_28.png)