diff --git a/_quarto.yml b/_quarto.yml index 4b23c5ed..359a2b56 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -76,8 +76,11 @@ book: - part: EXERCISES chapters: - - embedded_sys_exercise.qmd - - embedded_ml_exercise.qmd + - niclav_sys.qmd + - image_classification.qmd + - object_detection_fomo.qmd + - kws_feature_eng.qmd + - kws_nicla.qmd references: references.qmd diff --git a/embedded_ml.qmd b/embedded_ml.qmd index 70ad6b02..db4d2374 100644 --- a/embedded_ml.qmd +++ b/embedded_ml.qmd @@ -258,6 +258,6 @@ Now would be a great time for you to try out a small computer vision model out o If you want to play with an embedded system, try out the Nicla Vision -[Computer Vision](./embedded_ml_exercise.qmd) +[Computer Vision](./image_classification.qmd) ::: \ No newline at end of file diff --git a/embedded_ml_exercise.qmd b/embedded_ml_exercise.qmd deleted file mode 100644 index dec56a5b..00000000 --- a/embedded_ml_exercise.qmd +++ /dev/null @@ -1,727 +0,0 @@ -# CV on Nicla Vision {.unnumbered} - -As we initiate our studies into embedded machine learning or tinyML, -it\'s impossible to overlook the transformative impact of Computer -Vision (CV) and Artificial Intelligence (AI) in our lives. These two -intertwined disciplines redefine what machines can perceive and -accomplish, from autonomous vehicles and robotics to healthcare and -surveillance. - -More and more, we are facing an artificial intelligence (AI) revolution -where, as stated by Gartner, **Edge AI** has a very high impact -potential, and **it is for now**! - -![](images_4/media/image2.jpg){width="4.729166666666667in" -height="4.895833333333333in"} - -In the \"bull-eye\" of emerging technologies, radar is the *Edge -Computer Vision*, and when we talk about Machine Learning (ML) applied -to vision, the first thing that comes to mind is **Image -Classification**, a kind of ML \"Hello World\"! - -This exercise will explore a computer vision project utilizing -Convolutional Neural Networks (CNNs) for real-time image classification. -Leveraging TensorFlow\'s robust ecosystem, we\'ll implement a -pre-trained MobileNet model and adapt it for edge deployment. The focus -will be optimizing the model to run efficiently on resource-constrained -hardware without sacrificing accuracy. - -We\'ll employ techniques like quantization and pruning to reduce the -computational load. By the end of this tutorial, you\'ll have a working -prototype capable of classifying images in real time, all running on a -low-power embedded system based on the Arduino Nicla Vision board. - -## Computer Vision - -At its core, computer vision aims to enable machines to interpret and -make decisions based on visual data from the world---essentially -mimicking the capability of the human optical system. Conversely, AI is -a broader field encompassing machine learning, natural language -processing, and robotics, among other technologies. When you bring AI -algorithms into computer vision projects, you supercharge the system\'s -ability to understand, interpret, and react to visual stimuli. - -When discussing Computer Vision projects applied to embedded devices, -the most common applications that come to mind are *Image -Classification* and *Object Detection*. - -![](images_4/media/image15.jpg){width="6.5in" -height="2.8333333333333335in"} - -Both models can be implemented on tiny devices like the Arduino Nicla -Vision and used on real projects. Let\'s start with the first one. - -## Image Classification Project - -The first step in any ML project is to define our goal. 
In this case, it -is to detect and classify two specific objects present in one image. For -this project, we will use two small toys: a *robot* and a small -Brazilian parrot (named *Periquito*). Also, we will collect images of a -*background* where those two objects are absent. - -![](images_4/media/image36.jpg){width="6.5in" -height="3.638888888888889in"} - -## Data Collection - -Once you have defined your Machine Learning project goal, the next and -most crucial step is the dataset collection. You can use the Edge -Impulse Studio, the OpenMV IDE we installed, or even your phone for the -image capture. Here, we will use the OpenMV IDE for that. - -**Collecting Dataset with OpenMV IDE** - -First, create in your computer a folder where your data will be saved, -for example, \"data.\" Next, on the OpenMV IDE, go to Tools \> Dataset -Editor and select New Dataset to start the dataset collection: - -![](images_4/media/image29.png){width="6.291666666666667in" -height="4.010416666666667in"} - -The IDE will ask you to open the file where your data will be saved and -choose the \"data\" folder that was created. Note that new icons will -appear on the Left panel. - -![](images_4/media/image46.png){width="0.9583333333333334in" -height="1.5208333333333333in"} - -Using the upper icon (1), enter with the first class name, for example, -\"periquito\": - -![](images_4/media/image22.png){width="3.25in" -height="2.65625in"} - -Run the dataset_capture_script.py, and clicking on the bottom icon (2), -will start capturing images: - -![](images_4/media/image43.png){width="6.5in" -height="4.041666666666667in"} - -Repeat the same procedure with the other classes - -![](images_4/media/image6.jpg){width="6.5in" -height="3.0972222222222223in"} - -> *We suggest around 60 images from each category. Try to capture -> different angles, backgrounds, and light conditions.* - -The stored images use a QVGA frame size 320x240 and RGB565 (color pixel -format). - -After capturing your dataset, close the Dataset Editor Tool on the Tools -\> Dataset Editor. - -On your computer, you will end with a dataset that contains three -classes: periquito, robot, and background. - -![](images_4/media/image20.png){width="6.5in" -height="2.2083333333333335in"} - -You should return to Edge Impulse Studio and upload the dataset to your -project. - -## Training the model with Edge Impulse Studio - -We will use the Edge Impulse Studio for training our model. Enter your -account credentials at Edge Impulse and create a new project: - -![](images_4/media/image45.png){width="6.5in" -height="4.263888888888889in"} - -> *Here, you can clone a similar project:* -> *[NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).* - -## Dataset - -Using the EI Studio (or *Studio*), we will pass over four main steps to -have our model ready for use on the Nicla Vision board: Dataset, -Impulse, Tests, and Deploy (on the Edge Device, in this case, the -NiclaV). - -![](images_4/media/image41.jpg){width="6.5in" -height="4.194444444444445in"} - -Regarding the Dataset, it is essential to point out that our Original -Dataset, captured with the OpenMV IDE, will be split into three parts: -Training, Validation, and Test. The Test Set will be divided from the -beginning and left a part to be used only in the Test phase after -training. The Validation Set will be used during training. 
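To make the idea of the three subsets concrete, here is a minimal, local sketch of such a split. The Studio performs this split automatically when you upload the data, so this is only an illustration; the folder names, the 70/15/15 ratios, and the one-folder-per-class layout are assumptions.

```python
# Illustrative sketch: split the locally captured images into
# train / validation / test folders (assumed layout: data/<class>/<image>).
import os
import random
import shutil

SOURCE = "data"
CLASSES = ["periquito", "robot", "background"]
SPLITS = [("train", 0.70), ("validation", 0.15), ("test", 0.15)]
random.seed(42)  # reproducible shuffling

for class_name in CLASSES:
    images = sorted(os.listdir(os.path.join(SOURCE, class_name)))
    random.shuffle(images)
    start = 0
    for split_name, fraction in SPLITS:
        count = round(fraction * len(images))
        if split_name == "test":  # the last split takes the remainder
            count = len(images) - start
        os.makedirs(os.path.join(split_name, class_name), exist_ok=True)
        for image_name in images[start:start + count]:
            shutil.copy(os.path.join(SOURCE, class_name, image_name),
                        os.path.join(split_name, class_name, image_name))
        start += count
```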
- -![](images_4/media/image7.jpg){width="6.5in" -height="4.763888888888889in"} - -On Studio, go to the Data acquisition tab, and on the UPLOAD DATA -section, upload from your computer the files from chosen categories: - -![](images_4/media/image39.png){width="6.5in" -height="4.263888888888889in"} - -Left to the Studio to automatically split the original dataset into -training and test and choose the label related to that specific data: - -![](images_4/media/image30.png){width="6.5in" -height="4.263888888888889in"} - -Repeat the procedure for all three classes. At the end, you should see -your \"raw data in the Studio: - -![](images_4/media/image11.png){width="6.5in" -height="4.263888888888889in"} - -The Studio allows you to explore your data, showing a complete view of -all the data in your project. You can clear, inspect, or change labels -by clicking on individual data items. In our case, a simple project, the -data seems OK. - -![](images_4/media/image44.png){width="6.5in" -height="4.263888888888889in"} - -## The Impulse Design - -In this phase, we should define how to: - -- Pre-process our data, which consists of resizing the individual - > images and determining the color depth to use (RGB or Grayscale) - > and - -- Design a Model that will be \"Transfer Learning (Images)\" to - > fine-tune a pre-trained MobileNet V2 image classification model on - > our data. This method performs well even with relatively small - > image datasets (around 150 images in our case). - -![](images_4/media/image23.jpg){width="6.5in" -height="4.0in"} - -Transfer Learning with MobileNet offers a streamlined approach to model -training, which is especially beneficial for resource-constrained -environments and projects with limited labeled data. MobileNet, known -for its lightweight architecture, is a pre-trained model that has -already learned valuable features from a large dataset (ImageNet). - -![](images_4/media/image9.jpg){width="6.5in" -height="1.9305555555555556in"} - -By leveraging these learned features, you can train a new model for your -specific task with fewer data and computational resources yet achieve -competitive accuracy. - -![](images_4/media/image32.jpg){width="6.5in" -height="2.3055555555555554in"} - -This approach significantly reduces training time and computational -cost, making it ideal for quick prototyping and deployment on embedded -devices where efficiency is paramount. - -Go to the Impulse Design Tab and create the *impulse*, defining an image -size of 96x96 and squashing them (squared form, without crop). Select -Image and Transfer Learning blocks. Save the Impulse. - -![](images_4/media/image16.png){width="6.5in" -height="4.263888888888889in"} - -### **Image Pre-Processing** - -All input QVGA/RGB565 images will be converted to 27,640 features -(96x96x3). - -![](images_4/media/image17.png){width="6.5in" -height="4.319444444444445in"} - -Press \[Save parameters\] and Generate all features: - -![](images_4/media/image5.png){width="6.5in" -height="4.263888888888889in"} - -## Model Design - -In 2007, Google introduced -[[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), -a family of general-purpose computer vision neural networks designed -with mobile devices in mind to support classification, detection, and -more. MobileNets are small, low-latency, low-power models parameterized -to meet the resource constraints of various use cases. 
in 2018, Google -launched [MobileNetV2: Inverted Residuals and Linear -Bottlenecks](https://arxiv.org/abs/1801.04381). - -MobileNet V1 and MobileNet V2 aim for mobile efficiency and embedded -vision applications but differ in architectural complexity and -performance. While both use depthwise separable convolutions to reduce -the computational cost, MobileNet V2 introduces Inverted Residual Blocks -and Linear Bottlenecks to enhance performance. These new features allow -V2 to capture more complex features using fewer parameters, making it -computationally more efficient and generally more accurate than its -predecessor. Additionally, V2 employs a non-linear activation in the -intermediate expansion layer. Still, it uses a linear activation for the -bottleneck layer, a design choice found to preserve important -information through the network better. MobileNet V2 offers a more -optimized architecture for higher accuracy and efficiency and will be -used in this project. - -Although the base MobileNet architecture is already tiny and has low -latency, many times, a specific use case or application may require the -model to be smaller and faster. MobileNets introduces a straightforward -parameter α (alpha) called width multiplier to construct these smaller, -less computationally expensive models. The role of the width multiplier -α is to thin a network uniformly at each layer. - -Edge Impulse Studio has available MobileNetV1 (96x96 images) and V2 -(96x96 and 160x160 images), with several different **α** values (from -0.05 to 1.0). For example, you will get the highest accuracy with V2, -160x160 images, and α=1.0. Of course, there is a trade-off. The higher -the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be -needed to run the model, implying more latency. The smaller footprint -will be obtained at another extreme with MobileNetV1 and α=0.10 (around -53.2K RAM and 101K ROM). - -![](images_4/media/image27.jpg){width="6.5in" -height="3.5277777777777777in"} - -For this project, we will use **MobileNetV2 96x96 0.1**, which estimates -a memory cost of 265.3 KB in RAM. This model should be OK for the Nicla -Vision with 1MB of SRAM. On the Transfer Learning Tab, select this -model: - -![](images_4/media/image24.png){width="6.5in" -height="4.263888888888889in"} - -Another necessary technique to be used with Deep Learning is **Data -Augmentation**. Data augmentation is a method that can help improve the -accuracy of machine learning models, creating additional artificial -data. A data augmentation system makes small, random changes to your -training data during the training process (such as flipping, cropping, -or rotating the images). 
- -Under the rood, here you can see how Edge Impulse implements a data -Augmentation policy on your data: - -```python -# Implements the data augmentation policy -def augment_image(image, label): - # Flips the image randomly - image = tf.image.random_flip_left_right(image) - - # Increase the image size, then randomly crop it down to - # the original dimensions - resize_factor = random.uniform(1, 1.2) - new_height = math.floor(resize_factor * INPUT_SHAPE[0]) - new_width = math.floor(resize_factor * INPUT_SHAPE[1]) - image = tf.image.resize_with_crop_or_pad(image, new_height, new_width) - image = tf.image.random_crop(image, size=INPUT_SHAPE) - - # Vary the brightness of the image - image = tf.image.random_brightness(image, max_delta=0.2) - - return image, label - -``` -Exposure to these variations during training can help prevent your model -from taking shortcuts by \"memorizing\" superficial clues in your -training data, meaning it may better reflect the deep underlying -patterns in your dataset. - -The final layer of our model will have 12 neurons with a 15% dropout for -overfitting prevention. Here is the Training result: - -![](images_4/media/image31.jpg){width="6.5in" -height="3.5in"} - -The result is excellent, with 77ms of latency, which should result in -13fps (frames per second) during inference. - -## Model Testing - -![](images_4/media/image10.jpg){width="6.5in" -height="3.8472222222222223in"} - -Now, you should take the data put apart at the start of the project and -run the trained model having them as input: - -![](images_4/media/image34.png){width="3.1041666666666665in" -height="1.7083333333333333in"} - -The result was, again, excellent. - -![](images_4/media/image12.png){width="6.5in" -height="4.263888888888889in"} - -## Deploying the model - -At this point, we can deploy the trained model as.tflite and use the -OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ -or an Arduino library. - -![](images_4/media/image28.jpg){width="6.5in" -height="3.763888888888889in"} - -**Arduino Library** - -First, Let\'s deploy it as an Arduino Library: - -![](images_4/media/image48.png){width="6.5in" -height="4.263888888888889in"} - -You should install the library as.zip on the Arduino IDE and run the -sketch nicla_vision_camera.ino available in Examples under your library -name. - -> *Note that Arduino Nicla Vision has, by default, 512KB of RAM -> allocated for the M7 core and an additional 244KB on the M4 address -> space. In the code, this allocation was changed to 288 kB to guarantee -> that the model will run on the device -> (malloc_addblock((void\*)0x30000000, 288 \* 1024);).* - -The result was good, with 86ms of measured latency. - -![](images_4/media/image25.jpg){width="6.5in" -height="3.4444444444444446in"} - -Here is a short video showing the inference results: -[[https://youtu.be/bZPZZJblU-o]{.underline}](https://youtu.be/bZPZZJblU-o) - -**OpenMV** - -It is possible to deploy the trained model to be used with OpenMV in two -ways: as a library and as a firmware. - -Three files are generated as a library: the.tflite model, a list with -the labels, and a simple MicroPython script that can make inferences -using the model. - -![](images_4/media/image26.png){width="6.5in" -height="1.0in"} - -Running this model as a.tflite directly in the Nicla was impossible. So, -we can sacrifice the accuracy using a smaller model or deploy the model -as an OpenMV Firmware (FW). 
As an FW, the Edge Impulse Studio generates -optimized models, libraries, and frameworks needed to make the -inference. Let\'s explore this last one. - -Select OpenMV Firmware on the Deploy Tab and press \[Build\]. - -![](images_4/media/image3.png){width="6.5in" -height="4.263888888888889in"} - -On your computer, you will find a ZIP file. Open it: - -![](images_4/media/image33.png){width="6.5in" height="2.625in"} - -Use the Bootloader tool on the OpenMV IDE to load the FW on your board: - -![](images_4/media/image35.jpg){width="6.5in" height="3.625in"} - -Select the appropriate file (.bin for Nicla-Vision): - -![](images_4/media/image8.png){width="6.5in" height="1.9722222222222223in"} - -After the download is finished, press OK: - -![DFU firmware update complete!.png](images_4/media/image40.png){width="3.875in" height="5.708333333333333in"} - -If a message says that the FW is outdated, DO NOT UPGRADE. Select -\[NO\]. - -![](images_4/media/image42.png){width="4.572916666666667in" -height="2.875in"} - -Now, open the script **ei_image_classification.py** that was downloaded -from the Studio and the.bin file for the Nicla. - -![](images_4/media/image14.png){width="6.5in" -height="4.0in"} - -And run it. Pointing the camera to the objects we want to classify, the -inference result will be displayed on the Serial Terminal. - -![](images_4/media/image37.png){width="6.5in" -height="3.736111111111111in"} - -**Changing Code to add labels:** - -The code provided by Edge Impulse can be modified so that we can see, -for test reasons, the inference result directly on the image displayed -on the OpenMV IDE. - -[[Upload the code from -GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) -or modify it as below: - -```python -# Marcelo Rovai - NICLA Vision - Image Classification -# Adapted from Edge Impulse - OpenMV Image Classification Example -# @24Aug23 - -import sensor, image, time, os, tf, uos, gc - -sensor.reset() # Reset and initialize the sensor. -sensor.set_pixformat(sensor.RGB565) # Set pxl fmt to RGB565 (or GRAYSCALE) -sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) -sensor.set_windowing((240, 240)) # Set 240x240 window. -sensor.skip_frames(time=2000) # Let the camera adjust. - -net = None -labels = None - -try: - # Load built in model - labels, net = tf.load_builtin_model('trained') -except Exception as e: - raise Exception(e) - -clock = time.clock() -while(True): - clock.tick() # Starts tracking elapsed time. 
- - img = sensor.snapshot() - - # default settings just do one detection - for obj in net.classify(img, - min_scale=1.0, - scale_mul=0.8, - x_overlap=0.5, - y_overlap=0.5): - fps = clock.fps() - lat = clock.avg() - - print("**********\nPrediction:") - img.draw_rectangle(obj.rect()) - # This combines the labels and confidence values into a list of tuples - predictions_list = list(zip(labels, obj.output())) - - max_val = predictions_list[0][1] - max_lbl = 'background' - for i in range(len(predictions_list)): - val = predictions_list[i][1] - lbl = predictions_list[i][0] - - if val > max_val: - max_val = val - max_lbl = lbl - - # Print label with the highest probability - if max_val < 0.5: - max_lbl = 'uncertain' - print("{} with a prob of {:.2f}".format(max_lbl, max_val)) - print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat)) - - # Draw label with highest probability to image viewer - img.draw_string( - 10, 10, - max_lbl + "\n{:.2f}".format(max_val), - mono_space = False, - scale=2 - ) - -``` - -Here you can see the result: - -![](images_4/media/image47.jpg){width="6.5in" -height="2.9444444444444446in"} - -Note that the latency (136 ms) is almost double what we got directly -with the Arduino IDE. This is because we are using the IDE as an -interface and the time to wait for the camera to be ready. If we start -the clock just before the inference: - -![](images_4/media/image13.jpg){width="6.5in" -height="2.0972222222222223in"} - -The latency will drop to only 71 ms. - -![](images_4/media/image1.jpg){width="3.5520833333333335in" -height="1.53125in"} - -> *The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected.* - -### **Post-Processing with LEDs** - -When working with embedded machine learning, we are looking for devices -that can continually proceed with the inference and result, taking some -action directly on the physical world and not displaying the result on a -connected computer. To simulate this, we will define one LED to light up -for each one of the possible inference results. - -![](images_4/media/image38.jpg){width="6.5in" -height="3.236111111111111in"} - -For that, we should [[upload the code from -GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) -or change the last code to include the LEDs: - -```python -# Marcelo Rovai - NICLA Vision - Image Classification with LEDs -# Adapted from Edge Impulse - OpenMV Image Classification Example -# @24Aug23 - -import sensor, image, time, os, tf, uos, gc, pyb - -ledRed = pyb.LED(1) -ledGre = pyb.LED(2) -ledBlu = pyb.LED(3) - -sensor.reset() # Reset and initialize the sensor. -sensor.set_pixformat(sensor.RGB565) # Set pixl fmt to RGB565 (or GRAYSCALE) -sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) -sensor.set_windowing((240, 240)) # Set 240x240 window. -sensor.skip_frames(time=2000) # Let the camera adjust. 
- -net = None -labels = None - -ledRed.off() -ledGre.off() -ledBlu.off() - -try: - # Load built in model - labels, net = tf.load_builtin_model('trained') -except Exception as e: - raise Exception(e) - -clock = time.clock() - - -def setLEDs(max_lbl): - - if max_lbl == 'uncertain': - ledRed.on() - ledGre.off() - ledBlu.off() - - if max_lbl == 'periquito': - ledRed.off() - ledGre.on() - ledBlu.off() - - if max_lbl == 'robot': - ledRed.off() - ledGre.off() - ledBlu.on() - - if max_lbl == 'background': - ledRed.off() - ledGre.off() - ledBlu.off() - - -while(True): - img = sensor.snapshot() - clock.tick() # Starts tracking elapsed time. - - # default settings just do one detection. - for obj in net.classify(img, - min_scale=1.0, - scale_mul=0.8, - x_overlap=0.5, - y_overlap=0.5): - fps = clock.fps() - lat = clock.avg() - - print("**********\nPrediction:") - img.draw_rectangle(obj.rect()) - # This combines the labels and confidence values into a list of tuples - predictions_list = list(zip(labels, obj.output())) - - max_val = predictions_list[0][1] - max_lbl = 'background' - for i in range(len(predictions_list)): - val = predictions_list[i][1] - lbl = predictions_list[i][0] - - if val > max_val: - max_val = val - max_lbl = lbl - - # Print label and turn on LED with the highest probability - if max_val < 0.8: - max_lbl = 'uncertain' - - setLEDs(max_lbl) - - print("{} with a prob of {:.2f}".format(max_lbl, max_val)) - print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat)) - - # Draw label with highest probability to image viewer - img.draw_string( - 10, 10, - max_lbl + "\n{:.2f}".format(max_val), - mono_space = False, - scale=2 - ) - -``` - -Now, each time that a class gets a result superior of 0.8, the -correspondent LED will be light on as below: - -- Led Red 0n: Uncertain (no one class is over 0.8) - -- Led Green 0n: Periquito \> 0.8 - -- Led Blue 0n: Robot \> 0.8 - -- All LEDs Off: Background \> 0.8 - -Here is the result: - -![](images_4/media/image18.jpg){width="6.5in" -height="3.6527777777777777in"} - -In more detail - -![](images_4/media/image21.jpg){width="6.5in" -height="2.0972222222222223in"} - -### **Image Classification (non-official) Benchmark** - -Several development boards can be used for embedded machine learning -(tinyML), and the most common ones for Computer Vision applications -(with low energy), are the ESP32 CAM, the Seeed XIAO ESP32S3 Sense, the -Arduinos Nicla Vison, and Portenta. - -![](images_4/media/image19.jpg){width="6.5in" -height="4.194444444444445in"} - -Using the opportunity, the same trained model was deployed on the -ESP-CAM, the XIAO, and Portenta (in this one, the model was trained -again, using grayscaled images to be compatible with its camera. Here is -the result, deploying the models as Arduino\'s Library: - -![](images_4/media/image4.jpg){width="6.5in" -height="3.4444444444444446in"} - -## Conclusion - -Before we finish, consider that Computer Vision is more than just image -classification. For example, you can develop Edge Machine Learning -projects around vision in several areas, such as: - -- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer - > vision algorithms to navigate and make decisions. - -- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, - > and CT scan image analysis - -- **Retail**: Automated checkout systems that identify products as - > they pass through a scanner. - -- **Security and Surveillance**: Facial recognition, anomaly - > detection, and object tracking in real-time video feeds. 
- -- **Augmented Reality**: Object detection and classification to - > overlay digital information in the real world. - -- **Industrial Automation**: Visual inspection of products, predictive - > maintenance, and robot and drone guidance. - -- **Agriculture**: Drone-based crop monitoring and automated - > harvesting. - -- **Natural Language Processing**: Image captioning and visual - > question answering. - -- **Gesture Recognition**: For gaming, sign language translation, and - > human-machine interaction. - -- **Content Recommendation**: Image-based recommendation systems in - > e-commerce. diff --git a/embedded_sys.qmd b/embedded_sys.qmd index da6632a4..f7f60ec2 100644 --- a/embedded_sys.qmd +++ b/embedded_sys.qmd @@ -387,6 +387,6 @@ Now would be a great time for you to get your hands on a real embedded device, a If you want to play with an embedded system, try out the Nicla Vision -[Setup Nicla Vision](./embedded_sys_exercise.qmd) +[Setup Nicla Vision](./niclav_sys.qmd) ::: \ No newline at end of file diff --git a/embedded_sys_exercise.qmd b/embedded_sys_exercise.qmd deleted file mode 100644 index 2fc19cab..00000000 --- a/embedded_sys_exercise.qmd +++ /dev/null @@ -1,482 +0,0 @@ -# Setup Nicla Vision {.unnumbered} - -The [Arduino Nicla -Vision](https://docs.arduino.cc/hardware/nicla-vision) (sometimes called -*NiclaV*) is a development board that includes two processors that can -run tasks in parallel. It is part of a family of development boards with -the same form factor but designed for specific tasks, such as the [Nicla -Sense -ME](https://www.bosch-sensortec.com/software-tools/tools/arduino-nicla-sense-me/) -and the [Nicla -Voice](https://store-usa.arduino.cc/products/nicla-voice?_gl=1*l3abc6*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY5NjM0Mzk1My4xMDIuMS4xNjk2MzQ0MjQ1LjAuMC4w). -The *Niclas* can efficiently run processes created with TensorFlow™ -Lite. For example, one of the cores of the NiclaV computing a computer -vision algorithm on the fly (inference), while the other leads with -low-level operations like controlling a motor and communicating or -acting as a user interface. - -> *The onboard wireless module allows the management of WiFi and -> Bluetooth Low Energy (BLE) connectivity simultaneously.* - -![](images_2/media/image29.jpg){width="6.5in" -height="3.861111111111111in"} - -## Two Parallel Cores - -The central processor is the dual-core -[STM32H747,](https://content.arduino.cc/assets/Arduino-Portenta-H7_Datasheet_stm32h747xi.pdf?_gl=1*6quciu*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDYzMzkuMA..) -including a Cortex® M7 at 480 MHz and a Cortex® M4 at 240 MHz. The two -cores communicate via a Remote Procedure Call mechanism that seamlessly -allows calling functions on the other processor. Both processors share -all the on-chip peripherals and can run: - -- Arduino sketches on top of the Arm® Mbed™ OS - -- Native Mbed™ applications - -- MicroPython / JavaScript via an interpreter - -- TensorFlow™ Lite - -![](images_2/media/image22.jpg){width="5.78125in" -height="5.78125in"} - -## Memory - -Memory is crucial for embedded machine learning projects. The NiclaV -board can host up to 16 MB of QSPI Flash for storage. However, it is -essential to consider that the MCU SRAM is the one to be used with -machine learning inferences; the STM32H747 is only 1MB, shared by both -processors. This MCU also has incorporated 2MB of FLASH, mainly for code -storage. - -## Sensors - -- **Camera**: A GC2145 2 MP Color CMOS Camera. 
- -- **Microphone**: A - > [MP34DT05,](https://content.arduino.cc/assets/Nano_BLE_Sense_mp34dt05-a.pdf?_gl=1*12fxus9*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDc3NzMuMA..) - > an ultra-compact, low-power, omnidirectional, digital MEMS - > microphone built with a capacitive sensing element and an IC - > interface. - -- **6-Axis IMU**: 3D gyroscope and 3D accelerometer data from the - > LSM6DSOX 6-axis IMU. - -- **Time of Flight Sensor**: The VL53L1CBV0FY Time-of-Flight sensor - > adds accurate and low power-ranging capabilities to the Nicla - > Vision. The invisible near-infrared VCSEL laser (including the - > analog driver) is encapsulated with receiving optics in an - > all-in-one small module below the camera. - -### **HW Installation (Arduino IDE)** - -Start connecting the board (USB-C) to your computer : - -![](images_2/media/image14.jpg){width="6.5in" -height="3.0833333333333335in"} - -Install the Mbed OS core for Nicla boards in the Arduino IDE. Having the -IDE open, navigate to Tools \> Board \> Board Manager, look for Arduino -Nicla Vision on the search window, and install the board. - -![](images_2/media/image2.jpg){width="6.5in" -height="2.7083333333333335in"} - -Next, go to Tools \> Board \> Arduino Mbed OS Nicla Boards and select -Arduino Nicla Vision. Having your board connected to the USB, you should -see the Nicla on Port and select it. - -> *Open the Blink sketch on Examples/Basic and run it using the IDE -> Upload button. You should see the Built-in LED (green RGB) blinking, -> which means the Nicla board is correctly installed and functional!* - -## Testing the Microphone - -On Arduino IDE, go to Examples \> PDM \> PDMSerialPlotter, open and run -the sketch. Open the Plotter and see the audio representation from the -microphone: - -![](images_2/media/image9.png){width="6.5in" -height="4.361111111111111in"} - -> *Vary the frequency of the sound you generate and confirm that the mic -> is working correctly.* - -## Testing the IMU - -Before testing the IMU, it will be necessary to install the LSM6DSOX -library. For that, go to Library Manager and look for LSM6DSOX. Install -the library provided by Arduino: - -![](images_2/media/image19.jpg){width="6.5in" -height="2.4027777777777777in"} - -Next, go to Examples \> Arduino_LSM6DSOX \> SimpleAccelerometer and run -the accelerometer test (you can also run Gyro and board temperature): - -![](images_2/media/image28.png){width="6.5in" -height="4.361111111111111in"} - -### **Testing the ToF (Time of Flight) Sensor** - -As we did with IMU, installing the ToF library, the VL53L1X is -necessary. For that, go to Library Manager and look for VL53L1X. Install -the library provided by Pololu: - -![](images_2/media/image15.jpg){width="6.5in" -height="2.4583333333333335in"} - -Next, run the sketch -[proximity_detection.ino](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/distance_image_meter.py): - -![](images_2/media/image12.png){width="4.947916666666667in" -height="4.635416666666667in"} - -On the Serial Monitor, you will see the distance from the camera and an -object in front of it (max of 4m). - -![](images_2/media/image13.jpg){width="6.5in" -height="4.847222222222222in"} - -## Testing the Camera - -We can also test the camera using, for example, the code provided on -Examples \> Camera \> CameraCaptureRawBytes. We can not see the image -directly, but it is possible to get the raw image data generated by the -camera. - -Anyway, the best test with the camera is to see a live image. 
For that, -we will use another IDE, the OpenMV. - -## Installing the OpenMV IDE - -OpenMV IDE is the premier integrated development environment for use -with OpenMV Cameras and the one on the Portenta. It features a powerful -text editor, debug terminal, and frame buffer viewer with a histogram -display. We will use MicroPython to program the camera. - -Go to the [OpenMV IDE page](https://openmv.io/pages/download), download -the correct version for your Operating System, and follow the -instructions for its installation on your computer. - -![](images_2/media/image21.png){width="6.5in" -height="4.791666666666667in"} - -The IDE should open, defaulting the helloworld_1.py code on its Code -Area. If not, you can open it from Files \> Examples \> HelloWord \> -helloword.py - -![](images_2/media/image7.png){width="6.5in" -height="4.444444444444445in"} - -Any messages sent through a serial connection (using print() or error -messages) will be displayed on the **Serial Terminal** during run time. -The image captured by a camera will be displayed in the **Camera -Viewer** Area (or Frame Buffer) and in the Histogram area, immediately -below the Camera Viewer. - -OpenMV IDE is the premier integrated development environment with OpenMV -Cameras and the Arduino Pro boards. It features a powerful text editor, -debug terminal, and frame buffer viewer with a histogram display. We -will use MicroPython to program the Nicla Vision. - -> *Before connecting the Nicla to the OpenMV IDE, ensure you have the -> latest bootloader version. To that, go to your Arduino IDE, select the -> Nicla board, and open the sketch on Examples \> STM_32H747_System -> STM_32H747_updateBootloader. Upload the code to your board. The Serial -> Monitor will guide you.* - -After updating the bootloader, put the Nicla Vision in bootloader mode -by double-pressing the reset button on the board. The built-in green LED -will start fading in and out. Now return to the OpenMV IDE and click on -the connect icon (Left ToolBar): - -![](images_2/media/image23.jpg){width="4.010416666666667in" -height="1.0520833333333333in"} - -A pop-up will tell you that a board in DFU mode was detected and ask you -how you would like to proceed. First, select \"Install the latest -release firmware.\" This action will install the latest OpenMV firmware -on the Nicla Vision. - -![](images_2/media/image10.png){width="6.5in" -height="2.6805555555555554in"} - -You can leave the option of erasing the internal file system unselected -and click \[OK\]. - -Nicla\'s green LED will start flashing while the OpenMV firmware is -uploaded to the board, and a terminal window will then open, showing the -flashing progress. - -![](images_2/media/image5.png){width="4.854166666666667in" -height="3.5416666666666665in"} - -Wait until the green LED stops flashing and fading. When the process -ends, you will see a message saying, \"DFU firmware update complete!\". -Press \[OK\]. - -![](images_2/media/image1.png){width="3.875in" -height="5.708333333333333in"} - -A green play button appears when the Nicla Vison connects to the Tool -Bar. - -![](images_2/media/image18.jpg){width="4.791666666666667in" -height="1.4791666666666667in"} - -Also, note that a drive named "NO NAME" will appear on your computer.: - -![](images_2/media/image3.png){width="6.447916666666667in" -height="2.4166666666666665in"} - -Every time you press the \[RESET\] button on the board, it automatically -executes the main.py script stored on it. 
You can load the -[main.py](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/main.py) -code on the IDE (File \> Open File\...). - -![](images_2/media/image16.png){width="4.239583333333333in" -height="3.8229166666666665in"} - -> *This code is the \"Blink\" code, confirming that the HW is OK.* - -For testing the camera, let\'s run helloword_1.py. For that, select the -script on File \> Examples \> HelloWorld \> helloword.py, - -When clicking the green play button, the MicroPython script -(hellowolrd.py) on the Code Area will be uploaded and run on the Nicla -Vision. On-Camera Viewer, you will start to see the video streaming. The -Serial Monitor will show us the FPS (Frames per second), which should be -around 14fps. - -![](images_2/media/image6.png){width="6.5in" -height="3.9722222222222223in"} - -Let\'s go through the [helloworld.py](http://helloworld.py/) script: - -```python -# Hello World Example 2 -# -# Welcome to the OpenMV IDE! Click on the green run arrow button below to run the script! - -import sensor, image, time - -sensor.reset() # Reset and initialize the sensor. -sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE) -sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) -sensor.skip_frames(time = 2000) # Wait for settings take effect. -clock = time.clock() # Create a clock object to track the FPS. - -while(True): - clock.tick() # Update the FPS clock. - img = sensor.snapshot() # Take a picture and return the image. - print(clock.fps()) -``` - - -In GitHub, you can find the Python scripts used here. - -The code can be split into two parts: - -- **Setup**: Where the libraries are imported and initialized, and the - > variables are defined and initiated. - -- **Loop**: (while loop) part of the code that runs continually. The - > image (img variable) is captured (a frame). Each of those frames - > can be used for inference in Machine Learning Applications. - -To interrupt the program execution, press the red \[X\] button. - -> *Note: OpenMV Cam runs about half as fast when connected to the IDE. -> The FPS should increase once disconnected.* - -In [[the GitHub, You can find other Python -scripts]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython). -Try to test the onboard sensors. - -## Connecting the Nicla Vision to Edge Impulse Studio - -We will use the Edge Impulse Studio later in other exercises. [Edge -Impulse I](https://www.edgeimpulse.com/)s a leading development platform -for machine learning on edge devices. - -Edge Impulse officially supports the Nicla Vision. So, for starting, -please create a new project on the Studio and connect the Nicla to it. -For that, follow the steps: - -- Download the [last EI - > Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) - > and unzip it. - -- Open the zip file on your computer and select the uploader related - > to your OS: - -![](images_2/media/image17.png){width="4.416666666666667in" -height="1.5520833333333333in"} - -- Put the Nicla-Vision on Boot Mode, pressing the reset button twice. - -- Execute the specific batch code for your OS for uploading the binary - > (arduino-nicla-vision.bin) to your board. - -Go to your project on the Studio, and on the Data Acquisition tab, -select WebUSB (1). A window will appear; choose the option that shows -that the Nicla is pared (2) and press \[Connect\] (3). 
- -![](images_2/media/image27.png){width="6.5in" -height="4.319444444444445in"} - -In the Collect Data section on the Data Acquisition tab, you can choose -what sensor data you will pick. - -![](images_2/media/image25.png){width="6.5in" -height="4.319444444444445in"} - -For example. IMU data: - -![](images_2/media/image8.png){width="6.5in" -height="4.319444444444445in"} - -Or Image: - -![](images_2/media/image4.png){width="6.5in" -height="4.319444444444445in"} - -And so on. You can also test an external sensor connected to the Nicla -ADC (pin 0) and the other onboard sensors, such as the microphone and -the ToF. - -### **Expanding the Nicla Vision Board (optional)** - -A last item to be explored is that sometimes, during prototyping, it is -essential to experiment with external sensors and devices, and an -excellent expansion to the Nicla is the [Arduino MKR Connector Carrier -(Grove -compatible)](https://store-usa.arduino.cc/products/arduino-mkr-connector-carrier-grove-compatible). - -The shield has 14 Grove connectors: five single analog inputs, one -single analog input, five single digital I/Os, one double digital I/O, -one I2C, and one UART. All connectors are 5V compatible. - -> *Note that besides all 17 Nicla Vision pins that will be connected to -> the Shield Groves, some Grove connections are disconnected.* - -![](images_2/media/image20.jpg){width="6.5in" -height="4.875in"} - -This shield is MKR compatible and can be used with the Nicla Vision and -the Portenta. - -![](images_2/media/image26.jpg){width="4.34375in" -height="5.78125in"} - -For example, suppose that on a TinyML project, you want to send -inference results using a LoRaWan device and add information about local -luminosity. Besides, with offline operations, a local low-power display -as an OLED display is advised. This setup can be seen here: - -![](images_2/media/image11.jpg){width="6.5in" -height="4.708333333333333in"} - -The [Grove Light -Sensor](https://wiki.seeedstudio.com/Grove-Light_Sensor/) would be -connected to one of the single Analog pins (A0/PC4), the [LoRaWan -device](https://wiki.seeedstudio.com/Grove_LoRa_E5_New_Version/) to the -UART, and the [OLED](https://arduino.cl/producto/display-oled-grove/) to -the I2C connector. - -The Nicla Pins 3 (Tx) and 4 (Rx) are connected with the Shield Serial -connector. The UART communication is used with the LoRaWan device. Here -is a simple code to use the UART.: - -```python -# UART Test - By: marcelo_rovai - Sat Sep 23 2023 - -import time -from pyb import UART -from pyb import LED - -redLED = LED(1) # built-in red LED - -# Init UART object. -# Nicla Vision's UART (TX/RX pins) is on "LP1" -uart = UART("LP1", 9600) - -while(True): - uart.write("Hello World!\r\n") - redLED.toggle() - time.sleep_ms(1000) - -``` - -To verify if the UART is working, you should, for example, connect -another device as an [Arduino -UNO](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Arduino-IDE/teste_uart_UNO/teste_uart_UNO.ino), -displaying the Hello Word. - -![](images_2/media/image24.gif){width="2.8125in" -height="3.75in"} - -Here is a Hello World code to be used with the I2C OLED. The MicroPython -SSD1306 OLED driver (ssd1306.py), created by Adafruit, should also be -uploaded to the Nicla (the -[[ssd1306.py]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/ssd1306.py) -can be found in GitHub). 
- - -```python -# Nicla_OLED_Hello_World - By: marcelo_rovai - Sat Sep 30 2023 - -#Save on device: MicroPython SSD1306 OLED driver, I2C and SPI interfaces created by Adafruit -import ssd1306 - -from machine import I2C -i2c = I2C(1) - -oled_width = 128 -oled_height = 64 -oled = ssd1306.SSD1306_I2C(oled_width, oled_height, i2c) - -oled.text('Hello, World', 10, 10) -oled.show() -``` - -Finally, here is a simple script to read the ADC value on pin \"PC4\" -(Nicla pin A0): -```python - -# Light Sensor (A0) - By: marcelo_rovai - Wed Oct 4 2023 - -import pyb -from time import sleep - -adc = pyb.ADC(pyb.Pin("PC4")) # create an analog object from a pin -val = adc.read() # read an analog value - -while (True): - - val = adc.read() - print ("Light={}".format (val)) - sleep (1) -``` - -The ADC can be used for other valuable sensors, such as -[Temperature](https://wiki.seeedstudio.com/Grove-Temperature_Sensor_V1.2/). - -> *Note that the above scripts ([[downloaded from -> Github]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython)) -> only introduce how to connect external devices with the Nicla Vision -> board using MicroPython.* - -## Conclusion - -The Arduino Nicla Vision is an excellent *tiny device* for industrial -and professional uses! However, it is powerful, trustworthy, low power, -and has suitable sensors for the most common embedded machine learning -applications such as vision, movement, sensor fusion, and sound. - -> *On the* *[GitHub -> repository,](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main) -> you will find the last version of all the codes used or commented on -> in this exercise.* diff --git a/image_classification.qmd b/image_classification.qmd new file mode 100644 index 00000000..b5f19ecb --- /dev/null +++ b/image_classification.qmd @@ -0,0 +1,512 @@ +# CV on Nicla Vision {.unnumbered} + +## Introduction + +As we initiate our studies into embedded machine learning or tinyML, it's impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) in our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance. + +More and more, we are facing an artificial intelligence (AI) revolution where, as stated by Gartner, **Edge AI** has a very high impact potential, and **it is for now**! + +![](images/imgs_image_classification/image2.jpg){fig-align="center" width="4.729166666666667in"} + +In the "bullseye" of the Radar is the *Edge Computer Vision*, and when we talk about Machine Learning (ML) applied to vision, the first thing that comes to mind is **Image Classification**, a kind of ML "Hello World"! + +This exercise will explore a computer vision project utilizing Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we'll implement a pre-trained MobileNet model and adapt it for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy. + +We'll employ techniques like quantization and pruning to reduce the computational load. By the end of this tutorial, you'll have a working prototype capable of classifying images in real-time, all running on a low-power embedded system based on the Arduino Nicla Vision board. 
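Since quantization is central to fitting a model like this on a microcontroller, here is a minimal sketch of post-training, full-integer (int8) quantization with the TensorFlow Lite converter. Edge Impulse performs an equivalent step for us during deployment; `keras_model` and `representative_images` are placeholder names for a trained model and a small sample of training inputs.

``` python
# Sketch of post-training int8 quantization with TensorFlow Lite.
# `keras_model` and `representative_images` are assumed placeholders.
import tensorflow as tf

def quantize_model(keras_model, representative_images):
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    def representative_dataset():
        # A small sample (~100 images) lets the converter calibrate int8 ranges
        for image in representative_images:
            yield [tf.expand_dims(tf.cast(image, tf.float32), axis=0)]

    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8

    tflite_model = converter.convert()  # int8 model, ready to save as .tflite
    with open("model_int8.tflite", "wb") as f:
        f.write(tflite_model)
    return tflite_model
```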
+ +## Computer Vision + +At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human optical system. Conversely, AI is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli. + +When discussing Computer Vision projects applied to embedded devices, the most common applications that come to mind are *Image Classification* and *Object Detection*. + +![](images/imgs_image_classification/image15.jpg){fig-align="center" width="6.5in"} + +Both models can be implemented on tiny devices like the Arduino Nicla Vision and used on real projects. In this chapter, we will cover Image Classification. + +## Image Classification Project Goal + +The first step in any ML project is to define the goal. In this case, it is to detect and classify two specific objects present in one image. For this project, we will use two small toys: a *robot* and a small Brazilian parrot (named *Periquito*). Also, we will collect images of a *background* where those two objects are absent. + +![](images/imgs_image_classification/image36.jpg){fig-align="center" width="6.5in"} + +## Data Collection + +Once you have defined your Machine Learning project goal, the next and most crucial step is the dataset collection. You can use the Edge Impulse Studio, the OpenMV IDE we installed, or even your phone for the image capture. Here, we will use the OpenMV IDE for that. + +### Collecting Dataset with OpenMV IDE + +First, create in your computer a folder where your data will be saved, for example, "data." Next, on the OpenMV IDE, go to `Tools > Dataset Editor` and select `New Dataset` to start the dataset collection: + +![](images/imgs_image_classification/image29.png){fig-align="center" width="6.291666666666667in"} + +The IDE will ask you to open the file where your data will be saved and choose the "data" folder that was created. Note that new icons will appear on the Left panel. + +![](images/imgs_image_classification/image46.png){fig-align="center" width="0.9583333333333334in"} + +Using the upper icon (1), enter with the first class name, for example, "periquito": + +![](images/imgs_image_classification/image22.png){fig-align="center" width="3.25in"} + +Running the `dataset_capture_script.py` and clicking on the camera icon (2), will start capturing images: + +![](images/imgs_image_classification/image43.png){fig-align="center" width="6.5in"} + +Repeat the same procedure with the other classes + +![](images/imgs_image_classification/image6.jpg){fig-align="center" width="6.5in"} + +> We suggest around 60 images from each category. Try to capture different angles, backgrounds, and light conditions. + +The stored images use a QVGA frame size of 320x240 and the RGB565 (color pixel format). + +After capturing your dataset, close the Dataset Editor Tool on the `Tools > Dataset Editor`. + +On your computer, you will end with a dataset that contains three classes: *periquito,* *robot*, and *background*. + +![](images/imgs_image_classification/image20.png){fig-align="center" width="6.5in"} + +You should return to *Edge Impulse Studio* and upload the dataset to your project. + +## Training the model with Edge Impulse Studio + +We will use the Edge Impulse Studio for training our model. 
Enter your account credentials and create a new project:

![](images/imgs_image_classification/image45.png){fig-align="center" width="6.5in"}

> Here, you can clone a similar project: [NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).

## Dataset

Using the EI Studio (or *Studio*), we will go over four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the Edge Device, in this case, the NiclaV).

![](images/imgs_image_classification/image41.jpg){fig-align="center" width="6.5in"}

Regarding the Dataset, it is essential to point out that our Original Dataset, captured with the OpenMV IDE, will be split into *Training*, *Validation*, and *Test*. The Test Set will be separated from the beginning and reserved to be used only in the Test phase, after training. The Validation Set will be used during training.

![](images/imgs_image_classification/image7.jpg){fig-align="center" width="6.5in"}

On Studio, go to the Data acquisition tab, and on the UPLOAD DATA section, upload the files for the chosen categories from your computer:

![](images/imgs_image_classification/image39.png){fig-align="center" width="6.5in"}

Let the Studio split the original dataset into *train and test* automatically, and choose the label that corresponds to the uploaded data:

![](images/imgs_image_classification/image30.png){fig-align="center" width="6.5in"}

Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio:

![](images/imgs_image_classification/image11.png){fig-align="center" width="6.5in"}

The Studio allows you to explore your data, showing a complete view of all the data in your project. You can clear, inspect, or change labels by clicking on individual data items. In our case (a very simple project), the data seems OK.

![](images/imgs_image_classification/image44.png){fig-align="center" width="6.5in"}

## The Impulse Design

In this phase, we should define how to:

- Pre-process our data, which consists of resizing the individual images and determining the `color depth` to use (RGB or Grayscale), and

- Specify a Model; in this case, it will be `Transfer Learning (Images)`, which fine-tunes a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case).

![](images/imgs_image_classification/image23.jpg){fig-align="center" width="6.5in"}

Transfer Learning with MobileNet offers a streamlined approach to model training, which is especially beneficial for resource-constrained environments and projects with limited labeled data. MobileNet, known for its lightweight architecture, is a pre-trained model that has already learned valuable features from a large dataset (ImageNet).

![](images/imgs_image_classification/image9.jpg){fig-align="center" width="6.5in"}

By leveraging these learned features, you can train a new model for your specific task with less data and fewer computational resources, yet achieve competitive accuracy.

![](images/imgs_image_classification/image32.jpg){fig-align="center" width="6.5in"}

This approach significantly reduces training time and computational cost, making it ideal for quick prototyping and deployment on embedded devices where efficiency is paramount.

Go to the Impulse Design Tab and create the *impulse*, defining an image size of 96x96 and squashing the images (squared form, without cropping).
Select Image and Transfer Learning blocks. Save the Impulse.

![](images/imgs_image_classification/image16.png){fig-align="center" width="6.5in"}

### Image Pre-Processing

All the input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).

![](images/imgs_image_classification/image17.png){fig-align="center" width="6.5in"}

Press \[Save parameters\] and Generate all features:

![](images/imgs_image_classification/image5.png){fig-align="center" width="6.5in"}

### Model Design

In 2017, Google introduced [[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).

MobileNet V1 and MobileNet V2 both aim at mobile efficiency and embedded vision applications but differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to enhance performance. These new features allow V2 to capture more complex features using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but keeps a linear activation for the bottleneck layer, a design choice found to preserve important information through the network. MobileNet V2 offers an optimized architecture for higher accuracy and efficiency and will be used in this project.

Although the base MobileNet architecture is already tiny and has low latency, a specific use case or application may often require the model to be even smaller and faster. MobileNet introduces a straightforward parameter α (alpha), called the width multiplier, to construct these smaller, less computationally expensive models. The role of the width multiplier α is to thin the network uniformly at each layer.

Edge Impulse Studio can use both MobileNetV1 (96x96 images) and V2 (96x96 or 160x160 images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off. The higher the accuracy, the more memory (around 1.3MB RAM and 2.6MB ROM) will be needed to run the model, implying more latency. The smallest footprint will be obtained at the other extreme with MobileNetV1 and α=0.10 (around 53.2K RAM and 101K ROM).

![](images/imgs_image_classification/image27.jpg){fig-align="center" width="6.5in"}

We will use **MobileNetV2 96x96 0.1** for this project, with an estimated memory cost of 265.3 KB in RAM. This model should be OK for the Nicla Vision with 1MB of SRAM. On the Transfer Learning Tab, select this model:

![](images/imgs_image_classification/image24.png){fig-align="center" width="6.5in"}

## Model Training

Another valuable technique to be used with Deep Learning is **Data Augmentation**. Data augmentation is a method to improve the accuracy of machine learning models by creating additional artificial data.
## Model Training + +Another valuable technique to be used with Deep Learning is **Data Augmentation**. Data augmentation is a method to improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images). + +Looking under the hood, here you can see how Edge Impulse implements a data augmentation policy on your data: + +``` python +# Implements the data augmentation policy +# (INPUT_SHAPE is the model input shape defined elsewhere in the +# training pipeline, e.g., (96, 96, 3)) +import math +import random + +import tensorflow as tf + + +def augment_image(image, label): +    # Flips the image randomly +    image = tf.image.random_flip_left_right(image) + +    # Increase the image size, then randomly crop it down to +    # the original dimensions +    resize_factor = random.uniform(1, 1.2) +    new_height = math.floor(resize_factor * INPUT_SHAPE[0]) +    new_width = math.floor(resize_factor * INPUT_SHAPE[1]) +    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width) +    image = tf.image.random_crop(image, size=INPUT_SHAPE) + +    # Vary the brightness of the image +    image = tf.image.random_brightness(image, max_delta=0.2) + +    return image, label +``` + +Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset. + +The final dense layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. Here is the Training result: + +![](images/imgs_image_classification/image31.jpg){fig-align="center" width="6.5in"} + +The result is excellent, with 77 ms of latency, which should result in about 13 fps (frames per second) during inference. + +## Model Testing + +![](images/imgs_image_classification/image10.jpg){fig-align="center" width="6.5in"} + +Now, you should use the Test data, which was set aside at the start of the project, and run the trained model with it as input: + +![](images/imgs_image_classification/image34.png){fig-align="center" width="3.1041666666666665in"} + +The result is, again, excellent. + +![](images/imgs_image_classification/image12.png){fig-align="center" width="6.5in"} + +## Deploying the model + +At this point, we can deploy the trained model as a .tflite file and use the OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ or an Arduino library. + +![](images/imgs_image_classification/image28.jpg){fig-align="center" width="6.5in"} + +### Arduino Library + +First, let's deploy it as an Arduino library: + +![](images/imgs_image_classification/image48.png){fig-align="center" width="6.5in"} + +You should install the library as a .zip file on the Arduino IDE and run the sketch *nicla_vision_camera.ino* available in Examples under your library name. + +> Note that the Arduino Nicla Vision has, by default, 512 KB of RAM allocated for the M7 core and an additional 244 KB on the M4 address space. In the code, this allocation was changed to 288 KB to guarantee that the model will run on the device (`malloc_addblock((void*)0x30000000, 288 * 1024);`). + +The result is good, with 86 ms of measured latency. + +![](images/imgs_image_classification/image25.jpg){fig-align="center" width="6.5in"} + +Here is a short video showing the inference results: {{< video https://youtu.be/bZPZZJblU-o width="480" height="270" center >}} + +### OpenMV + +It is possible to deploy the trained model to be used with OpenMV in two ways: as a library and as a firmware. + +Three files are generated as a library: the trained .tflite model, a list with the labels, and a simple MicroPython script that can make inferences using the model.
+ +![](images/imgs_image_classification/image26.png){fig-align="center" width="6.5in"} + +Running this model as a *.tflite* directly on the Nicla was not possible. So, we can either sacrifice some accuracy by using a smaller model or deploy the model as OpenMV Firmware (FW). When choosing FW, the Edge Impulse Studio generates the optimized models, libraries, and frameworks needed to run the inference. Let's explore this option. + +Select `OpenMV Firmware` on the `Deploy Tab` and press `[Build]`. + +![](images/imgs_image_classification/image3.png){fig-align="center" width="6.5in"} + +On your computer, you will find a ZIP file. Open it: + +![](images/imgs_image_classification/image33.png){fig-align="center" width="6.5in"} + +Use the Bootloader tool on the OpenMV IDE to load the FW on your board: + +![](images/imgs_image_classification/image35.jpg){fig-align="center" width="6.5in"} + +Select the appropriate file (.bin for Nicla-Vision): + +![](images/imgs_image_classification/image8.png){fig-align="center" width="6.5in"} + +After the download is finished, press OK: + +![](images/imgs_image_classification/image40.png){fig-align="center" width="3.875in"} + +If a message says that the FW is outdated, DO NOT UPGRADE. Select \[NO\]. + +![](images/imgs_image_classification/image42.png){fig-align="center" width="4.572916666666667in"} + +Now, open the script **ei_image_classification.py**, which was downloaded from the Studio together with the .bin file for the Nicla. + +![](images/imgs_image_classification/image14.png){fig-align="center" width="6.5in"} + +Run it. Pointing the camera at the objects we want to classify, the inference result will be displayed on the Serial Terminal. + +![](images/imgs_image_classification/image37.png){fig-align="center" width="6.5in"} + +#### Changing the Code to Add Labels + +The code provided by Edge Impulse can be modified so that, for testing purposes, we can see the inference result directly on the image displayed in the OpenMV IDE. + +[[Upload the code from GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) or modify it as below: + +``` python +# Marcelo Rovai - NICLA Vision - Image Classification +# Adapted from Edge Impulse - OpenMV Image Classification Example +# @24Aug23 + +import sensor, image, time, os, tf, uos, gc + +sensor.reset() # Reset and initialize the sensor. +sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE) +sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) +sensor.set_windowing((240, 240)) # Set 240x240 window. +sensor.skip_frames(time=2000) # Let the camera adjust. + +net = None +labels = None + +try: +    # Load the built-in model generated by the Edge Impulse firmware +    labels, net = tf.load_builtin_model('trained') +except Exception as e: +    raise Exception(e) + +clock = time.clock() +while(True): +    clock.tick()  # Starts tracking elapsed time.
+ +    img = sensor.snapshot() + +    # default settings just do one detection +    for obj in net.classify(img, +                            min_scale=1.0, +                            scale_mul=0.8, +                            x_overlap=0.5, +                            y_overlap=0.5): +        fps = clock.fps() +        lat = clock.avg() + +        print("**********\nPrediction:") +        img.draw_rectangle(obj.rect()) +        # This combines the labels and confidence values into a list of tuples +        predictions_list = list(zip(labels, obj.output())) + +        # Start from the first prediction and keep the label with the highest score +        max_val = predictions_list[0][1] +        max_lbl = predictions_list[0][0] +        for i in range(len(predictions_list)): +            val = predictions_list[i][1] +            lbl = predictions_list[i][0] + +            if val > max_val: +                max_val = val +                max_lbl = lbl + +        # Print label with the highest probability +        if max_val < 0.5: +            max_lbl = 'uncertain' +        print("{} with a prob of {:.2f}".format(max_lbl, max_val)) +        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat)) + +        # Draw label with highest probability to image viewer +        img.draw_string( +            10, 10, +            max_lbl + "\n{:.2f}".format(max_val), +            mono_space = False, +            scale=2 +            ) +``` + +Here you can see the result: + +![](images/imgs_image_classification/image47.jpg){fig-align="center" width="6.5in"} + +Note that the latency (136 ms) is almost double what we got directly with the Arduino IDE. This is because we are using the IDE as an interface, and the measurement also includes the time spent waiting for the camera to be ready. If we start the clock just before the inference: + +![](images/imgs_image_classification/image13.jpg){fig-align="center" width="6.5in"} + +The latency will drop to only 71 ms. + +![](images/imgs_image_classification/image1.jpg){fig-align="center" width="3.5520833333333335in"} + +> The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected. + +#### Post-Processing with LEDs + +When working with embedded machine learning, we look for devices that can continuously run inference and act on the result, taking some action directly in the physical world rather than displaying the result on a connected computer. To simulate this, we will light up a different LED for each possible inference result. + +![](images/imgs_image_classification/image38.jpg){fig-align="center" width="6.5in"} + +To accomplish that, we should [[upload the code from GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) or modify the previous code to include the LEDs: + +``` python +# Marcelo Rovai - NICLA Vision - Image Classification with LEDs +# Adapted from Edge Impulse - OpenMV Image Classification Example +# @24Aug23 + +import sensor, image, time, os, tf, uos, gc, pyb + +ledRed = pyb.LED(1) +ledGre = pyb.LED(2) +ledBlu = pyb.LED(3) + +sensor.reset() # Reset and initialize the sensor. +sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE) +sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) +sensor.set_windowing((240, 240)) # Set 240x240 window. +sensor.skip_frames(time=2000) # Let the camera adjust.
+ +net = None +labels = None + +ledRed.off() +ledGre.off() +ledBlu.off() + +try: +    # Load the built-in model generated by the Edge Impulse firmware +    labels, net = tf.load_builtin_model('trained') +except Exception as e: +    raise Exception(e) + +clock = time.clock() + + +def setLEDs(max_lbl): + +    if max_lbl == 'uncertain': +        ledRed.on() +        ledGre.off() +        ledBlu.off() + +    if max_lbl == 'periquito': +        ledRed.off() +        ledGre.on() +        ledBlu.off() + +    if max_lbl == 'robot': +        ledRed.off() +        ledGre.off() +        ledBlu.on() + +    if max_lbl == 'background': +        ledRed.off() +        ledGre.off() +        ledBlu.off() + + +while(True): +    img = sensor.snapshot() +    clock.tick()  # Starts tracking elapsed time. + +    # default settings just do one detection. +    for obj in net.classify(img, +                            min_scale=1.0, +                            scale_mul=0.8, +                            x_overlap=0.5, +                            y_overlap=0.5): +        fps = clock.fps() +        lat = clock.avg() + +        print("**********\nPrediction:") +        img.draw_rectangle(obj.rect()) +        # This combines the labels and confidence values into a list of tuples +        predictions_list = list(zip(labels, obj.output())) + +        # Start from the first prediction and keep the label with the highest score +        max_val = predictions_list[0][1] +        max_lbl = predictions_list[0][0] +        for i in range(len(predictions_list)): +            val = predictions_list[i][1] +            lbl = predictions_list[i][0] + +            if val > max_val: +                max_val = val +                max_lbl = lbl + +        # Print the label and turn on the LED with the highest probability +        if max_val < 0.8: +            max_lbl = 'uncertain' + +        setLEDs(max_lbl) + +        print("{} with a prob of {:.2f}".format(max_lbl, max_val)) +        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat)) + +        # Draw label with highest probability to image viewer +        img.draw_string( +            10, 10, +            max_lbl + "\n{:.2f}".format(max_val), +            mono_space = False, +            scale=2 +            ) +``` + +Now, each time a class scores a result greater than 0.8, the corresponding LED will be lit: + +- Red LED on: Uncertain (no class is over 0.8) + +- Green LED on: Periquito \> 0.8 + +- Blue LED on: Robot \> 0.8 + +- All LEDs off: Background \> 0.8 + +Here is the result: + +![](images/imgs_image_classification/image18.jpg){fig-align="center" width="6.5in"} + +In more detail: + +![](images/imgs_image_classification/image21.jpg){fig-align="center" width="6.5in"} + +## Image Classification (non-official) Benchmark + +Several development boards can be used for embedded machine learning (tinyML), and the most common ones for low-energy Computer Vision applications are the ESP32-CAM, the Seeed XIAO ESP32S3 Sense, the Arduino Nicla Vision, and the Arduino Portenta. + +![](images/imgs_image_classification/image19.jpg){fig-align="center" width="6.5in"} + +Taking the opportunity, the same trained model was deployed on the ESP-CAM, the XIAO, and the Portenta (for the latter, the model was retrained using grayscale images to be compatible with its camera). Here are the results of deploying the models as an Arduino library: + +![](images/imgs_image_classification/image4.jpg){fig-align="center" width="6.5in"} + +## Conclusion + +Before we finish, consider that Computer Vision is more than just image classification. For example, you can develop Edge Machine Learning projects around vision in several areas, such as: + +- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer vision algorithms to navigate and make decisions. + +- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, and CT scan image analysis. + +- **Retail**: Automated checkout systems that identify products as they pass through a scanner. + +- **Security and Surveillance**: Facial recognition, anomaly detection, and object tracking in real-time video feeds.
+ +- **Augmented Reality**: Object detection and classification to overlay digital information in the real world. + +- **Industrial Automation**: Visual inspection of products, predictive maintenance, and robot and drone guidance. + +- **Agriculture**: Drone-based crop monitoring and automated harvesting. + +- **Natural Language Processing**: Image captioning and visual question answering. + +- **Gesture Recognition**: For gaming, sign language translation, and human-machine interaction. + +- **Content Recommendation**: Image-based recommendation systems in e-commerce. diff --git a/images_4/media/image1.jpg b/images/imgs_image_classification/image1.jpg similarity index 100% rename from images_4/media/image1.jpg rename to images/imgs_image_classification/image1.jpg diff --git a/images_4/media/image10.jpg b/images/imgs_image_classification/image10.jpg similarity index 100% rename from images_4/media/image10.jpg rename to images/imgs_image_classification/image10.jpg diff --git a/images_4/media/image11.png b/images/imgs_image_classification/image11.png similarity index 100% rename from images_4/media/image11.png rename to images/imgs_image_classification/image11.png diff --git a/images_4/media/image12.png b/images/imgs_image_classification/image12.png similarity index 100% rename from images_4/media/image12.png rename to images/imgs_image_classification/image12.png diff --git a/images_4/media/image13.jpg b/images/imgs_image_classification/image13.jpg similarity index 100% rename from images_4/media/image13.jpg rename to images/imgs_image_classification/image13.jpg diff --git a/images_4/media/image14.png b/images/imgs_image_classification/image14.png similarity index 100% rename from images_4/media/image14.png rename to images/imgs_image_classification/image14.png diff --git a/images_4/media/image15.jpg b/images/imgs_image_classification/image15.jpg similarity index 100% rename from images_4/media/image15.jpg rename to images/imgs_image_classification/image15.jpg diff --git a/images_4/media/image16.png b/images/imgs_image_classification/image16.png similarity index 100% rename from images_4/media/image16.png rename to images/imgs_image_classification/image16.png diff --git a/images_4/media/image17.png b/images/imgs_image_classification/image17.png similarity index 100% rename from images_4/media/image17.png rename to images/imgs_image_classification/image17.png diff --git a/images_4/media/image18.jpg b/images/imgs_image_classification/image18.jpg similarity index 100% rename from images_4/media/image18.jpg rename to images/imgs_image_classification/image18.jpg diff --git a/images_4/media/image19.jpg b/images/imgs_image_classification/image19.jpg similarity index 100% rename from images_4/media/image19.jpg rename to images/imgs_image_classification/image19.jpg diff --git a/images_4/media/image2.jpg b/images/imgs_image_classification/image2.jpg similarity index 100% rename from images_4/media/image2.jpg rename to images/imgs_image_classification/image2.jpg diff --git a/images_4/media/image20.png b/images/imgs_image_classification/image20.png similarity index 100% rename from images_4/media/image20.png rename to images/imgs_image_classification/image20.png diff --git a/images_4/media/image21.jpg b/images/imgs_image_classification/image21.jpg similarity index 100% rename from images_4/media/image21.jpg rename to images/imgs_image_classification/image21.jpg diff --git a/images_4/media/image22.png b/images/imgs_image_classification/image22.png similarity index 100% rename from 
images_4/media/image22.png rename to images/imgs_image_classification/image22.png diff --git a/images_4/media/image23.jpg b/images/imgs_image_classification/image23.jpg similarity index 100% rename from images_4/media/image23.jpg rename to images/imgs_image_classification/image23.jpg diff --git a/images_4/media/image24.png b/images/imgs_image_classification/image24.png similarity index 100% rename from images_4/media/image24.png rename to images/imgs_image_classification/image24.png diff --git a/images_4/media/image25.jpg b/images/imgs_image_classification/image25.jpg similarity index 100% rename from images_4/media/image25.jpg rename to images/imgs_image_classification/image25.jpg diff --git a/images_4/media/image26.png b/images/imgs_image_classification/image26.png similarity index 100% rename from images_4/media/image26.png rename to images/imgs_image_classification/image26.png diff --git a/images_4/media/image27.jpg b/images/imgs_image_classification/image27.jpg similarity index 100% rename from images_4/media/image27.jpg rename to images/imgs_image_classification/image27.jpg diff --git a/images_4/media/image28.jpg b/images/imgs_image_classification/image28.jpg similarity index 100% rename from images_4/media/image28.jpg rename to images/imgs_image_classification/image28.jpg diff --git a/images_4/media/image29.png b/images/imgs_image_classification/image29.png similarity index 100% rename from images_4/media/image29.png rename to images/imgs_image_classification/image29.png diff --git a/images_4/media/image3.png b/images/imgs_image_classification/image3.png similarity index 100% rename from images_4/media/image3.png rename to images/imgs_image_classification/image3.png diff --git a/images_4/media/image30.png b/images/imgs_image_classification/image30.png similarity index 100% rename from images_4/media/image30.png rename to images/imgs_image_classification/image30.png diff --git a/images_4/media/image31.jpg b/images/imgs_image_classification/image31.jpg similarity index 100% rename from images_4/media/image31.jpg rename to images/imgs_image_classification/image31.jpg diff --git a/images_4/media/image32.jpg b/images/imgs_image_classification/image32.jpg similarity index 100% rename from images_4/media/image32.jpg rename to images/imgs_image_classification/image32.jpg diff --git a/images_4/media/image33.png b/images/imgs_image_classification/image33.png similarity index 100% rename from images_4/media/image33.png rename to images/imgs_image_classification/image33.png diff --git a/images_4/media/image34.png b/images/imgs_image_classification/image34.png similarity index 100% rename from images_4/media/image34.png rename to images/imgs_image_classification/image34.png diff --git a/images_4/media/image35.jpg b/images/imgs_image_classification/image35.jpg similarity index 100% rename from images_4/media/image35.jpg rename to images/imgs_image_classification/image35.jpg diff --git a/images_4/media/image36.jpg b/images/imgs_image_classification/image36.jpg similarity index 100% rename from images_4/media/image36.jpg rename to images/imgs_image_classification/image36.jpg diff --git a/images_4/media/image37.png b/images/imgs_image_classification/image37.png similarity index 100% rename from images_4/media/image37.png rename to images/imgs_image_classification/image37.png diff --git a/images_4/media/image38.jpg b/images/imgs_image_classification/image38.jpg similarity index 100% rename from images_4/media/image38.jpg rename to images/imgs_image_classification/image38.jpg diff --git 
a/images_4/media/image39.png b/images/imgs_image_classification/image39.png similarity index 100% rename from images_4/media/image39.png rename to images/imgs_image_classification/image39.png diff --git a/images_4/media/image4.jpg b/images/imgs_image_classification/image4.jpg similarity index 100% rename from images_4/media/image4.jpg rename to images/imgs_image_classification/image4.jpg diff --git a/images_4/media/image40.png b/images/imgs_image_classification/image40.png similarity index 100% rename from images_4/media/image40.png rename to images/imgs_image_classification/image40.png diff --git a/images_4/media/image41.jpg b/images/imgs_image_classification/image41.jpg similarity index 100% rename from images_4/media/image41.jpg rename to images/imgs_image_classification/image41.jpg diff --git a/images_4/media/image42.png b/images/imgs_image_classification/image42.png similarity index 100% rename from images_4/media/image42.png rename to images/imgs_image_classification/image42.png diff --git a/images_4/media/image43.png b/images/imgs_image_classification/image43.png similarity index 100% rename from images_4/media/image43.png rename to images/imgs_image_classification/image43.png diff --git a/images_4/media/image44.png b/images/imgs_image_classification/image44.png similarity index 100% rename from images_4/media/image44.png rename to images/imgs_image_classification/image44.png diff --git a/images_4/media/image45.png b/images/imgs_image_classification/image45.png similarity index 100% rename from images_4/media/image45.png rename to images/imgs_image_classification/image45.png diff --git a/images_4/media/image46.png b/images/imgs_image_classification/image46.png similarity index 100% rename from images_4/media/image46.png rename to images/imgs_image_classification/image46.png diff --git a/images_4/media/image47.jpg b/images/imgs_image_classification/image47.jpg similarity index 100% rename from images_4/media/image47.jpg rename to images/imgs_image_classification/image47.jpg diff --git a/images_4/media/image48.png b/images/imgs_image_classification/image48.png similarity index 100% rename from images_4/media/image48.png rename to images/imgs_image_classification/image48.png diff --git a/images_4/media/image5.png b/images/imgs_image_classification/image5.png similarity index 100% rename from images_4/media/image5.png rename to images/imgs_image_classification/image5.png diff --git a/images_4/media/image6.jpg b/images/imgs_image_classification/image6.jpg similarity index 100% rename from images_4/media/image6.jpg rename to images/imgs_image_classification/image6.jpg diff --git a/images_4/media/image7.jpg b/images/imgs_image_classification/image7.jpg similarity index 100% rename from images_4/media/image7.jpg rename to images/imgs_image_classification/image7.jpg diff --git a/images_4/media/image8.png b/images/imgs_image_classification/image8.png similarity index 100% rename from images_4/media/image8.png rename to images/imgs_image_classification/image8.png diff --git a/images_4/media/image9.jpg b/images/imgs_image_classification/image9.jpg similarity index 100% rename from images_4/media/image9.jpg rename to images/imgs_image_classification/image9.jpg diff --git a/images/imgs_kws_feature_eng/.ipynb_checkpoints/time_vs_freq-checkpoint.png b/images/imgs_kws_feature_eng/.ipynb_checkpoints/time_vs_freq-checkpoint.png new file mode 100644 index 00000000..a91e1707 Binary files /dev/null and b/images/imgs_kws_feature_eng/.ipynb_checkpoints/time_vs_freq-checkpoint.png differ diff --git 
a/images/imgs_kws_feature_eng/cover.jpg b/images/imgs_kws_feature_eng/cover.jpg new file mode 100644 index 00000000..8d8e8dc7 Binary files /dev/null and b/images/imgs_kws_feature_eng/cover.jpg differ diff --git a/images/imgs_kws_feature_eng/frame_to_fft.jpg b/images/imgs_kws_feature_eng/frame_to_fft.jpg new file mode 100644 index 00000000..bfb5bdc7 Binary files /dev/null and b/images/imgs_kws_feature_eng/frame_to_fft.jpg differ diff --git a/images/imgs_kws_feature_eng/frame_wind.jpg b/images/imgs_kws_feature_eng/frame_wind.jpg new file mode 100644 index 00000000..bb766860 Binary files /dev/null and b/images/imgs_kws_feature_eng/frame_wind.jpg differ diff --git a/images/imgs_kws_feature_eng/kws_diagram.jpg b/images/imgs_kws_feature_eng/kws_diagram.jpg new file mode 100644 index 00000000..e9e17d1a Binary files /dev/null and b/images/imgs_kws_feature_eng/kws_diagram.jpg differ diff --git a/images/imgs_kws_feature_eng/melbank-1_00.hires.jpg b/images/imgs_kws_feature_eng/melbank-1_00.hires.jpg new file mode 100644 index 00000000..5a93e86e Binary files /dev/null and b/images/imgs_kws_feature_eng/melbank-1_00.hires.jpg differ diff --git a/images/imgs_kws_feature_eng/mfcc_final.jpg b/images/imgs_kws_feature_eng/mfcc_final.jpg new file mode 100644 index 00000000..bec68dd1 Binary files /dev/null and b/images/imgs_kws_feature_eng/mfcc_final.jpg differ diff --git a/images/imgs_kws_feature_eng/time_vs_freq.jpg b/images/imgs_kws_feature_eng/time_vs_freq.jpg new file mode 100644 index 00000000..6e9ae476 Binary files /dev/null and b/images/imgs_kws_feature_eng/time_vs_freq.jpg differ diff --git a/images/imgs_kws_feature_eng/yes_no_mfcc.jpg b/images/imgs_kws_feature_eng/yes_no_mfcc.jpg new file mode 100644 index 00000000..d252862f Binary files /dev/null and b/images/imgs_kws_feature_eng/yes_no_mfcc.jpg differ diff --git a/images/imgs_kws_nicla/KWS_PROJ_INF_BLK.jpg b/images/imgs_kws_nicla/KWS_PROJ_INF_BLK.jpg new file mode 100644 index 00000000..886079a1 Binary files /dev/null and b/images/imgs_kws_nicla/KWS_PROJ_INF_BLK.jpg differ diff --git a/images/imgs_kws_nicla/KWS_PROJ_TRAIN_BLK.jpg b/images/imgs_kws_nicla/KWS_PROJ_TRAIN_BLK.jpg new file mode 100644 index 00000000..3e3d02ce Binary files /dev/null and b/images/imgs_kws_nicla/KWS_PROJ_TRAIN_BLK.jpg differ diff --git a/images/imgs_kws_nicla/MFCC.jpg b/images/imgs_kws_nicla/MFCC.jpg new file mode 100644 index 00000000..f5fe2752 Binary files /dev/null and b/images/imgs_kws_nicla/MFCC.jpg differ diff --git a/images/imgs_kws_nicla/audio_capt.jpg b/images/imgs_kws_nicla/audio_capt.jpg new file mode 100644 index 00000000..3af7c31a Binary files /dev/null and b/images/imgs_kws_nicla/audio_capt.jpg differ diff --git a/images/imgs_kws_nicla/code_ide.jpg b/images/imgs_kws_nicla/code_ide.jpg new file mode 100644 index 00000000..29a6022a Binary files /dev/null and b/images/imgs_kws_nicla/code_ide.jpg differ diff --git a/images/imgs_kws_nicla/dataset.jpg b/images/imgs_kws_nicla/dataset.jpg new file mode 100644 index 00000000..0acc65b8 Binary files /dev/null and b/images/imgs_kws_nicla/dataset.jpg differ diff --git a/images/imgs_kws_nicla/deploy.jpg b/images/imgs_kws_nicla/deploy.jpg new file mode 100644 index 00000000..b8795102 Binary files /dev/null and b/images/imgs_kws_nicla/deploy.jpg differ diff --git a/images/imgs_kws_nicla/ei_MFCC.jpg b/images/imgs_kws_nicla/ei_MFCC.jpg new file mode 100644 index 00000000..cd78a331 Binary files /dev/null and b/images/imgs_kws_nicla/ei_MFCC.jpg differ diff --git a/images/imgs_kws_nicla/ei_data_collection.jpg 
b/images/imgs_kws_nicla/ei_data_collection.jpg new file mode 100644 index 00000000..17630c85 Binary files /dev/null and b/images/imgs_kws_nicla/ei_data_collection.jpg differ diff --git a/images/imgs_kws_nicla/feat_expl.jpg b/images/imgs_kws_nicla/feat_expl.jpg new file mode 100644 index 00000000..26a39788 Binary files /dev/null and b/images/imgs_kws_nicla/feat_expl.jpg differ diff --git a/images/imgs_kws_nicla/files.jpg b/images/imgs_kws_nicla/files.jpg new file mode 100644 index 00000000..bfd6c435 Binary files /dev/null and b/images/imgs_kws_nicla/files.jpg differ diff --git a/images/imgs_kws_nicla/hey_google.png b/images/imgs_kws_nicla/hey_google.png new file mode 100644 index 00000000..a244f375 Binary files /dev/null and b/images/imgs_kws_nicla/hey_google.png differ diff --git a/images/imgs_kws_nicla/impulse.jpg b/images/imgs_kws_nicla/impulse.jpg new file mode 100644 index 00000000..cb9d0ae7 Binary files /dev/null and b/images/imgs_kws_nicla/impulse.jpg differ diff --git a/images/imgs_kws_nicla/install_zip.jpg b/images/imgs_kws_nicla/install_zip.jpg new file mode 100644 index 00000000..7771227b Binary files /dev/null and b/images/imgs_kws_nicla/install_zip.jpg differ diff --git a/images/imgs_kws_nicla/model.jpg b/images/imgs_kws_nicla/model.jpg new file mode 100644 index 00000000..e48ef1a3 Binary files /dev/null and b/images/imgs_kws_nicla/model.jpg differ diff --git a/images/imgs_kws_nicla/models_1d-2d.jpg b/images/imgs_kws_nicla/models_1d-2d.jpg new file mode 100644 index 00000000..f04dc3c5 Binary files /dev/null and b/images/imgs_kws_nicla/models_1d-2d.jpg differ diff --git a/images/imgs_kws_nicla/pa_block.jpg b/images/imgs_kws_nicla/pa_block.jpg new file mode 100644 index 00000000..79700aab Binary files /dev/null and b/images/imgs_kws_nicla/pa_block.jpg differ diff --git a/images/imgs_kws_nicla/pers_ass.jpg b/images/imgs_kws_nicla/pers_ass.jpg new file mode 100644 index 00000000..b0871d82 Binary files /dev/null and b/images/imgs_kws_nicla/pers_ass.jpg differ diff --git a/images/imgs_kws_nicla/phone.jpg b/images/imgs_kws_nicla/phone.jpg new file mode 100644 index 00000000..8f818118 Binary files /dev/null and b/images/imgs_kws_nicla/phone.jpg differ diff --git a/images/imgs_kws_nicla/split.jpg b/images/imgs_kws_nicla/split.jpg new file mode 100644 index 00000000..b2aa68f1 Binary files /dev/null and b/images/imgs_kws_nicla/split.jpg differ diff --git a/images/imgs_kws_nicla/test.jpg b/images/imgs_kws_nicla/test.jpg new file mode 100644 index 00000000..bcd0d7fd Binary files /dev/null and b/images/imgs_kws_nicla/test.jpg differ diff --git a/images/imgs_kws_nicla/train_errors.jpg b/images/imgs_kws_nicla/train_errors.jpg new file mode 100644 index 00000000..1402f29a Binary files /dev/null and b/images/imgs_kws_nicla/train_errors.jpg differ diff --git a/images/imgs_kws_nicla/train_graphs.jpg b/images/imgs_kws_nicla/train_graphs.jpg new file mode 100644 index 00000000..05074f76 Binary files /dev/null and b/images/imgs_kws_nicla/train_graphs.jpg differ diff --git a/images/imgs_kws_nicla/train_result.jpg b/images/imgs_kws_nicla/train_result.jpg new file mode 100644 index 00000000..6b40fe61 Binary files /dev/null and b/images/imgs_kws_nicla/train_result.jpg differ diff --git a/images/imgs_kws_nicla/upload.jpg b/images/imgs_kws_nicla/upload.jpg new file mode 100644 index 00000000..a38a481e Binary files /dev/null and b/images/imgs_kws_nicla/upload.jpg differ diff --git a/images/imgs_kws_nicla/yes.jpg b/images/imgs_kws_nicla/yes.jpg new file mode 100644 index 00000000..6741e31a Binary files 
/dev/null and b/images/imgs_kws_nicla/yes.jpg differ diff --git a/images/imgs_kws_nicla/yes_no.jpg b/images/imgs_kws_nicla/yes_no.jpg new file mode 100644 index 00000000..a53fbce2 Binary files /dev/null and b/images/imgs_kws_nicla/yes_no.jpg differ diff --git a/images_2/media/image1.png b/images/imgs_niclav_sys/image1.png similarity index 100% rename from images_2/media/image1.png rename to images/imgs_niclav_sys/image1.png diff --git a/images_2/media/image10.png b/images/imgs_niclav_sys/image10.png similarity index 100% rename from images_2/media/image10.png rename to images/imgs_niclav_sys/image10.png diff --git a/images_2/media/image11.jpg b/images/imgs_niclav_sys/image11.jpg similarity index 100% rename from images_2/media/image11.jpg rename to images/imgs_niclav_sys/image11.jpg diff --git a/images_2/media/image12.png b/images/imgs_niclav_sys/image12.png similarity index 100% rename from images_2/media/image12.png rename to images/imgs_niclav_sys/image12.png diff --git a/images_2/media/image13.jpg b/images/imgs_niclav_sys/image13.jpg similarity index 100% rename from images_2/media/image13.jpg rename to images/imgs_niclav_sys/image13.jpg diff --git a/images_2/media/image14.jpg b/images/imgs_niclav_sys/image14.jpg similarity index 100% rename from images_2/media/image14.jpg rename to images/imgs_niclav_sys/image14.jpg diff --git a/images_2/media/image15.jpg b/images/imgs_niclav_sys/image15.jpg similarity index 100% rename from images_2/media/image15.jpg rename to images/imgs_niclav_sys/image15.jpg diff --git a/images_2/media/image16.png b/images/imgs_niclav_sys/image16.png similarity index 100% rename from images_2/media/image16.png rename to images/imgs_niclav_sys/image16.png diff --git a/images_2/media/image17.png b/images/imgs_niclav_sys/image17.png similarity index 100% rename from images_2/media/image17.png rename to images/imgs_niclav_sys/image17.png diff --git a/images_2/media/image18.jpg b/images/imgs_niclav_sys/image18.jpg similarity index 100% rename from images_2/media/image18.jpg rename to images/imgs_niclav_sys/image18.jpg diff --git a/images_2/media/image19.jpg b/images/imgs_niclav_sys/image19.jpg similarity index 100% rename from images_2/media/image19.jpg rename to images/imgs_niclav_sys/image19.jpg diff --git a/images_2/media/image2.jpg b/images/imgs_niclav_sys/image2.jpg similarity index 100% rename from images_2/media/image2.jpg rename to images/imgs_niclav_sys/image2.jpg diff --git a/images_2/media/image20.jpg b/images/imgs_niclav_sys/image20.jpg similarity index 100% rename from images_2/media/image20.jpg rename to images/imgs_niclav_sys/image20.jpg diff --git a/images_2/media/image21.png b/images/imgs_niclav_sys/image21.png similarity index 100% rename from images_2/media/image21.png rename to images/imgs_niclav_sys/image21.png diff --git a/images_2/media/image22.jpg b/images/imgs_niclav_sys/image22.jpg similarity index 100% rename from images_2/media/image22.jpg rename to images/imgs_niclav_sys/image22.jpg diff --git a/images_2/media/image23.jpg b/images/imgs_niclav_sys/image23.jpg similarity index 100% rename from images_2/media/image23.jpg rename to images/imgs_niclav_sys/image23.jpg diff --git a/images/imgs_niclav_sys/image24.jpg b/images/imgs_niclav_sys/image24.jpg new file mode 100644 index 00000000..084cff5d Binary files /dev/null and b/images/imgs_niclav_sys/image24.jpg differ diff --git a/images_2/media/image25.png b/images/imgs_niclav_sys/image25.png similarity index 100% rename from images_2/media/image25.png rename to images/imgs_niclav_sys/image25.png 
diff --git a/images_2/media/image26.jpg b/images/imgs_niclav_sys/image26.jpg similarity index 100% rename from images_2/media/image26.jpg rename to images/imgs_niclav_sys/image26.jpg diff --git a/images_2/media/image27.png b/images/imgs_niclav_sys/image27.png similarity index 100% rename from images_2/media/image27.png rename to images/imgs_niclav_sys/image27.png diff --git a/images_2/media/image28.png b/images/imgs_niclav_sys/image28.png similarity index 100% rename from images_2/media/image28.png rename to images/imgs_niclav_sys/image28.png diff --git a/images_2/media/image29.jpg b/images/imgs_niclav_sys/image29.jpg similarity index 100% rename from images_2/media/image29.jpg rename to images/imgs_niclav_sys/image29.jpg diff --git a/images_2/media/image3.png b/images/imgs_niclav_sys/image3.png similarity index 100% rename from images_2/media/image3.png rename to images/imgs_niclav_sys/image3.png diff --git a/images_2/media/image4.png b/images/imgs_niclav_sys/image4.png similarity index 100% rename from images_2/media/image4.png rename to images/imgs_niclav_sys/image4.png diff --git a/images_2/media/image5.png b/images/imgs_niclav_sys/image5.png similarity index 100% rename from images_2/media/image5.png rename to images/imgs_niclav_sys/image5.png diff --git a/images_2/media/image6.png b/images/imgs_niclav_sys/image6.png similarity index 100% rename from images_2/media/image6.png rename to images/imgs_niclav_sys/image6.png diff --git a/images_2/media/image7.png b/images/imgs_niclav_sys/image7.png similarity index 100% rename from images_2/media/image7.png rename to images/imgs_niclav_sys/image7.png diff --git a/images_2/media/image8.png b/images/imgs_niclav_sys/image8.png similarity index 100% rename from images_2/media/image8.png rename to images/imgs_niclav_sys/image8.png diff --git a/images_2/media/image9.png b/images/imgs_niclav_sys/image9.png similarity index 100% rename from images_2/media/image9.png rename to images/imgs_niclav_sys/image9.png diff --git a/images/imgs_object_detection_fomo/cv_obj_detect.jpg b/images/imgs_object_detection_fomo/cv_obj_detect.jpg new file mode 100644 index 00000000..7918fa17 Binary files /dev/null and b/images/imgs_object_detection_fomo/cv_obj_detect.jpg differ diff --git a/images/imgs_object_detection_fomo/data_folder.jpg b/images/imgs_object_detection_fomo/data_folder.jpg new file mode 100644 index 00000000..ca0c0b31 Binary files /dev/null and b/images/imgs_object_detection_fomo/data_folder.jpg differ diff --git a/images/imgs_object_detection_fomo/img_1.png b/images/imgs_object_detection_fomo/img_1.png new file mode 100644 index 00000000..3cf9f13e Binary files /dev/null and b/images/imgs_object_detection_fomo/img_1.png differ diff --git a/images/imgs_object_detection_fomo/img_10.png b/images/imgs_object_detection_fomo/img_10.png new file mode 100644 index 00000000..d3aae23e Binary files /dev/null and b/images/imgs_object_detection_fomo/img_10.png differ diff --git a/images/imgs_object_detection_fomo/img_11.jpg b/images/imgs_object_detection_fomo/img_11.jpg new file mode 100644 index 00000000..b6da7df9 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_11.jpg differ diff --git a/images/imgs_object_detection_fomo/img_12.png b/images/imgs_object_detection_fomo/img_12.png new file mode 100644 index 00000000..ac4550c2 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_12.png differ diff --git a/images/imgs_object_detection_fomo/img_13.jpg b/images/imgs_object_detection_fomo/img_13.jpg new file mode 100644 index 
00000000..bf3683d9 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_13.jpg differ diff --git a/images/imgs_object_detection_fomo/img_14.png b/images/imgs_object_detection_fomo/img_14.png new file mode 100644 index 00000000..be87da2c Binary files /dev/null and b/images/imgs_object_detection_fomo/img_14.png differ diff --git a/images/imgs_object_detection_fomo/img_15.png b/images/imgs_object_detection_fomo/img_15.png new file mode 100644 index 00000000..6b20b7f2 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_15.png differ diff --git a/images/imgs_object_detection_fomo/img_16.png b/images/imgs_object_detection_fomo/img_16.png new file mode 100644 index 00000000..88e3ceb9 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_16.png differ diff --git a/images/imgs_object_detection_fomo/img_17.png b/images/imgs_object_detection_fomo/img_17.png new file mode 100644 index 00000000..5c1b7669 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_17.png differ diff --git a/images/imgs_object_detection_fomo/img_18.png b/images/imgs_object_detection_fomo/img_18.png new file mode 100644 index 00000000..b82d860a Binary files /dev/null and b/images/imgs_object_detection_fomo/img_18.png differ diff --git a/images/imgs_object_detection_fomo/img_19.png b/images/imgs_object_detection_fomo/img_19.png new file mode 100644 index 00000000..af210f25 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_19.png differ diff --git a/images/imgs_object_detection_fomo/img_2.png b/images/imgs_object_detection_fomo/img_2.png new file mode 100644 index 00000000..c00e93d2 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_2.png differ diff --git a/images/imgs_object_detection_fomo/img_20.png b/images/imgs_object_detection_fomo/img_20.png new file mode 100644 index 00000000..6880f101 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_20.png differ diff --git a/images/imgs_object_detection_fomo/img_21.png b/images/imgs_object_detection_fomo/img_21.png new file mode 100644 index 00000000..ef3e4af4 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_21.png differ diff --git a/images/imgs_object_detection_fomo/img_22.png b/images/imgs_object_detection_fomo/img_22.png new file mode 100644 index 00000000..b49d9abb Binary files /dev/null and b/images/imgs_object_detection_fomo/img_22.png differ diff --git a/images/imgs_object_detection_fomo/img_23.png b/images/imgs_object_detection_fomo/img_23.png new file mode 100644 index 00000000..ee070d80 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_23.png differ diff --git a/images/imgs_object_detection_fomo/img_24.png b/images/imgs_object_detection_fomo/img_24.png new file mode 100644 index 00000000..5057db8d Binary files /dev/null and b/images/imgs_object_detection_fomo/img_24.png differ diff --git a/images/imgs_object_detection_fomo/img_25.png b/images/imgs_object_detection_fomo/img_25.png new file mode 100644 index 00000000..e3ad0add Binary files /dev/null and b/images/imgs_object_detection_fomo/img_25.png differ diff --git a/images/imgs_object_detection_fomo/img_26.png b/images/imgs_object_detection_fomo/img_26.png new file mode 100644 index 00000000..9802e642 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_26.png differ diff --git a/images/imgs_object_detection_fomo/img_27.jpg b/images/imgs_object_detection_fomo/img_27.jpg new file mode 100644 index 00000000..55ec9e05 Binary files /dev/null and 
b/images/imgs_object_detection_fomo/img_27.jpg differ diff --git a/images/imgs_object_detection_fomo/img_28.jpg b/images/imgs_object_detection_fomo/img_28.jpg new file mode 100644 index 00000000..3d2caadc Binary files /dev/null and b/images/imgs_object_detection_fomo/img_28.jpg differ diff --git a/images/imgs_object_detection_fomo/img_3.png b/images/imgs_object_detection_fomo/img_3.png new file mode 100644 index 00000000..0d854e0e Binary files /dev/null and b/images/imgs_object_detection_fomo/img_3.png differ diff --git a/images/imgs_object_detection_fomo/img_4.png b/images/imgs_object_detection_fomo/img_4.png new file mode 100644 index 00000000..4654de3f Binary files /dev/null and b/images/imgs_object_detection_fomo/img_4.png differ diff --git a/images/imgs_object_detection_fomo/img_5.jpg b/images/imgs_object_detection_fomo/img_5.jpg new file mode 100644 index 00000000..349cd606 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_5.jpg differ diff --git a/images/imgs_object_detection_fomo/img_6.png b/images/imgs_object_detection_fomo/img_6.png new file mode 100644 index 00000000..6771c762 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_6.png differ diff --git a/images/imgs_object_detection_fomo/img_7.png b/images/imgs_object_detection_fomo/img_7.png new file mode 100644 index 00000000..fac11fd1 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_7.png differ diff --git a/images/imgs_object_detection_fomo/img_8.png b/images/imgs_object_detection_fomo/img_8.png new file mode 100644 index 00000000..08efe60e Binary files /dev/null and b/images/imgs_object_detection_fomo/img_8.png differ diff --git a/images/imgs_object_detection_fomo/img_9.png b/images/imgs_object_detection_fomo/img_9.png new file mode 100644 index 00000000..68aedc11 Binary files /dev/null and b/images/imgs_object_detection_fomo/img_9.png differ diff --git a/images/imgs_object_detection_fomo/proj_goal.jpg b/images/imgs_object_detection_fomo/proj_goal.jpg new file mode 100644 index 00000000..7873a4cd Binary files /dev/null and b/images/imgs_object_detection_fomo/proj_goal.jpg differ diff --git a/images/imgs_object_detection_fomo/samples.jpg b/images/imgs_object_detection_fomo/samples.jpg new file mode 100644 index 00000000..157c73e2 Binary files /dev/null and b/images/imgs_object_detection_fomo/samples.jpg differ diff --git a/images_2/media/image24.gif b/images_2/media/image24.gif deleted file mode 100644 index 0ed868cb..00000000 Binary files a/images_2/media/image24.gif and /dev/null differ diff --git a/kws_feature_eng.qmd b/kws_feature_eng.qmd new file mode 100644 index 00000000..44739100 --- /dev/null +++ b/kws_feature_eng.qmd @@ -0,0 +1,143 @@ +# Audio Feature Engineering {.unnumbered} + +## Introduction + +In this hands-on tutorial, the emphasis is on the critical role that feature engineering plays in optimizing the performance of machine learning models applied to audio classification tasks, such as speech recognition. It is essential to be aware that the performance of any machine learning model relies heavily on the quality of features used, and we will deal with "under-the-hood" mechanics of feature extraction, mainly focusing on Mel-frequency Cepstral Coefficients (MFCCs), a cornerstone in the field of audio signal processing. + +Machine learning models, especially traditional algorithms, don't understand audio waves. They understand numbers arranged in some meaningful way, i.e., features. 
These features encapsulate the characteristics of the audio signal, making it easier for models to distinguish between different sounds. + +> This tutorial will deal with generating features specifically for audio classification. This can be particularly interesting for applying machine learning to a variety of audio data, whether for speech recognition, music categorization, insect classification based on wingbeat sounds, or other sound analysis tasks. + +## The KWS + +The most common TinyML application is Keyword Spotting (KWS), a subset of the broader field of speech recognition. While general speech recognition aims to transcribe all spoken words into text, Keyword Spotting focuses on detecting specific "keywords" or "wake words" in a continuous audio stream. The system is trained to recognize these keywords, which are predefined phrases or words such as *yes* or *no*. In short, KWS is a specialized form of speech recognition with its own set of challenges and requirements. + +Here is a typical KWS process using an MFCC feature converter: + +![](images/imgs_kws_feature_eng/kws_diagram.jpg){fig-align="center" width="7.29in"} + +#### Applications of KWS: + +- **Voice Assistants**: In devices like Amazon's Alexa or Google Home, KWS is used to detect the wake word ("Alexa" or "Hey Google") to activate the device. +- **Voice-Activated Controls**: In automotive or industrial settings, KWS can be used to initiate specific commands like "Start engine" or "Turn off lights." +- **Security Systems**: Voice-activated security systems may use KWS to authenticate users based on a spoken passphrase. +- **Telecommunication Services**: Customer service lines may use KWS to route calls based on spoken keywords. + +#### Differences from General Speech Recognition: + +- **Computational Efficiency**: KWS is usually designed to be less computationally intensive than full speech recognition, as it only needs to recognize a small set of phrases. +- **Real-time Processing**: KWS often operates in real-time and is optimized for low-latency detection of keywords. +- **Resource Constraints**: KWS models are often designed to be lightweight, so they can run on devices with limited computational resources, like microcontrollers or mobile phones. +- **Focused Task**: While general speech recognition models are trained to handle a broad range of vocabulary and accents, KWS models are fine-tuned to recognize specific keywords accurately, often in noisy environments. + +## Introduction to Audio Signals + +Understanding the basic properties of audio signals is crucial for effective feature extraction and, ultimately, for successfully applying machine learning algorithms in audio classification tasks. Audio signals are complex waveforms that capture fluctuations in air pressure over time. These signals can be characterized by several fundamental attributes: sampling rate, frequency, and amplitude. + +- **Frequency and Amplitude**: [Frequency](https://en.wikipedia.org/wiki/Audio_frequency) refers to the number of oscillations a waveform undergoes per unit time and is measured in Hz. In the context of audio signals, different frequencies correspond to different pitches. [Amplitude](https://en.wikipedia.org/wiki/Amplitude), on the other hand, measures the magnitude of the oscillations and correlates with the loudness of the sound. Both frequency and amplitude are essential features that capture audio signals' tonal and rhythmic qualities.
+ +- **Sampling Rate**: The [sampling rate](https://en.wikipedia.org/wiki/Sampling_(signal_processing)), often denoted in Hertz (Hz), defines the number of samples taken per second when digitizing an analog signal. A higher sampling rate allows for a more accurate digital representation of the signal but also demands more computational resources for processing. Typical sampling rates include 44.1 kHz for CD-quality audio and 16 kHz or 8 kHz for speech recognition tasks. Understanding the trade-offs in selecting an appropriate sampling rate is essential for balancing accuracy and computational efficiency. In general, with TinyML projects, we work with 16 kHz. Although musical tones can be heard at frequencies up to 20 kHz, the human voice maxes out at around 8 kHz. Traditional telephone systems use an 8 kHz sampling frequency. + +> For an accurate representation of the signal, the sampling rate must be at least twice the highest frequency present in the signal. + +- **Time Domain vs. Frequency Domain**: Audio signals can be analyzed in the time and frequency domains. In the time domain, a signal is represented as a waveform where the amplitude is plotted against time. This representation helps to observe temporal features like onset and duration, but the signal's tonal characteristics are not well evidenced. Conversely, a frequency domain representation provides a view of the signal's constituent frequencies and their respective amplitudes, typically obtained via a Fourier Transform. This is invaluable for tasks that require understanding the signal's spectral content, such as identifying musical notes or speech phonemes (our case). + +The image below shows the words `YES` and `NO` with typical representations in the Time (Raw Audio) and Frequency domains: + +![](images/imgs_kws_feature_eng/time_vs_freq.jpg){fig-align="center" width="6.5in"}
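To make the time-to-frequency step tangible, the short sketch below (an illustration assuming NumPy and a synthetic 16 kHz signal, not part of the tutorial's pipeline) builds a 1-second tone and recovers its dominant frequency with an FFT:

``` python
# Illustrative only: move a synthetic 16 kHz signal from the time domain
# to the frequency domain with an FFT (assumes NumPy).
import numpy as np

sr = 16000                                    # sampling rate in Hz
t = np.arange(0, 1.0, 1 / sr)                 # 1 second of time stamps
signal = 0.8 * np.sin(2 * np.pi * 440 * t)    # a 440 Hz tone standing in for "audio"

spectrum = np.abs(np.fft.rfft(signal))        # magnitude spectrum (frequency domain)
freqs = np.fft.rfftfreq(len(signal), 1 / sr)  # frequency axis, 0 Hz up to 8 kHz (Nyquist)

print(freqs[np.argmax(spectrum)])             # ~440.0, the dominant frequency
```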
### Why Not Raw Audio? + +While using raw audio data directly for machine learning tasks may seem tempting, this approach presents several challenges that make it less suitable for building robust and efficient models. + +Using raw audio data for Keyword Spotting (KWS) on TinyML devices, for example, poses challenges due to its high dimensionality (at a 16 kHz sampling rate), the computational complexity of capturing temporal features, susceptibility to noise, and lack of semantically meaningful features, making feature extraction techniques like MFCCs a more practical choice for resource-constrained applications. + +Here are some additional details on the critical issues associated with using raw audio: + +- **High Dimensionality**: Audio signals, especially those sampled at high rates, result in large amounts of data. For example, a 1-second audio clip sampled at 16 kHz will have 16,000 individual data points. High-dimensional data increases computational complexity, leading to longer training times and higher computational costs, making it impractical for resource-constrained environments. Furthermore, the wide dynamic range of audio signals requires a significant number of bits per sample while conveying little useful information. + +- **Temporal Dependencies**: Raw audio signals have temporal structures that simple machine learning models may find hard to capture. While recurrent neural networks like [LSTMs](https://annals-csis.org/Volume_18/drp/pdf/185.pdf) can model such dependencies, they are computationally intensive and tricky to train on tiny devices. + +- **Noise and Variability**: Raw audio signals often contain background noise and other non-essential elements affecting model performance. Additionally, the same sound can have different characteristics based on various factors such as distance from the microphone, the orientation of the sound source, and the acoustic properties of the environment, adding to the complexity of the data. + +- **Lack of Semantic Meaning**: Raw audio doesn't inherently contain semantically meaningful features for classification tasks. Features like pitch, tempo, and spectral characteristics, which can be crucial for speech recognition, are not directly accessible from raw waveform data. + +- **Signal Redundancy**: Audio signals often contain redundant information, with certain portions of the signal contributing little to no value to the task at hand. This redundancy can make learning inefficient and potentially lead to overfitting. + +For these reasons, feature extraction techniques such as Mel-frequency Cepstral Coefficients (MFCCs), Mel-Frequency Energies (MFEs), and simple Spectrograms are commonly used to transform raw audio data into a more manageable and informative format. These features capture the essential characteristics of the audio signal while reducing dimensionality and noise, facilitating more effective machine learning. + +## Introduction to MFCCs + +### What are MFCCs? + +[Mel-frequency Cepstral Coefficients (MFCCs)](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) are a set of features derived from the spectral content of an audio signal. They are based on human auditory perception and are commonly used to capture the phonetic characteristics of an audio signal. The MFCCs are computed through a multi-step process that includes pre-emphasis, framing, windowing, applying the Fast Fourier Transform (FFT) to convert the signal to the frequency domain, and finally, applying the Discrete Cosine Transform (DCT). The result is a compact representation of the original audio signal's spectral characteristics. + +The image below shows the words `YES` and `NO` in their MFCC representation: + +![](images/imgs_kws_feature_eng/yes_no_mfcc.jpg){fig-align="center" width="6.5in"} + +> This [video](https://youtu.be/SJo7vPgRlBQ?si=KSgzmDg8DtSVqzXp) explains the Mel-frequency Cepstral Coefficients (MFCC) and how to compute them. + +### Why are MFCCs important? + +MFCCs are crucial for several reasons, particularly in the context of Keyword Spotting (KWS) and TinyML: + +- **Dimensionality Reduction**: MFCCs capture essential spectral characteristics of the audio signal while significantly reducing the dimensionality of the data, making them ideal for resource-constrained TinyML applications. +- **Robustness**: MFCCs are less susceptible to noise and variations in pitch and amplitude, providing a more stable and robust feature set for audio classification tasks. +- **Human Auditory System Modeling**: The Mel scale in MFCCs approximates the human ear's response to different frequencies, making them practical for speech recognition where human-like perception is desired. +- **Computational Efficiency**: The process of calculating MFCCs is computationally efficient, making it well-suited for real-time applications on hardware with limited computational resources. + +In summary, MFCCs offer a balance of information richness and computational efficiency, making them popular for audio classification tasks, particularly in constrained environments like TinyML.
+ +### Computing MFCCs + +The computation of Mel-frequency Cepstral Coefficients (MFCCs) involves several key steps. Let's walk through these, which are particularly important for Keyword Spotting (KWS) tasks on TinyML devices. + +- **Pre-emphasis**: The first step is pre-emphasis, which is applied to accentuate the high-frequency components of the audio signal and balance the frequency spectrum. This is achieved by applying a filter that amplifies the difference between consecutive samples. The formula for pre-emphasis is $y(t) = x(t) - \alpha x(t-1)$, where $\alpha$ is the pre-emphasis factor, typically around 0.97. + +- **Framing**: Audio signals are divided into short frames (the *frame length*), usually 20 to 40 milliseconds. This is based on the assumption that frequencies in a signal are stationary over a short period. Framing helps in analyzing the signal in such small time slots. The *frame stride* (or step) is the displacement between one frame and the next; frames can be sequential or overlapping. + +- **Windowing**: Each frame is then windowed to minimize the discontinuities at the frame boundaries. A commonly used window function is the Hamming window. Windowing prepares the signal for a Fourier transform by minimizing the edge effects. The image below shows three frames (10, 20, and 30) and the time samples after windowing (note that the frame length and frame stride are 20 ms): + +![](images/imgs_kws_feature_eng/frame_wind.jpg){fig-align="center" width="6.5in"} + +- **Fast Fourier Transform (FFT)**: The Fast Fourier Transform (FFT) is applied to each windowed frame to convert it from the time domain to the frequency domain. The FFT gives us a complex-valued representation that includes both magnitude and phase information. However, for MFCCs, only the magnitude is used to calculate the Power Spectrum. The power spectrum is the square of the magnitude spectrum and measures the energy present at each frequency component. + +> The power spectrum $P(f)$ of a signal $x(t)$ is defined as $P(f) = |X(f)|^2$, where $X(f)$ is the Fourier Transform of $x(t)$. By squaring the magnitude of the Fourier Transform, we emphasize *stronger* frequencies over *weaker* ones, thereby capturing more relevant spectral characteristics of the audio signal. This is important in applications like audio classification, speech recognition, and Keyword Spotting (KWS), where the focus is on identifying distinct frequency patterns that characterize different classes of audio or phonemes in speech. + +![](images/imgs_kws_feature_eng/frame_to_fft.jpg){fig-align="center" width="6.5in"} + +- **Mel Filter Banks**: The frequency domain is then mapped to the [Mel scale](https://en.wikipedia.org/wiki/Mel_scale), which approximates the human ear's response to different frequencies. The idea is to extract more features (more filter banks) in the lower frequencies and fewer in the higher frequencies, so the representation performs well on sounds that the human ear can distinguish. Typically, 20 to 40 triangular filters are used to extract the Mel-frequency energies. These energies are then log-transformed to convert multiplicative factors into additive ones, making them more suitable for further processing. + +![](images/imgs_kws_feature_eng/melbank-1_00.hires.jpg){fig-align="center" width="6.5in"} + +- **Discrete Cosine Transform (DCT)**: The last step is to apply the [Discrete Cosine Transform (DCT)](https://en.wikipedia.org/wiki/Discrete_cosine_transform) to the log Mel energies. The DCT helps to decorrelate the energies, effectively compressing the data and retaining only the most discriminative features. Usually, the first 12-13 DCT coefficients are retained, forming the final MFCC feature vector. + +![](images/imgs_kws_feature_eng/mfcc_final.jpg){fig-align="center" width="6.5in"}
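The entire pipeline above can be condensed into a few lines with an audio library. Here is a minimal sketch, assuming the `librosa` package and a hypothetical 1-second, 16 kHz WAV file named `yes.wav`; the parameter values are illustrative choices, not the exact settings used in the Studio.

``` python
# Illustrative MFCC extraction with librosa (file name and parameters are assumptions)
import librosa

y, sr = librosa.load("yes.wav", sr=16000)  # load and resample to 16 kHz

mfccs = librosa.feature.mfcc(
    y=y, sr=sr,
    n_mfcc=13,       # keep the first 13 DCT coefficients
    n_fft=512,       # FFT window size (32 ms at 16 kHz)
    hop_length=320,  # frame stride of 20 ms
    n_mels=32,       # number of Mel filter banks
)
print(mfccs.shape)   # (13, number_of_frames)
```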
The DCT helps to decorrelate the energies, effectively compressing the data and retaining only the most discriminative features. Usually, the first 12-13 DCT coefficients are retained, forming the final MFCC feature vector. + +![](images/imgs_kws_feature_eng/mfcc_final.jpg){fig-align="center" width="6.5in"} + +## Hands-On using Python + +Let's apply what we discussed while working on an actual audio sample. Open the notebook on Google CoLab and extract the MLCC features on your audio samples: [\[Open In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/Audio_Data_Analysis.ipynb) + +## Conclusion + +### **What** Feature Extraction technique **should we use?** + +Mel-frequency Cepstral Coefficients (MFCCs), Mel-Frequency Energies (MFEs), or Spectrogram are techniques for representing audio data, which are often helpful in different contexts. + +In general, MFCCs are more focused on capturing the envelope of the power spectrum, which makes them less sensitive to fine-grained spectral details but more robust to noise. This is often desirable for speech-related tasks. On the other hand, spectrograms or MFEs preserve more detailed frequency information, which can be advantageous in tasks that require discrimination based on fine-grained spectral content. + +#### MFCCs are particularly strong for: + +1. **Speech Recognition**: MFCCs are excellent for identifying phonetic content in speech signals. +2. **Speaker Identification**: They can be used to distinguish between different speakers based on voice characteristics. +3. **Emotion Recognition**: MFCCs can capture the nuanced variations in speech indicative of emotional states. +4. **Keyword Spotting**: Especially in TinyML, where low computational complexity and small feature size are crucial. + +#### Spectrograms or MFEs are often more suitable for: + +1. **Music Analysis**: Spectrograms can capture harmonic and timbral structures in music, which is essential for tasks like genre classification, instrument recognition, or music transcription. +2. **Environmental Sound Classification**: In recognizing non-speech, environmental sounds (e.g., rain, wind, traffic), the full spectrogram can provide more discriminative features. +3. **Birdsong Identification**: The intricate details of bird calls are often better captured using spectrograms. +4. **Bioacoustic Signal Processing**: In applications like dolphin or bat call analysis, the fine-grained frequency information in a spectrogram can be essential. +5. **Audio Quality Assurance**: Spectrograms are often used in professional audio analysis to identify unwanted noises, clicks, or other artifacts. diff --git a/kws_nicla.qmd b/kws_nicla.qmd new file mode 100644 index 00000000..7229dbd0 --- /dev/null +++ b/kws_nicla.qmd @@ -0,0 +1,561 @@ +# Keyword Spotting (KWS) {.unnumbered} + +## Introduction + +<<<<<<< Updated upstream +Having already explored the Nicla Vision board in Computer Vision applications, such as *Image Classification* and *Object Detection*, we are now shifting our focus to voice-activated applications with a project on Keyword Spotting (KWS). + +As introduced in the *Feature Engineering for Audio Classification,* Hands-On tutorial, Keyword Spotting (KWS) is integral to many voice recognition systems, enabling devices to respond to specific words or phrases. While this technology underpins popular devices like Google Assistant or Amazon Alexa, it's equally applicable and achievable on smaller, low-power devices. 
This tutorial will guide you through implementing a KWS system using TinyML on the Nicla Vision development board equipped with a digital microphone. +======= +Having already explored the Nicla Vision board in the *Image Classification* and *Object Detection* applications, we are now shifting our focus to voice-activated applications with a project on Keyword Spotting (KWS). + +As introduced in the *Feature Engineering for Audio Classification* Hands-On tutorial, Keyword Spotting (KWS) is integrated into many voice recognition systems, enabling devices to respond to specific words or phrases. While this technology underpins popular devices like Google Assistant or Amazon Alexa, it's equally applicable and feasible on smaller, low-power devices. This tutorial will guide you through implementing a KWS system using TinyML on the Nicla Vision development board equipped with a digital microphone. +>>>>>>> Stashed changes + +Our model will be designed to recognize keywords that can trigger device wake-up or specific actions, bringing them to life with voice-activated commands. + +## How does a voice assistant work? + +As said, *voice assistants* on the market, like Google Home or Amazon Echo-Dot, only react to humans when they are "waked up" by particular keywords such as " Hey Google" on the first one and "Alexa" on the second. + +![](images/imgs_kws_nicla/hey_google.png){fig-align="center" width="6.5in"} + +In other words, recognizing voice commands is based on a multi-stage model or Cascade Detection. + +![](images/imgs_kws_nicla/pa_block.jpg){fig-align="center" width="6.5in"} + +<<<<<<< Updated upstream +**Stage 1:** A smaller microprocessor inside the Echo Dot or Google Home **continuously** listens to the sound, waiting for the keyword to be spotted. For such detection, a TinyML model at the edge is used (KWS application). +======= +**Stage 1:** A small microprocessor inside the Echo Dot or Google Home continuously listens, waiting for the keyword to be spotted, using a TinyML model at the edge (KWS application). +>>>>>>> Stashed changes + +**Stage 2:** Only when triggered by the KWS application on Stage 1 is the data sent to the cloud and processed on a larger model. + +The video below shows an example of a Google Assistant being programmed on a Raspberry Pi (Stage 2), with an Arduino Nano 33 BLE as the tinyML device (Stage 1). + +{{< video https://youtu.be/e_OPgcnsyvM width="480" height="270" center >}} + +> To explore the above Google Assistant project, please see the tutorial: [Building an Intelligent Voice Assistant From Scratch](https://www.hackster.io/mjrobot/building-an-intelligent-voice-assistant-from-scratch-2199c3). + +<<<<<<< Updated upstream +In this KWS project, we will focus on Stage 1 ( KWS or Keyword Spotting), where we will use the XIAO ESP2S3 Sense, which has a digital microphone that will be used to spot the keyword. +======= +In this KWS project, we will focus on Stage 1 (KWS or Keyword Spotting), where we will use the XIAO ESP2S3 Sense, which has a digital microphone that will be used to spot the keyword. 
+>>>>>>> Stashed changes + +## The KWS Hands-On Project + +The diagram below gives an idea of how the final KWS application should work (during inference): + +![](images/imgs_kws_nicla/KWS_PROJ_INF_BLK.jpg){fig-align="center" width="6.5in"} + +Our KWS application will recognize four classes of sound: + +- **YES** (Keyword 1) +- **NO** (Keyword 2) +<<<<<<< Updated upstream +- **NOISE** (no keywords spoken, only background noise is present) +- **UNKNOW** (a mix of different words than YES and NO) + +> Optionally for real-world projects, it is always advised to include different words than keywords, such as "Noise" (or Background) and "Unknow." +======= +- **NOISE** (no words spoken; only background noise is present) +- **UNKNOW** (a mix of different words than YES and NO) + +> For real-world projects, it is always advisable to include other sounds besides the keywords, such as "Noise" (or Background) and "Unknown." +>>>>>>> Stashed changes + +### The Machine Learning workflow + +The main component of the KWS application is its model. So, we must train such a model with our specific keywords, noise, and other words (the "unknown"): + +![](images/imgs_kws_nicla/KWS_PROJ_TRAIN_BLK.jpg){fig-align="center" width="6.5in"} + +## Dataset + +<<<<<<< Updated upstream +The critical component of any Machine Learning Workflow is the **dataset**. Once we have decided on specific keywords (*YES* and NO), we can take advantage of the dataset developed by Pete Warden, ["Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/pdf/1804.03209.pdf)." This dataset has 35 keywords (with +1,000 samples each), such as yes, no, stop, and go. In some words, we can get 1,500 samples, such as *yes* and *no*. +======= +The critical component of any Machine Learning Workflow is the **dataset**. Once we have decided on specific keywords, in our case (*YES* and NO), we can take advantage of the dataset developed by Pete Warden, ["Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition](https://arxiv.org/pdf/1804.03209.pdf)." This dataset has 35 keywords (with +1,000 samples each), such as yes, no, stop, and go. In words such as *yes* and *no,* we can get 1,500 samples. +>>>>>>> Stashed changes + +You can download a small portion of the dataset from Edge Studio ([Keyword spotting pre-built dataset](https://docs.edgeimpulse.com/docs/pre-built-datasets/keyword-spotting)), which includes samples from the four classes we will use in this project: yes, no, noise, and background. For this, follow the steps below: + +- Download the [keywords dataset.](https://cdn.edgeimpulse.com/datasets/keywords2.zip) +<<<<<<< Updated upstream +- Unzip the file in a location of your choice. + +### Uploading the dataset to the Edge Impulse Studio + +Initiate a new project at Edge Impulse Studio. + +> Here, you can clone the project developed for this hands-on: [Nicla-Vision-KWS](https://studio.edgeimpulse.com/public/292418/latest). + +Select the `Upload Existing Data` tool in the `Data Acquisition` section. Choose the files to be uploaded: + +![](images/imgs_kws_nicla/files.jpg){fig-align="center" width="6.5in"} + +Define the Label, select `Automatically split between train and test,` and `Upload data` to the Studio. Repete to all classes. + +![](images/imgs_kws_nicla/upload.jpg){fig-align="center" width="6.5in"} + +The dataset will now appear in the `Data acquisition` section. Note that the approximately 6,000 samples (1,500 for each class) are split between Train (4,800) and Test (1,200). 
+======= +- Unzip the file to a location of your choice. + +### Uploading the dataset to the Edge Impulse Studio + +Initiate a new project at Edge Impulse Studio (EIS) and select the `Upload Existing Data` tool in the `Data Acquisition` section. Choose the files to be uploaded: + +![](images/imgs_kws_nicla/files.jpg){fig-align="center" width="6.5in"} + +Define the Label, select `Automatically split between train and test,` and `Upload data` to the EIS. Repeat for all classes. + +![](images/imgs_kws_nicla/upload.jpg){fig-align="center" width="6.5in"} + +The dataset will now appear in the `Data acquisition` section. Note that the approximately 6,000 samples (1,500 for each class) are split into Train (4,800) and Test (1,200) sets. +>>>>>>> Stashed changes + +![](images/imgs_kws_nicla/dataset.jpg){fig-align="center" width="6.5in"} + +### Capturing additional Audio Data + +<<<<<<< Updated upstream +Although we have a lot of data from Pete's dataset, collecting some words spoken by us is advised. When working with accelerometers, creating a dataset with data captured by the same type of sensor is essential. Still, In the case of *sound*, it is different because what we will classify is, in reality, *audio* data. + +> The key difference between sound and audio is their form of energy. Sound is mechanical wave energy (longitudinal sound waves) that propagate through a medium causing variations in pressure within the medium. Audio is made of electrical energy (analog or digital signals) that represent sound electrically. + +The sound waves should be converted to audio data when we speak a keyword. The conversion should be done by sampling the signal generated by the microphone in 16KHz with a 16-bit depth. + +So, any device that can generate audio data with this basic specification (16Khz/16bits) will work fine. As a *device*, we can use the proper NiclaV, a computer, or even your mobile phone. +======= +Although we have a lot of data from Pete's dataset, collecting some words spoken by us is advised. When working with accelerometers, creating a dataset with data captured by the same type of sensor is essential. In the case of *sound*, this is optional because what we will classify is, in reality, *audio* data. + +> The key difference between sound and audio is the type of energy. Sound is mechanical perturbation (longitudinal sound waves) that propagate through a medium, causing variations of pressure in it. Audio is an electrical (analog or digital) signal representing sound. + +When we pronounce a keyword, the sound waves should be converted to audio data. The conversion should be done by sampling the signal generated by the microphone at a 16KHz frequency with 16-bit per sample amplitude. + +So, any device that can generate audio data with this basic specification (16KHz/16bits) will work fine. As a *device*, we can use the NiclaV, a computer, or even your mobile phone. +>>>>>>> Stashed changes + +![](images/imgs_kws_nicla/audio_capt.jpg){fig-align="center" width="6.5in"} + +#### Using the NiclaV and the Edge Impulse Studio + +<<<<<<< Updated upstream +As we learned in the chapter *Setup Nicla Vision*, Edge Impulse officially supports the Nicla Vision, which simplifies the capture of the data from its sensors, such as the microphone. So, please create a new project on the Studio and connect the Nicla to it. For that, follow the steps: + +- Download the most updated [EI Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it. 
+ +- Open the zip file on your computer and select the uploader corresponding to your OS: + +![](images_2/media/image17.png){fig-align="center" width="4.416666666666667in"} + +- Put the Nicla-Vision on Boot Mode, pressing the reset button twice. + +![](https://84771188-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGEgcCk4PkS5Pa6uBabld%2Fuploads%2Fgit-blob-111b26f413cd411b29594c377868bba901863233%2Fnicla_bootloader.gif?alt=media) + +- Execute the specific batch code for your OS for uploading the binary *arduino-nicla-vision.bin* to your board. + +Go to your project on the Studio, and on the `Data Acquisition tab`, select `WebUSB`. A window will pop up; choose the option that shows that the `Nicla is paired` and press `[Connect]`. + +You can choose which sensor data to pick in the `Collect Data` section on the `Data Acquisition` tab. Select: `Built-in microphone`, define your `label` (for example, *yes*), the sampling `Frequency`\[16000Hz\], and the `Sample length (ms.)`, for example \[10s\]. `Start sampling`. + +![](images/imgs_kws_nicla/ei_data_collection.jpg){fig-align="center" width="6.5in"} + +All data on Pete's dataset have a 1s length, but the samples recorded have 10s and must be split into 1s samples to be compatible. Click on `three dots` after the sample name and select `Split sample`. +======= +As we learned in the chapter *Setup Nicla Vision*, EIS officially supports the Nicla Vision, which simplifies the capture of the data from its sensors, including the microphone. So, please create a new project on EIS and connect the Nicla to it, following these steps: + +- Download the last updated [EIS Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it. + +- Open the zip file on your computer and select the uploader corresponding to your OS: + +![](images/imgs_niclav_sys/image17.png){fig-align="center" width="4.416666666666667in"} + +- Put the NiclaV in Boot Mode by pressing the reset button twice. + +![](https://84771188-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FGEgcCk4PkS5Pa6uBabld%2Fuploads%2Fgit-blob-111b26f413cd411b29594c377868bba901863233%2Fnicla_bootloader.gif?alt=media){fig-align="center" width="6.5in"} + +- Upload the binary *arduino-nicla-vision.bin* to your board by running the batch code corresponding to your OS. + +Go to your project on EIS, and on the `Data Acquisition tab`, select `WebUSB`. A window will pop up; choose the option that shows that the `Nicla is paired` and press `[Connect]`. + +You can choose which sensor data to pick in the `Collect Data` section on the `Data Acquisition` tab. Select: `Built-in microphone`, define your `label` (for example, *yes*), the sampling `Frequency`\[16000Hz\], and the `Sample length (in milliseconds)`, for example \[10s\]. `Start sampling`. + +![](images/imgs_kws_nicla/ei_data_collection.jpg){fig-align="center" width="6.5in"} + +Data on Pete's dataset have a length of 1s, but the recorded samples are 10s long and must be split into 1s samples. Click on `three dots` after the sample name and select `Split sample`. +>>>>>>> Stashed changes + +A window will pop up with the Split tool. + +![](images/imgs_kws_nicla/split.jpg){fig-align="center" width="6.5in"} + +Once inside the tool, split the data into 1-second (1000 ms) records. If necessary, add or remove segments. This procedure should be repeated for all new samples. 
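+
+If you prefer to prepare the clips offline instead of using the Split tool, a short Python sketch can do a similar job. This is only an illustration: it assumes the `soundfile` package and a hypothetical 10-second, 16 kHz recording named `yes_10s.wav`.
+
+``` python
+import soundfile as sf
+
+# Read the (hypothetical) long recording: data is a NumPy array, sr should be 16000
+data, sr = sf.read("yes_10s.wav")
+clip_len = sr  # 1-second clips = 16,000 samples at 16 kHz
+
+# Write each whole 1-second slice as its own 16-bit WAV file
+for i in range(len(data) // clip_len):
+    clip = data[i * clip_len:(i + 1) * clip_len]
+    sf.write(f"yes.split_{i}.wav", clip, sr, subtype="PCM_16")
+```
+
+The resulting 1-second files can then be uploaded to the Studio in the same way as Pete's dataset samples.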
+ +<<<<<<< Updated upstream +#### Using a smartphone and the Studio + +Alternatively, you can use your PC or smartphone to capture audio data with a sampling frequency of 16KHz and a bit depth of 16 Bits. + +Go to `Devices`, and using your phone scan the `QR Code`, and click on the link. A data Collection app will appear in your browser. Select `Collecting Audio`, and define your `Label`, data capture `Length,` and `Category`. +======= +#### Using a smartphone and the EI Studio + +You can also use your PC or smartphone to capture audio data, using a sampling frequency of 16KHz and a bit depth of 16. + +Go to `Devices`, scan the `QR Code` using your phone, and click on the link. A data Collection app will appear in your browser. Select `Collecting Audio`, and define your `Label`, data capture `Length,` and `Category`. +>>>>>>> Stashed changes + +![](images/imgs_kws_nicla/phone.jpg){fig-align="center" width="6.5in"} + +Repeat the same procedure used with the NiclaV. + +<<<<<<< Updated upstream +> Note that any app, such as [Audacity](https://www.audacityteam.org/), can be used for audio recording since you have 16KHz/16-bit depth samples. +======= +> Note that any app, such as [Audacity](https://www.audacityteam.org/), can be used for audio recording, provided you use 16KHz/16-bit depth samples. +>>>>>>> Stashed changes + +## Creating Impulse (Pre-Process / Model definition) + +*An* **impulse** *takes raw data, uses signal processing to extract features, and then uses a learning block to classify new data.* + +### Impulse Design + +![](images/imgs_kws_nicla/impulse.jpg){fig-align="center" width="6.5in"} + +<<<<<<< Updated upstream +First, we will take the data points with a 1-second window, augmenting the data, sliding that window each 500ms. Note that the option zero-pad data is set. It is essential to fill with zeros samples smaller than 1 second (in some cases, some samples can result smaller than the 1000 ms window on the split tool to avoid noises and spikes). + +Each 1-second audio sample should be pre-processed and converted to an image (for example, 13 x 49 x 1). As discussed in the *Feature Engineering for Audio Classification,* Hands-On tutorial, We will use `Audio (MFCC)`, which extracts features from audio signals using [Mel Frequency Cepstral Coefficients](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), which are applicable to the human voice that is our case here. +======= +First, we will take the data points with a 1-second window, augmenting the data and sliding that window in 500ms intervals. Note that the option zero-pad data is set. It is essential to fill with 'zeros' samples smaller than 1 second (in some cases, some samples can result smaller than the 1000 ms window on the split tool to avoid noise and spikes). + +Each 1-second audio sample should be pre-processed and converted to an image (for example, 13 x 49 x 1). As discussed in the *Feature Engineering for Audio Classification* Hands-On tutorial, we will use `Audio (MFCC)`, which extracts features from audio signals using [Mel Frequency Cepstral Coefficients](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum), which are well suited for the human voice, our case here. +>>>>>>> Stashed changes + +Next, we select the `Classification` block to build our model from scratch using a Convolution Neural Network (CNN). + +> Alternatively, you can use the `Transfer Learning (Keyword Spotting)` block, which fine-tunes a pre-trained keyword spotting model on your data. 
This approach has good performance with relatively small keyword datasets. + +### Pre-Processing (MFCC) + +<<<<<<< Updated upstream +The next step is to create the features to be trained in the next phase: + +We can keep the default parameter values or take advantage of the DSP `Autotune parameters` option, which we will do. + +![](images/imgs_kws_nicla/ei_MFCC.jpg){fig-align="center" width="6.5in"} + +We will take the `Raw features` (our 1-second, 16Khz sampled audio data) and use the MFCC processing block to calculate the `Processed features`. For every 16,000 raw features (16,000 x 1 second), we will get 637 processed features (13 x 49). + +![](images/imgs_kws_nicla/MFCC.jpg){fig-align="center" width="6.5in"} + +The result shows that we will only spend a little memory to pre-process data (16KB), with a latency of 34ms, which is excellent. For example, on an Arduino Nano (Cortex-M4f \@ 64MHz), this same pre-process will take around 480ms. The chosen parameters, such as the `FFT length` \[512\], will significantly impact the latency. +======= +The following step is to create the features to be trained in the next phase: + +We could keep the default parameter values, but we will use the DSP `Autotune parameters` option. + +![](images/imgs_kws_nicla/ei_MFCC.jpg){fig-align="center" width="6.5in"} + +We will take the `Raw features` (our 1-second, 16KHz sampled audio data) and use the MFCC processing block to calculate the `Processed features`. For every 16,000 raw features (16,000 x 1 second), we will get 637 processed features (13 x 49). + +![](images/imgs_kws_nicla/MFCC.jpg){fig-align="center" width="6.5in"} + +The result shows that we only used a small amount of memory to pre-process data (16KB) and a latency of 34ms, which is excellent. For example, on an Arduino Nano (Cortex-M4f \@ 64MHz), the same pre-process will take around 480ms. The parameters chosen, such as the `FFT length` \[512\], will significantly impact the latency. +>>>>>>> Stashed changes + +Now, let's `Save parameters` and move to the `Generated features` tab, where the actual features will be generated. Using [UMAP](https://umap-learn.readthedocs.io/en/latest/), a dimension reduction technique, the `Feature explorer` shows how the features are distributed on a two-dimensional plot. + +![](images/imgs_kws_nicla/feat_expl.jpg){fig-align="center" width="5.9in"} + +<<<<<<< Updated upstream +The result seems OK, with a visual and clear separation from *yes* features (in red) and *no* features (in blue). The *unknown* features seem nearer to the *no space* than the *yes*. This suggests that the keyword *no* has more propensity to false positives. + +### Going under the hood + +To better understand how the raw sound is preprocessed, take a look at the *Feature Engineering for Audio Classification* chapter. You can play with the MFCC features generation in this [notebook](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_MFCC_Analysis.ipynb) or [\[Open In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_MFCC_Analysis.ipynb) + +## Model Design and Training + +We will use a simple Convolution Neural Network (CNN) model, tested with 1D and 2D convolutions. The basic architecture has two blocks of Convolution + MaxPooling (8 and 16 filters, respectively) and a Dropout of 0.25 for the 1D and 0.5 for the 2D. 
For the last layer, after Flattening 4 neurons, one for each class: + +![For better visualization, open the image on another tab (use the right button)](images/imgs_kws_nicla/models_1d-2d.jpg){fig-align="center" width="6.5in"} +======= +The result seems OK, with a visually clear separation between *yes* features (in red) and *no* features (in blue). The *unknown* features seem nearer to the *no space* than the *yes*. This suggests that the keyword *no* has more propensity to false positives. + +### Going under the hood + +To understand better how the raw sound is preprocessed, look at the *Feature Engineering for Audio Classification* chapter. You can play with the MFCC features generation by downloading this [notebook](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_MFCC_Analysis.ipynb) from GitHub or [\[Opening it In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_MFCC_Analysis.ipynb) + +## Model Design and Training + +We will use a simple Convolution Neural Network (CNN) model, tested with 1D and 2D convolutions. The basic architecture has two blocks of Convolution + MaxPooling (\[8\] and \[16\] filters, respectively) and a Dropout of \[0.25\] for the 1D and \[0.5\] for the 2D. For the last layer, after Flattening, we have \[4\] neurons, one for each class: + +![](images/imgs_kws_nicla/models_1d-2d.jpg){fig-align="center" width="6.5in"} +>>>>>>> Stashed changes + +As hyper-parameters, we will have a `Learning Rate` of \[0.005\] and a model trained by \[100\] epochs. We will also include a data augmentation method based on [SpecAugment](https://arxiv.org/abs/1904.08779). We trained the 1D and the 2D models with the same hyperparameters. The 1D architecture had a better overall result (90.5% accuracy when compared with 88% of the 2D, so we will use the 1D. + +![](images/imgs_kws_nicla/train_result.jpg){fig-align="center" width="6.5in"} + +> Using 1D convolutions is more efficient because it requires fewer parameters than 2D convolutions, making them more suitable for resource-constrained environments. + +<<<<<<< Updated upstream +It is also interesting to pay attention to the 1D Confusion Matrix. The F1 Score for `yes` is 95%, and for `no`, 91%. That was expected by what we saw with the Feature Explorer (`no` and `unknown` at near spaces). In trying to improve the result, you can closely examine the results by the samples with an error. + +![](images/imgs_kws_nicla/train_errors.jpg){fig-align="center" width="6.5in"} + +Try to listen to the samples that went wrong. For example, for `yes`, most of the wrongs were related to a yes pronunciation as "yeh". You can get more samples like that and retrain your model. + +### Going under the hood + +If you want to understand what is happening "under the hood," you can download the pre-processed dataset (`MFCC training data`) from the `Dashboard` tab and run this [Jupyter Notebook](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_CNN_training.ipynb), playing with the code [\[Open In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_CNN_training.ipynb). For example, you can analyze the accuracy by each epoch: +======= +It is also interesting to pay attention to the 1D Confusion Matrix. The F1 Score for `yes` is 95%, and for `no`, 91%. That was expected by what we saw with the Feature Explorer (`no` and `unknown` at close distance). 
In trying to improve the result, you can inspect closely the results of the samples with an error. + +![](images/imgs_kws_nicla/train_errors.jpg){fig-align="center" width="6.5in"} + +Listen to the samples that went wrong. For example, for `yes`, most of the mistakes were related to a yes pronounced as "yeh". You can acquire additional samples and then retrain your model. + +### Going under the hood + +If you want to understand what is happening "under the hood," you can download the pre-processed dataset (`MFCC training data`) from the `Dashboard` tab and run this [Jupyter Notebook](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_CNN_training.ipynb), playing with the code or [\[Opening it In Colab\]](https://colab.research.google.com/github/Mjrovai/Arduino_Nicla_Vision/blob/main/KWS/KWS_CNN_training.ipynb). For example, you can analyze the accuracy by each epoch: +>>>>>>> Stashed changes + +![](images/imgs_kws_nicla/train_graphs.jpg){fig-align="center" width="6.5in"} + +## Testing + +<<<<<<< Updated upstream +Testing the model with the data put apart before training (Test Data), we got an accuracy of approximately 76%. + +![](images/imgs_kws_nicla/test.jpg){fig-align="center" width="6.5in"} + +Inspecting the F1 score, we can see that for YES. We got 0.90, an excellent result once we expect to use this keyword to primarily "trigger" our KWS project. The worst result (0.70) is for UNKNOWN, what is OK. + +For NO, we got 0.72, which was expected, but one action that can be done here is to take those samples that were not correctly classified, move them to the training dataset, and repeat the training process. + +### Live Classification + +We can proceed with the project but consider that it is possible to perform `Live Classification` using the NiclaV or a smartphone to capture live samples, testing the trained model before deployment on our device. + +## Deploy and Inference + +The Studio will package all the needed libraries, preprocessing functions, and trained models, downloading them to your computer. Go to the `Deployment` section, select the option `Arduino Library`, and at the bottom, choose `Quantized (Int8)` and press the button `Build`. + +![](images/imgs_kws_nicla/deploy.jpg){fig-align="center" width="5.29in"} + +When the `Build` button is selected, a Zip file will be created and downloaded to your computer. On your Arduino IDE, go to the `Sketch` tab, select the option `Add .ZIP Library`, and Choose the.zip file downloaded by the Studio: + +![](images/imgs_kws_nicla/install_zip.jpg){fig-align="center" width="6.5in"} + +Now, it is time for a real test. We will make inferences wholly disconnected from the Studio. Let's use the Nicla Vision code example created when you deploy the Arduino Library. +======= +Testing the model with the data reserved for training (Test Data), we got an accuracy of approximately 76%. + +![](images/imgs_kws_nicla/test.jpg){fig-align="center" width="6.5in"} + +Inspecting the F1 score, we can see that for YES, we got 0.90, an excellent result since we expect to use this keyword as the primary "trigger" of our KWS project. The worst result (0.70) is for UNKNOWN, which is OK. + +For NO, we got 0.72, which was expected, but to improve this result, we can move the samples that were not correctly classified to the training dataset and then repeat the training process. 
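+
+If you export the test-set predictions (for example, from the training notebook mentioned above), a quick per-class report can help decide which samples are worth moving. This is a minimal sketch assuming `scikit-learn` and hypothetical `y_true` / `y_pred` label lists; it is not part of the Studio workflow itself.
+
+``` python
+from sklearn.metrics import classification_report
+
+# Hypothetical ground-truth and predicted labels for a few test samples
+y_true = ["yes", "no", "noise", "unknown", "no", "yes"]
+y_pred = ["yes", "no", "noise", "no", "no", "yes"]
+
+# Per-class precision, recall, and F1, similar to the Studio's Confusion Matrix view
+print(classification_report(y_true, y_pred,
+                            labels=["yes", "no", "noise", "unknown"],
+                            zero_division=0))
+```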
+ +### Live Classification + +We can proceed to the project's next step but also consider that it is possible to perform `Live Classification` using the NiclaV or a smartphone to capture live samples, testing the trained model before deployment on our device. + +## Deploy and Inference + +The EIS will package all the needed libraries, preprocessing functions, and trained models, downloading them to your computer. Go to the `Deployment` section, select `Arduino Library`, and at the bottom, choose `Quantized (Int8)` and press `Build`. + +![](images/imgs_kws_nicla/deploy.jpg){fig-align="center" width="5.29in"} + +When the `Build` button is selected, a zip file will be created and downloaded to your computer. On your Arduino IDE, go to the `Sketch` tab, select the option `Add .ZIP Library`, and Choose the .zip file downloaded by EIS: + +![](images/imgs_kws_nicla/install_zip.jpg){fig-align="center" width="6.5in"} + +Now, it is time for a real test. We will make inferences while completely disconnected from the EIS. Let's use the NiclaV code example created when we deployed the Arduino Library. +>>>>>>> Stashed changes + +In your Arduino IDE, go to the `File/Examples` tab, look for your project, and select `nicla-vision/nicla-vision_microphone` (or `nicla-vision_microphone_continuous`) + +![](images/imgs_kws_nicla/code_ide.jpg){fig-align="center" width="6.5in"} + +Press the reset button twice to put the NiclaV in boot mode, upload the sketch to your board, and test some real inferences: + +![](images/imgs_kws_nicla/yes_no.jpg){fig-align="center" width="6.5in"} + +## Post-processing + +<<<<<<< Updated upstream +Now that we know that the model is working by detecting our keywords, let's modify the code so we can see the result with the NiclaV completely offline (disconnected from the PC and powered by a battery). + +The idea is that whenever the keyword YES is detected, the LED Green will be ON; if it is NO, LED Red will be ON, if it is a UNKNOW, LED Blue will be ON; and if it is noise (No Keyword), the LEDs will be OFF. +======= +Now that we know the model is working since it detects our keywords, let's modify the code to see the result with the NiclaV completely offline (disconnected from the PC and powered by a battery, a power bank, or an independent 5V power supply). + +The idea is that whenever the keyword YES is detected, the Green LED will light; if a NO is heard, the Red LED will light, if it is a UNKNOW, the Blue LED will light; and in the presence of noise (No Keyword), the LEDs will be OFF. +>>>>>>> Stashed changes + +We should modify one of the code examples. Let's do it now with the `nicla-vision_microphone_continuous`. + +Start with initializing the LEDs: + +``` cpp +... +void setup() +{ + // Once you finish debugging your code, you can comment or delete the Serial part of the code + Serial.begin(115200); + while (!Serial); + Serial.println("Inferencing - Nicla Vision KWS with LEDs"); + + // Pins for the built-in RGB LEDs on the Arduino NiclaV + pinMode(LEDR, OUTPUT); + pinMode(LEDG, OUTPUT); + pinMode(LEDB, OUTPUT); + + // Ensure the LEDs are OFF by default. + // Note: The RGB LEDs on the Arduino Nicla Vision + // are ON when the pin is LOW, OFF when HIGH. + digitalWrite(LEDR, HIGH); + digitalWrite(LEDG, HIGH); + digitalWrite(LEDB, HIGH); +... 
+} +``` + +<<<<<<< Updated upstream +Create two functions, `turn_off_leds()` function , to turn off all RGB LEDs +======= +Create two functions, `turn_off_leds()` function, to turn off all RGB LEDs +>>>>>>> Stashed changes + +``` cpp +** + * @brief turn_off_leds function - turn-off all RGB LEDs + */ +void turn_off_leds(){ + digitalWrite(LEDR, HIGH); + digitalWrite(LEDG, HIGH); + digitalWrite(LEDB, HIGH); +} +``` + +<<<<<<< Updated upstream +and another `turn_on_led()` function, which is used to turn on the RGB LEDs depending on the index of the most probable result of the classifier. +======= +Another `turn_on_led()` function is used to turn on the RGB LEDs according to the most probable result of the classifier. +>>>>>>> Stashed changes + +``` cpp +/** + * @brief turn_on_leds function used to turn on the RGB LEDs + * @param[in] pred_index + * no: [0] ==> Red ON + * noise: [1] ==> ALL OFF + * unknown: [2] ==> Blue ON + * Yes: [3] ==> Green ON + */ +void turn_on_leds(int pred_index) { + switch (pred_index) + { + case 0: + turn_off_leds(); + digitalWrite(LEDR, LOW); + break; + + case 1: + turn_off_leds(); + break; + + case 2: + turn_off_leds(); + digitalWrite(LEDB, LOW); + break; + + case 3: + turn_off_leds(); + digitalWrite(LEDG, LOW); + break; + } +} +``` + +And change the `// print the predictions` portion of the code on `loop()`: + +``` cpp +... + + if (++print_results >= (EI_CLASSIFIER_SLICES_PER_MODEL_WINDOW)) { + // print the predictions + ei_printf("Predictions "); + ei_printf("(DSP: %d ms., Classification: %d ms., Anomaly: %d ms.)", + result.timing.dsp, result.timing.classification, result.timing.anomaly); + ei_printf(": \n"); + + int pred_index = 0; // Initialize pred_index + float pred_value = 0; // Initialize pred_value + + for (size_t ix = 0; ix < EI_CLASSIFIER_LABEL_COUNT; ix++) { + if (result.classification[ix].value > pred_value){ + pred_index = ix; + pred_value = result.classification[ix].value; + } + // ei_printf(" %s: ", result.classification[ix].label); + // ei_printf_float(result.classification[ix].value); + // ei_printf("\n"); + } + ei_printf(" PREDICTION: ==> %s with probability %.2f\n", + result.classification[pred_index].label, pred_value); + turn_on_leds (pred_index); + + +#if EI_CLASSIFIER_HAS_ANOMALY == 1 + ei_printf(" anomaly score: "); + ei_printf_float(result.anomaly); + ei_printf("\n"); +#endif + + print_results = 0; + } +} + +... +``` + +You can find the complete code on the [project's GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/KWS/nicla_vision_microphone_continuous_LED). + +<<<<<<< Updated upstream +Upload the sketch to your board and test some real inferences. The idea is that the RGB Green LED will be ON whenever the keyword YES is detected, the same for NO (Red) and any other word that turns on the Blue LED. If silence or background noise is present, the LEDs should be off. In the same way, instead of turning on an LED, this could be a "trigger" for an external device, as we saw in the introduction. +======= +Upload the sketch to your board and test some real inferences. The idea is that the Green LED will be ON whenever the keyword YES is detected, the Red will lit for a NO, and any other word will turn on the Blue LED. All the LEDs should be off if silence or background noise is present. Remember that the same procedure can "trigger" an external device to perform a desired action instead of turning on an LED, as we saw in the introduction. 
+>>>>>>> Stashed changes + +{{< video https://youtu.be/25Rd76OTXLY width="480" height="270" center >}} + +## Conclusion + +<<<<<<< Updated upstream +> On the [GitHub repository](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/KWS), you will find the notebooks and codes used on this hands-on tutorial. + +Before we finish, consider that Sound Classification is more than just voice. For example, you can develop TinyML projects around sound in several areas, such as: + +- **Security** (Broken Glass detection) +- **Industry** (Anomaly Detection) +- **Medical** (Snore, Toss, Pulmonary diseases) +- **Nature** (Beehive control, insect sound) +======= +> You will find the notebooks and codes used in this hands-on tutorial on the [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/KWS) repository. + +Before we finish, consider that Sound Classification is more than just voice. For example, you can develop TinyML projects around sound in several areas, such as: + +- **Security** (Broken Glass detection, Gunshot) +- **Industry** (Anomaly Detection) +- **Medical** (Snore, Cough, Pulmonary diseases) +- **Nature** (Beehive control, insect sound, pouching mitigation) +>>>>>>> Stashed changes diff --git a/niclav_sys.qmd b/niclav_sys.qmd new file mode 100644 index 00000000..f410f69e --- /dev/null +++ b/niclav_sys.qmd @@ -0,0 +1,307 @@ +# Setup Nicla Vision {.unnumbered} + +## Introduction + +The [Arduino Nicla Vision](https://docs.arduino.cc/hardware/nicla-vision) (sometimes called *NiclaV*) is a development board that includes two processors that can run tasks in parallel. It is part of a family of development boards with the same form factor but designed for specific tasks, such as the [Nicla Sense ME](https://www.bosch-sensortec.com/software-tools/tools/arduino-nicla-sense-me/) and the [Nicla Voice](https://store-usa.arduino.cc/products/nicla-voice?_gl=1*l3abc6*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY5NjM0Mzk1My4xMDIuMS4xNjk2MzQ0MjQ1LjAuMC4w). The *Niclas* can efficiently run processes created with TensorFlow™ Lite. For example, one of the cores of the NiclaV runs a computer vision algorithm on the fly (inference), while the other executes low-level operations like controlling a motor and communicating or acting as a user interface. The onboard wireless module allows the management of WiFi and Bluetooth Low Energy (BLE) connectivity simultaneously. + +![](images/imgs_niclav_sys/image29.jpg){fig-align="center" width="6.5in"} + +## Hardware + +### Two Parallel Cores + +The central processor is the dual-core [STM32H747,](https://content.arduino.cc/assets/Arduino-Portenta-H7_Datasheet_stm32h747xi.pdf?_gl=1*6quciu*_ga*MTQ3NzE4Mjk4Mi4xNjQwMDIwOTk5*_ga_NEXN8H46L5*MTY0NzQ0NTg1My4xMS4xLjE2NDc0NDYzMzkuMA..) including a Cortex® M7 at 480 MHz and a Cortex® M4 at 240 MHz. The two cores communicate via a Remote Procedure Call mechanism that seamlessly allows calling functions on the other processor. Both processors share all the on-chip peripherals and can run: + +- Arduino sketches on top of the Arm® Mbed™ OS + +- Native Mbed™ applications + +- MicroPython / JavaScript via an interpreter + +- TensorFlow™ Lite + +![](images/imgs_niclav_sys/image22.jpg){fig-align="center" width="6.5in"} + +### Memory + +Memory is crucial for embedded machine learning projects. The NiclaV board can host up to 16 MB of QSPI Flash for storage. 
However, it is essential to consider that the MCU's SRAM is the memory used for machine learning inferences, and the STM32H747 has only 1MB of SRAM, shared by both processors. The MCU also incorporates 2MB of Flash, mainly used for code storage.
+
+### Sensors
+
+- **Camera**: A GC2145 2 MP Color CMOS Camera.
+
+- **Microphone**: The `MP34DT05` is an ultra-compact, low-power, omnidirectional, digital MEMS microphone built with a capacitive sensing element and an IC interface.
+
+- **6-Axis IMU**: 3D gyroscope and 3D accelerometer data from the `LSM6DSOX` 6-axis IMU.
+
+- **Time of Flight Sensor**: The `VL53L1CBV0FY` Time-of-Flight sensor adds accurate, low-power ranging capabilities to the Nicla Vision. The invisible near-infrared VCSEL laser (including the analog driver) is encapsulated with receiving optics in an all-in-one small module below the camera.
+
+## Arduino IDE Installation
+
+Start by connecting the board (*microUSB*) to your computer:
+
+![](images/imgs_niclav_sys/image14.jpg){fig-align="center" width="6.5in"}
+
+Install the Mbed OS core for Nicla boards in the Arduino IDE. With the IDE open, navigate to `Tools > Board > Board Manager`, search for Arduino Nicla Vision, and install the board.
+
+![](images/imgs_niclav_sys/image2.jpg){fig-align="center" width="6.5in"}
+
+Next, go to `Tools > Board > Arduino Mbed OS Nicla Boards` and select `Arduino Nicla Vision`. With the board connected to the USB port, it should appear under Port; select it.
+
+> Open the Blink sketch under `Examples > Basics` and run it using the IDE Upload button. You should see the built-in LED (green RGB) blinking, which means the Nicla board is correctly installed and functional!
+
+### Testing the Microphone
+
+In the Arduino IDE, go to `Examples > PDM > PDMSerialPlotter`, then open and run the sketch. Open the Serial Plotter to see the audio captured by the microphone:
+
+![](images/imgs_niclav_sys/image9.png){fig-align="center" width="6.5in"}
+
+> Vary the frequency of the sound you generate and confirm that the mic is working correctly.
+
+### Testing the IMU
+
+Before testing the IMU, it is necessary to install the LSM6DSOX library. Go to the Library Manager, search for LSM6DSOX, and install the library provided by Arduino:
+
+![](images/imgs_niclav_sys/image19.jpg){fig-align="center" width="6.5in"}
+
+Next, go to `Examples > Arduino_LSM6DSOX > SimpleAccelerometer` and run the accelerometer test (you can also run the Gyro and board temperature tests):
+
+![](images/imgs_niclav_sys/image28.png){fig-align="center" width="6.5in"}
+
+### Testing the ToF (Time of Flight) Sensor
+
+As with the IMU, it is necessary to install the VL53L1X ToF library. Go to the Library Manager, search for VL53L1X, and install the library provided by Pololu:
+
+![](images/imgs_niclav_sys/image15.jpg){fig-align="center" width="6.5in"}
+
+Next, run the sketch [proximity_detection.ino](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/distance_image_meter.py):
+
+![](images/imgs_niclav_sys/image12.png){fig-align="center" width="6.5in"}
+
+On the Serial Monitor, you will see the distance from the camera to an object in front of it (up to a maximum of 4 m).
+
+![](images/imgs_niclav_sys/image13.jpg){fig-align="center" width="6.5in"}
+
+### Testing the Camera
+
+We can also test the camera using, for example, the sketch provided in `Examples > Camera > CameraCaptureRawBytes`. We cannot see the image directly, but it is possible to get the raw image data generated by the camera.
+
+Still, the best camera test is to see a live image. For that, we will use another IDE: OpenMV.
+
+## Installing the OpenMV IDE
+
+OpenMV IDE is the premier integrated development environment for OpenMV cameras and Arduino Pro boards, such as the Nicla Vision. It features a powerful text editor, a debug terminal, and a frame buffer viewer with a histogram display. We will use MicroPython to program the Nicla Vision.
+
+Go to the [OpenMV IDE page](https://openmv.io/pages/download), download the correct version for your Operating System, and follow the instructions to install it on your computer.
+
+![](images/imgs_niclav_sys/image21.png){fig-align="center" width="6.5in"}
+
+The IDE should open, defaulting to the helloworld_1.py code in its Code Area. If not, you can open it from `Files > Examples > HelloWorld > helloworld.py`
+
+![](images/imgs_niclav_sys/image7.png){fig-align="center" width="6.5in"}
+
+Any messages sent through a serial connection (from print() statements or error messages) will be displayed on the **Serial Terminal** during run time. The image captured by the camera will be displayed in the **Camera Viewer** area (or Frame Buffer) and in the Histogram area, immediately below the Camera Viewer.
+
+> Before connecting the Nicla to the OpenMV IDE, ensure you have the latest bootloader version. Go to your Arduino IDE, select the Nicla board, and open the sketch in `Examples > STM_32H747_System > STM_32H747_updateBootloader`. Upload the code to your board. The Serial Monitor will guide you.
+
+After updating the bootloader, put the Nicla Vision in bootloader mode by double-pressing the reset button on the board. The built-in green LED will start fading in and out. Now return to the OpenMV IDE and click on the connect icon (left toolbar):
+
+![](images/imgs_niclav_sys/image23.jpg){fig-align="center" width="4.010416666666667in"}
+
+A pop-up will tell you that a board in DFU mode was detected and ask how you would like to proceed. First, select `Install the latest release firmware (vX.Y.Z)`. This action will install the latest OpenMV firmware on the Nicla Vision.
+
+![](images/imgs_niclav_sys/image10.png){fig-align="center" width="6.5in"}
+
+You can leave the option `Erase internal file system` unselected and click `[OK]`.
+
+Nicla's green LED will start flashing while the OpenMV firmware is uploaded to the board, and a terminal window will then open, showing the flashing progress.
+
+![](images/imgs_niclav_sys/image5.png){fig-align="center" width="4.854166666666667in"}
+
+Wait until the green LED stops flashing and fading. When the process ends, you will see a message saying, "DFU firmware update complete!". Press `[OK]`.
+
+![](images/imgs_niclav_sys/image1.png){fig-align="center" width="3.875in"}
+
+A green play button will appear on the toolbar when the Nicla Vision is connected.
+
+![](images/imgs_niclav_sys/image18.jpg){fig-align="center" width="4.791666666666667in"}
+
+Also, note that a drive named "NO NAME" will appear on your computer:
+
+![](images/imgs_niclav_sys/image3.png){fig-align="center" width="6.447916666666667in"}
+
+Every time you press the `[RESET]` button on the board, it automatically executes the *main.py* script stored on it.
You can load the [main.py](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/main.py) code on the IDE (`File > Open File...`). + +![](images/imgs_niclav_sys/image16.png){fig-align="center" width="4.239583333333333in"} + +> This code is the "Blink" code, confirming that the HW is OK. + +For testing the camera, let's run *helloword_1.py*. For that, select the script on `File > Examples > HelloWorld > helloword.py`, + +When clicking the green play button, the MicroPython script (*hellowolrd.py*) on the Code Area will be uploaded and run on the Nicla Vision. On-Camera Viewer, you will start to see the video streaming. The Serial Monitor will show us the FPS (Frames per second), which should be around 14fps. + +![](images/imgs_niclav_sys/image6.png){fig-align="center" width="6.5in"} + +Here is the [helloworld.py](http://helloworld.py/) script: + +``` python +# Hello World Example 2 +# +# Welcome to the OpenMV IDE! Click on the green run arrow button below to run the script! + +import sensor, image, time + +sensor.reset() # Reset and initialize the sensor. +sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE) +sensor.set_framesize(sensor.QVGA) # Set frame size to QVGA (320x240) +sensor.skip_frames(time = 2000) # Wait for settings take effect. +clock = time.clock() # Create a clock object to track the FPS. + +while(True): + clock.tick() # Update the FPS clock. + img = sensor.snapshot() # Take a picture and return the image. + print(clock.fps()) +``` + +In [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision), you can find the Python scripts used here. + +The code can be split into two parts: + +- **Setup**: Where the libraries are imported, initialized and the variables are defined and initiated. + +- **Loop**: (while loop) part of the code that runs continually. The image (*img* variable) is captured (one frame). Each of those frames can be used for inference in Machine Learning Applications. + +To interrupt the program execution, press the red `[X]` button. + +> Note: OpenMV Cam runs about half as fast when connected to the IDE. The FPS should increase once disconnected. + +In the [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython), You can find other Python scripts. Try to test the onboard sensors. + +## Connecting the Nicla Vision to Edge Impulse Studio + +We will need the Edge Impulse Studio later in other exercises. [Edge Impulse](https://www.edgeimpulse.com/) is a leading development platform for machine learning on edge devices. + +Edge Impulse officially supports the Nicla Vision. So, for starting, please create a new project on the Studio and connect the Nicla to it. For that, follow the steps: + +- Download the most updated [EI Firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it. + +- Open the zip file on your computer and select the uploader corresponding to your OS: + +![](images/imgs_niclav_sys/image17.png){fig-align="center" width="4.416666666666667in"} + +- Put the Nicla-Vision on Boot Mode, pressing the reset button twice. + +- Execute the specific batch code for your OS for uploading the binary *arduino-nicla-vision.bin* to your board. + +Go to your project on the Studio, and on the `Data Acquisition tab`, select `WebUSB` (1). A window will pop up; choose the option that shows that the `Nicla is paired` (2) and press `[Connect]` (3). 
+ +![](images/imgs_niclav_sys/image27.png){fig-align="center" width="6.5in"} + +In the *Collect Data* section on the `Data Acquisition` tab, you can choose which sensor data to pick. + +![](images/imgs_niclav_sys/image25.png){fig-align="center" width="6.5in"} + +For example. `IMU data`: + +![](images/imgs_niclav_sys/image8.png){fig-align="center" width="6.5in"} + +Or Image (`Camera`): + +![](images/imgs_niclav_sys/image4.png){fig-align="center" width="6.5in"} + +And so on. You can also test an external sensor connected to the `ADC` (Nicla pin 0) and the other onboard sensors, such as the microphone and the ToF. + +## Expanding the Nicla Vision Board (optional) + +A last item to be explored is that sometimes, during prototyping, it is essential to experiment with external sensors and devices, and an excellent expansion to the Nicla is the [Arduino MKR Connector Carrier (Grove compatible)](https://store-usa.arduino.cc/products/arduino-mkr-connector-carrier-grove-compatible). + +The shield has 14 Grove connectors: five single analog inputs (A0-A5), one double analog input (A5/A6), five single digital I/Os (D0-D4), one double digital I/O (D5/D6), one I2C (TWI), and one UART (Serial). All connectors are 5V compatible. + +> Note that all 17 Nicla Vision pins will be connected to the Shield Groves, but some Grove connections remain disconnected. + +![](images/imgs_niclav_sys/image20.jpg){fig-align="center" width="6.5in"} + +This shield is MKR compatible and can be used with the Nicla Vision and Portenta. + +![](images/imgs_niclav_sys/image26.jpg){fig-align="center" width="4.34375in"} + +For example, suppose that on a TinyML project, you want to send inference results using a LoRaWAN device and add information about local luminosity. Often, with offline operations, a local low-power display such as an OLED is advised. This setup can be seen here: + +![](images/imgs_niclav_sys/image11.jpg){fig-align="center" width="6.5in"} + +The [Grove Light Sensor](https://wiki.seeedstudio.com/Grove-Light_Sensor/) would be connected to one of the single Analog pins (A0/PC4), the [LoRaWAN device](https://wiki.seeedstudio.com/Grove_LoRa_E5_New_Version/) to the UART, and the [OLED](https://arduino.cl/producto/display-oled-grove/) to the I2C connector. + +The Nicla Pins 3 (Tx) and 4 (Rx) are connected with the Serial Shield connector. The UART communication is used with the LoRaWan device. Here is a simple code to use the UART: + +``` python +# UART Test - By: marcelo_rovai - Sat Sep 23 2023 + +import time +from pyb import UART +from pyb import LED + +redLED = LED(1) # built-in red LED + +# Init UART object. +# Nicla Vision's UART (TX/RX pins) is on "LP1" +uart = UART("LP1", 9600) + +while(True): + uart.write("Hello World!\r\n") + redLED.toggle() + time.sleep_ms(1000) +``` + +To verify that the UART is working, you should, for example, connect another device as the Arduino UNO, displaying "Hello Word" on the Serial Monitor. Here is the [code](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Arduino-IDE/teste_uart_UNO/teste_uart_UNO.ino). + +![](images/imgs_niclav_sys/image24.jpg){fig-align="center" width="2.8125in"} + +Below is the *Hello World code* to be used with the I2C OLED. The MicroPython SSD1306 OLED driver (ssd1306.py), created by Adafruit, should also be uploaded to the Nicla (the ssd1306.py script can be found in [GitHub](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/ssd1306.py)). 
+ +``` python +# Nicla_OLED_Hello_World - By: marcelo_rovai - Sat Sep 30 2023 + +#Save on device: MicroPython SSD1306 OLED driver, I2C and SPI interfaces created by Adafruit +import ssd1306 + +from machine import I2C +i2c = I2C(1) + +oled_width = 128 +oled_height = 64 +oled = ssd1306.SSD1306_I2C(oled_width, oled_height, i2c) + +oled.text('Hello, World', 10, 10) +oled.show() +``` + +Finally, here is a simple script to read the ADC value on pin "PC4" (Nicla pin A0): + +``` python + +# Light Sensor (A0) - By: marcelo_rovai - Wed Oct 4 2023 + +import pyb +from time import sleep + +adc = pyb.ADC(pyb.Pin("PC4")) # create an analog object from a pin +val = adc.read() # read an analog value + +while (True): + + val = adc.read() + print ("Light={}".format (val)) + sleep (1) +``` + +The ADC can be used for other sensor variables, such as [Temperature](https://wiki.seeedstudio.com/Grove-Temperature_Sensor_V1.2/). + +> Note that the above scripts ([[downloaded from Github]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main/Micropython)) introduce only how to connect external devices with the Nicla Vision board using MicroPython. + +## Conclusion + +The Arduino Nicla Vision is an excellent *tiny device* for industrial and professional uses! However, it is powerful, trustworthy, low power, and has suitable sensors for the most common embedded machine learning applications such as vision, movement, sensor fusion, and sound. + +> On the [GitHub repository,](https://github.com/Mjrovai/Arduino_Nicla_Vision/tree/main) you will find the last version of all the codes used or commented on in this hands-on exercise. diff --git a/object_detection_fomo.qmd b/object_detection_fomo.qmd new file mode 100644 index 00000000..1e38750f --- /dev/null +++ b/object_detection_fomo.qmd @@ -0,0 +1,307 @@ +# Object Detection {.unnumbered} + +## Introduction + +This is a continuation of **CV on Nicla Vision**, now exploring **Object Detection** on microcontrollers. + +![](images/imgs_object_detection_fomo/cv_obj_detect.jpg){fig-align="center" width="6.5in"} + +### Object Detection versus Image Classification + +The main task with Image Classification models is to produce a list of the most probable object categories present on an image, for example, to identify a tabby cat just after his dinner: + +![](images/imgs_object_detection_fomo/img_1.png){fig-align="center"} + +But what happens when the cat jumps near the wine glass? The model still only recognizes the predominant category on the image, the tabby cat: + +![](images/imgs_object_detection_fomo/img_2.png){fig-align="center"} + +And what happens if there is not a dominant category on the image? + +![](images/imgs_object_detection_fomo/img_3.png){fig-align="center"} + +The model identifies the above image completely wrong as an "ashcan," possibly due to the color tonalities. + +> The model used in all previous examples is the *MobileNet*, trained with a large dataset, the *ImageNet*. + +To solve this issue, we need another type of model, where not only **multiple categories** (or labels) can be found but also **where** the objects are located on a given image. + +As we can imagine, such models are much more complicated and bigger, for example, the **MobileNetV2 SSD FPN-Lite 320x320, trained with the COCO dataset.** This pre-trained object detection model is designed to locate up to 10 objects within an image, outputting a bounding box for each object detected. 
+
+The image below shows the result of such a model running on a Raspberry Pi:
+
+![](images/imgs_object_detection_fomo/img_4.png){fig-align="center" width="6.5in"}
+
+The models used for object detection (such as MobileNet SSD or YOLO) are usually several megabytes in size, which is fine for a Raspberry Pi but unsuitable for embedded devices, where the available RAM is usually below 1 MB.
+
+### An innovative solution for Object Detection: FOMO
+
+In 2022, Edge Impulse launched [**FOMO** (Faster Objects, More Objects)](https://docs.edgeimpulse.com/docs/edge-impulse-studio/learning-blocks/object-detection/fomo-object-detection-for-constrained-devices), a novel solution for performing object detection on embedded devices, not only on the Nicla Vision (Cortex-M7) but also on Cortex-M4F CPUs (Arduino Nano 33 and OpenMV M4 series) as well as Espressif ESP32 devices (ESP-CAM and XIAO ESP32S3 Sense).
+
+In this hands-on exercise, we will explore using FOMO for object detection without going into many details about the model itself. To learn more about how the model works, see the [official FOMO announcement](https://www.edgeimpulse.com/blog/announcing-fomo-faster-objects-more-objects) by Edge Impulse, where Louis Moreau and Mat Kelcey explain it in detail.
+
+## The Object Detection Project Goal
+
+All Machine Learning projects need to start with a detailed goal. Let's assume we are in an industrial facility and must sort and count **wheels** and special **boxes**.
+
+![](images/imgs_object_detection_fomo/proj_goal.jpg){fig-align="center" width="6.5in"}
+
+In other words, we should perform a multi-label classification, where each image can contain three classes:
+
+- Background (no objects)
+
+- Box
+
+- Wheel
+
+Here are some unlabeled image samples that we will use to detect the objects (wheels and boxes):
+
+![](images/imgs_object_detection_fomo/samples.jpg){fig-align="center" width="6.5in"}
+
+We are interested in which objects are in the image, their location (centroid), and how many of them we can find. FOMO does not detect the object's size, unlike MobileNet SSD or YOLO, where the bounding box is one of the model outputs.
+
+We will use the Nicla Vision for image capture and model inference, and the Edge Impulse Studio to develop the ML project. But before starting the object detection project in the Studio, let's create a *raw* (unlabeled) dataset with images that contain the objects to be detected.
+
+## Data Collection
+
+We can use the Edge Impulse Studio, the OpenMV IDE, a phone, or other devices to capture the images. Here, we will again use the OpenMV IDE.
+
+### Collecting Dataset with OpenMV IDE
+
+First, create a folder on your computer where your data will be saved, for example, "data." Next, in the OpenMV IDE, go to Tools \> Dataset Editor and select New Dataset to start the dataset collection:
+
+![](images/imgs_object_detection_fomo/data_folder.jpg){fig-align="center" width="6.5in"}
+
+Edge Impulse suggests that the objects should be of similar size and not overlapping for better performance. This is fine in an industrial facility, where the camera is usually fixed, keeping a constant distance from the objects to be detected. Even so, we will also try mixed sizes and positions to see the result.
+
+> We will not create separate folders for our images because each image contains multiple labels.
+
+Connect the Nicla Vision to the OpenMV IDE and run the `dataset_capture_script.py` (a simplified sketch of what this script does is shown below).
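+
+For reference, the core of that script is just the standard OpenMV camera setup, roughly like the simplified sketch below (the IDE's own script additionally takes care of saving the frames into the dataset folder):
+
+``` python
+# Simplified sketch of the camera loop behind dataset_capture_script.py
+import sensor, time
+
+sensor.reset()                        # reset and initialize the camera
+sensor.set_pixformat(sensor.RGB565)   # color images
+sensor.set_framesize(sensor.QVGA)     # 320x240 frames
+sensor.skip_frames(time=2000)         # let the camera settle
+
+clock = time.clock()
+while True:
+    clock.tick()
+    img = sensor.snapshot()           # the IDE grabs this frame when you capture an image
+    print(clock.fps())
+```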
+
+Clicking on the Capture Image button will start capturing images:
+
+![](images/imgs_object_detection_fomo/img_5.jpg){fig-align="center" width="6.5in"}
+
+We suggest around 50 images, mixing the objects and varying the number of each appearing in the scene. Try to capture different angles, backgrounds, and lighting conditions.
+
+> The stored images use a QVGA frame size (320x240) and the RGB565 color pixel format.
+
+After capturing your dataset, close the Dataset Editor Tool via `Tools > Dataset Editor`.
+
+## Edge Impulse Studio
+
+### Setup the project
+
+Go to [Edge Impulse Studio](https://www.edgeimpulse.com/), enter your credentials at **Login** (or create an account), and start a new project.
+
+![](images/imgs_object_detection_fomo/img_6.png){fig-align="center" width="6.5in"}
+
+> Here, you can clone the project developed for this hands-on: [NICLA_Vision_Object_Detection](https://studio.edgeimpulse.com/public/292737/latest).
+
+On your Project Dashboard, scroll down to **Project info**, select **Bounding boxes (object detection)** as the labeling method, and choose Nicla Vision as your Target Device:
+
+![](images/imgs_object_detection_fomo/img_7.png){fig-align="center" width="6.5in"}
+
+### Uploading the unlabeled data
+
+In the Studio, go to the `Data acquisition` tab, and in the `UPLOAD DATA` section, upload the captured files from your computer.
+
+![](images/imgs_object_detection_fomo/img_8.png){fig-align="center" width="6.5in"}
+
+> You can let the Studio split your data automatically between Train and Test, or do it manually.
+
+![](images/imgs_object_detection_fomo/img_9.png){fig-align="center" width="6.5in"}
+
+All 51 unlabeled images were uploaded, but they still need to be labeled appropriately before being used as a dataset in the project. The Studio has a tool for that purpose, which you can find under the link `Labeling queue (51)`.
+
+There are two ways to perform AI-assisted labeling in the Edge Impulse Studio (free version):
+
+- Using YOLOv5
+- Tracking objects between frames
+
+> Edge Impulse launched an [auto-labeling feature](https://docs.edgeimpulse.com/docs/edge-impulse-studio/data-acquisition/auto-labeler) for Enterprise customers, easing labeling tasks in object detection projects.
+
+Common objects can be quickly identified and labeled using an existing library of pre-trained object detection models from YOLOv5 (trained on the COCO dataset). But since, in our case, the objects are not part of the COCO dataset, we should select the `tracking objects` option. With this option, once you draw bounding boxes and label the images in one frame, the objects will be tracked automatically from frame to frame, *partially* labeling the new ones (not all are labeled correctly).
+
+> If you already have a labeled dataset containing bounding boxes, you can use the [EI uploader](https://docs.edgeimpulse.com/docs/tools/edge-impulse-cli/cli-uploader#bounding-boxes) to import your data.
+
+### Labeling the Dataset
+
+Starting with the first image of your unlabeled data, use your mouse to drag a box around an object to add a label. Then click **Save labels** to advance to the next item.
+
+![](images/imgs_object_detection_fomo/img_10.png){fig-align="center" width="6.5in"}
+
+Continue with this process until the queue is empty. At the end, all images should have their objects labeled, as in the samples below:
+
+![](images/imgs_object_detection_fomo/img_11.jpg){fig-align="center" width="6.5in"}
+
+Next, review the labeled samples on the `Data acquisition` tab.
+If one of the labels is wrong, you can edit it using the *`three dots`* menu after the sample name:
+
+![](images/imgs_object_detection_fomo/img_12.png){fig-align="center" width="6.5in"}
+
+You will be guided to replace the wrong label and correct the dataset.
+
+![](images/imgs_object_detection_fomo/img_13.jpg){fig-align="center" width="6.5in"}
+
+## The Impulse Design
+
+In this phase, you should define how to:
+
+- **Pre-process** the data, which consists of resizing the individual images from `320 x 240` to `96 x 96` and squashing them (squared form, without cropping). Afterward, the images are converted from RGB to Grayscale.
+
+- **Design a Model**, in this case, "Object Detection."
+
+![](images/imgs_object_detection_fomo/img_14.png){fig-align="center" width="6.5in"}
+
+### Preprocessing all the dataset
+
+In this section, select **Color depth** as `Grayscale`, which is suitable for use with FOMO models, and press `Save parameters`.
+
+![](images/imgs_object_detection_fomo/img_15.png){fig-align="center" width="6.5in"}
+
+The Studio moves automatically to the next section, `Generate features`, where all samples will be pre-processed, resulting in a dataset of individual 96x96x1 images, or 9,216 features.
+
+![](images/imgs_object_detection_fomo/img_16.png){fig-align="center" width="6.5in"}
+
+The feature explorer shows that all samples exhibit a good separation after feature generation.
+
+> One of the samples (46) appears to be in the wrong space, but clicking on it confirms that the labeling is correct.
+
+## Model Design, Training, and Test
+
+We will use FOMO, an object detection model based on MobileNetV2 (alpha 0.35), designed to coarsely segment an image into a grid of **background** vs. **objects of interest** (here, *boxes* and *wheels*).
+
+FOMO is an innovative machine learning model for object detection that can use up to 30 times less energy and memory than traditional models like MobileNet SSD and YOLOv5. FOMO can operate on microcontrollers with less than 200 KB of RAM. The main reason this is possible is that while other models estimate the object's size by drawing a box around it (bounding box), FOMO ignores the object's size, providing only the information about where the object is located in the image, by means of its centroid coordinates.
+
+**How does FOMO work?**
+
+FOMO takes the grayscale image and divides it into blocks of pixels using a factor of 8. For a 96x96 input, the grid would be 12x12 (96/8 = 12). Next, FOMO runs a classifier over each pixel block to calculate the probability that it contains a box or a wheel and, subsequently, determines the regions with the highest probability of containing an object (if a pixel block contains no objects, it is classified as *background*). From the overlap of the final region, FOMO provides the coordinates (relative to the image dimensions) of the centroid of this region (a small numerical sketch of this grid-and-centroid idea is shown after the hyper-parameter list below).
+
+![](images/imgs_object_detection_fomo/img_17.png){fig-align="center" width="6.5in"}
+
+For training, we should select a pre-trained model. Let's use the **`FOMO (Faster Objects, More Objects) MobileNetV2 0.35`**. This model uses around 250 KB of RAM and 80 KB of ROM (Flash), which suits our board well, since it has 1 MB of RAM and ROM.
+
+![](images/imgs_object_detection_fomo/img_18.png){fig-align="center" width="6.5in"}
+
+Regarding the training hyper-parameters, the model will be trained with:
+
+- Epochs: 60
+- Batch size: 32
+- Learning rate: 0.001
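+
+Before looking at the training results, the grid-and-centroid idea described above can be made concrete with a small, self-contained sketch (plain NumPy, illustrative only, not Edge Impulse code):
+
+``` python
+# Illustrative sketch of FOMO's output grid (not Edge Impulse code).
+import numpy as np
+
+INPUT_SIZE = 96
+FACTOR = 8
+GRID = INPUT_SIZE // FACTOR            # 96 / 8 = 12 cells per side
+
+# Pretend per-cell probabilities for one class (e.g., "wheel");
+# the real model produces one such map per class.
+prob_map = np.zeros((GRID, GRID))
+prob_map[4:6, 7:9] = 0.9               # an object covering a 2x2 block of cells
+
+threshold = 0.8                        # cells above the threshold form the detected region
+ys, xs = np.where(prob_map > threshold)
+
+# Centroid of the region, mapped back to image (pixel) coordinates.
+cx = (xs.mean() + 0.5) * FACTOR
+cy = (ys.mean() + 0.5) * FACTOR
+print(f"Centroid at ({cx:.0f}, {cy:.0f}) in a {INPUT_SIZE}x{INPUT_SIZE} image")
+```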
+
+For validation during training, 20% of the dataset (*validation_dataset*) will be set aside. To the remaining 80% (*train_dataset*), we will apply data augmentation, which randomly flips, resizes, crops, and changes the brightness of the images, artificially increasing the number of samples available for training.
+
+As a result, the model ends up with an F1 score of practically 1.00, with a similar result on the Test data.
+
+> Note that FOMO automatically added a third label, *background*, to the two previously defined (*box* and *wheel*).
+
+![](images/imgs_object_detection_fomo/img_19.png){fig-align="center" width="6.5in"}
+
+> In object detection tasks, accuracy is generally not the primary [evaluation metric](https://learnopencv.com/mean-average-precision-map-object-detection-model-evaluation-metric/). Object detection involves classifying objects and providing bounding boxes around them, making it a more complex problem than simple classification. In our case, we do not even have bounding boxes, only centroids. In short, using accuracy as a metric could be misleading and may not provide a complete picture of how well the model is performing. Because of that, we will use the F1 score.
+
+### Test model with "Live Classification"
+
+Since Edge Impulse officially supports the Nicla Vision, let's connect it to the Studio. To do so, follow these steps:
+
+- Download the [latest EI firmware](https://cdn.edgeimpulse.com/firmware/arduino-nicla-vision.zip) and unzip it.
+
+- Open the zip file on your computer and select the uploader corresponding to your OS:
+
+![](images_2/media/image17.png){fig-align="center" width="4.416666666666667in"}
+
+- Put the Nicla Vision in Boot Mode by pressing the reset button twice.
+
+- Execute the batch script specific to your OS to upload the binary (`arduino-nicla-vision.bin`) to your board.
+
+Go to the `Live classification` section in the EI Studio and, using *WebUSB*, connect your Nicla Vision:
+
+![](images/imgs_object_detection_fomo/img_20.png){fig-align="center" width="6.5in"}
+
+Once connected, you can use the Nicla to capture actual images to be tested by the trained model on the Edge Impulse Studio.
+
+![](images/imgs_object_detection_fomo/img_21.png){fig-align="center" width="6.5in"}
+
+Note that the model can produce false positives and negatives. These can be minimized by defining a proper `Confidence Threshold` (use the `Three dots` menu for the setup). Try 0.8 or higher.
+
+## Deploying the Model
+
+Select OpenMV Firmware on the Deploy Tab and press \[Build\].
+
+![](images/imgs_object_detection_fomo/img_22.png){fig-align="center" width="6.5in"}
+
+When you try to connect the Nicla to the OpenMV IDE again, the IDE will try to update its firmware. Choose the option `Load a specific firmware` instead.
+
+![](images/imgs_object_detection_fomo/img_24.png){fig-align="center"}
+
+You will find a ZIP file downloaded from the Studio on your computer. Open it:
+
+![](images/imgs_object_detection_fomo/img_23.png){fig-align="center" width="6.5in"}
+
+Load the .bin file onto your board:
+
+![](images/imgs_object_detection_fomo/img_25.png){fig-align="center" width="6.5in"}
+
+After the download is finished, a pop-up message will be displayed. `Press OK`, and open the script **ei_object_detection.py** downloaded from the Studio.
+
+Before running the script, let's change a few lines. Note that you can leave the window definition at 240 x 240 and the camera capturing images in QVGA/RGB.
+The captured image will be pre-processed by the firmware deployed from Edge Impulse:
+
+``` python
+# Edge Impulse - OpenMV Object Detection Example
+
+import sensor, image, time, os, tf, math, uos, gc
+
+sensor.reset()                         # Reset and initialize the sensor.
+sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
+sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
+sensor.set_windowing((240, 240))       # Set 240x240 window.
+sensor.skip_frames(time=2000)          # Let the camera adjust.
+
+net = None
+labels = None
+```
+
+Redefine the minimum confidence, for example, to 0.8 to minimize false positives and negatives.
+
+``` python
+min_confidence = 0.8
+```
+
+If necessary, change the color of the circles that will be used to display the detected objects' centroids, for better contrast.
+
+``` python
+try:
+    # Load the built-in model
+    labels, net = tf.load_builtin_model('trained')
+except Exception as e:
+    raise Exception(e)
+
+colors = [ # Add more colors if you are detecting more than 7 types of classes at once.
+    (255, 255,   0), # background: yellow (not used)
+    (  0, 255,   0), # box: green
+    (255,   0,   0), # wheel: red
+    (  0,   0, 255), # not used
+    (255,   0, 255), # not used
+    (  0, 255, 255), # not used
+    (255, 255, 255), # not used
+]
+```
+
+Keep the remaining code as it is and press the `green Play button` to run the code:
+
+![](images/imgs_object_detection_fomo/img_26.png){fig-align="center" width="6.5in"}
+
+In the camera view, we can see the objects with their centroids marked by fixed 12-pixel circles (each circle has a distinct color, depending on its class). In the Serial Terminal, the model shows the detected labels and their positions in the image window (240x240).
+
+> Be aware that the coordinate origin is in the upper left corner.
+
+![](images/imgs_object_detection_fomo/img_27.jpg){fig-align="center" width="624"}
+
+Note that the frame rate is around 8 fps (similar to what we got with the Image Classification project). This happens because FOMO is cleverly built on top of a CNN classifier rather than an object detection architecture such as SSD MobileNet. For example, when running a MobileNetV2 SSD FPN-Lite 320x320 model on a Raspberry Pi 4, the latency is around 5 times higher (around 1.5 fps).
+
+Here is a short video showing the inference results: {{< video https://youtu.be/JbpoqRp3BbM width="480" height="270" center >}}
+
+## Conclusion
+
+FOMO is a significant leap in the image processing space, as Louis Moreau and Mat Kelcey put it during its launch in 2022:
+
+> FOMO is a ground-breaking algorithm that brings real-time object detection, tracking, and counting to microcontrollers for the first time.
+
+Multiple possibilities exist for exploring object detection (and, more precisely, counting objects) on embedded devices, for example, using the Nicla for sensor fusion (camera + microphone) together with object detection. This can be very useful in projects involving bees, for example.
+
+![](images/imgs_object_detection_fomo/img_28.jpg){fig-align="center" width="624"}