# Exercise - Image Classification

### **Introduction**

As we begin our study of embedded machine learning (tinyML), it is impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) in our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance.

More and more, we are facing an artificial intelligence (AI) revolution where, as stated by Gartner, **Edge AI** has very high impact potential, and the time is now!

![](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image2.jpg){width="4.729166666666667in" height="4.895833333333333in"}

At the "bull's-eye" of the emerging-technologies radar is *Edge Computer Vision*, and when we talk about Machine Learning (ML) applied to vision, the first thing that comes to mind is **Image Classification**, a kind of ML "Hello World"!

This exercise explores a computer vision project that uses Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we will adapt a pre-trained MobileNet model for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy.

We will employ techniques such as quantization and pruning to reduce the computational load. By the end of this tutorial, you will have a working prototype capable of classifying images in real time, running on a low-power embedded system based on the Arduino Nicla Vision board.

### **Computer Vision**

At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human visual system. AI, conversely, is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli.

When discussing Computer Vision projects applied to embedded devices, the most common applications that come to mind are *Image Classification* and *Object Detection*.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image15.jpg){width="6.5in" height="2.8333333333333335in"}

Both types of models can be implemented on tiny devices like the Arduino Nicla Vision and used in real projects. Let's start with the first one.

### **Image Classification Project**

The first step in any ML project is to define the goal. In this case, it is to detect and classify two specific objects present in an image. For this project, we will use two small toys: a *robot* and a small Brazilian parrot (named *Periquito*). We will also collect images of a *background* where those two objects are absent.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image36.jpg){width="6.5in" height="3.638888888888889in"}

### **Data Collection**

Once you have defined your machine learning project goal, the next and most crucial step is dataset collection. You can use the Edge Impulse Studio, the OpenMV IDE we installed, or even your phone for the image capture.
Here, we will use the OpenMV IDE for that.

**Collecting the Dataset with the OpenMV IDE**

First, create a folder on your computer where the data will be saved, for example, "data". Next, in the OpenMV IDE, go to Tools > Dataset Editor and select New Dataset to start the dataset collection:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image29.png){width="6.291666666666667in" height="4.010416666666667in"}

The IDE will ask you to open the folder where your data will be saved; choose the "data" folder you just created. Note that new icons will appear on the left panel.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image46.png){width="0.9583333333333334in" height="1.5208333333333333in"}

Using the upper icon (1), enter the first class name, for example, "periquito":

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image22.png){width="3.25in" height="2.65625in"}

Run dataset_capture_script.py and click the bottom icon (2) to start capturing images:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image43.png){width="6.5in" height="4.041666666666667in"}

Repeat the same procedure for the other classes.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image6.jpg){width="6.5in" height="3.0972222222222223in"}

> *We suggest around 60 images of each category. Try to capture different angles, backgrounds, and lighting conditions.*

The stored images use a QVGA frame size (320x240) and the RGB565 color pixel format.

After capturing your dataset, close the Dataset Editor Tool via Tools > Dataset Editor.

On your computer, you will end up with a dataset containing three classes: periquito, robot, and background.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image20.png){width="6.5in" height="2.2083333333333335in"}

You should now return to Edge Impulse Studio and upload the dataset to your project.

### **Training the model with Edge Impulse Studio**

We will use Edge Impulse Studio to train our model. Enter your account credentials at Edge Impulse and create a new project:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image45.png){width="6.5in" height="4.263888888888889in"}

> *Here, you can clone a similar project: [NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).*

### **Dataset**

Using the EI Studio (or *Studio*), we will go through four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the edge device, in this case, the NiclaV).

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image41.jpg){width="6.5in" height="4.194444444444445in"}

Regarding the dataset, it is essential to point out that our original dataset, captured with the OpenMV IDE, will be split into three parts: Training, Validation, and Test. The Test set is split off at the beginning and set aside, to be used only in the test phase after training. The Validation set is used during training.

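To make the idea concrete, here is a minimal sketch of how such a split could be reproduced locally with scikit-learn. This is illustrative only: the Studio performs the split for you, and the `data/<class>/` folder layout and `list_images` helper are assumptions, not part of the Edge Impulse workflow.

```python
# Illustrative only: Edge Impulse Studio performs this split automatically.
# Assumes images were saved by the OpenMV Dataset Editor under data/<class>/.
import os
from sklearn.model_selection import train_test_split

def list_images(root="data"):
    paths, labels = [], []
    for cls in sorted(os.listdir(root)):          # background, periquito, robot
        for f in os.listdir(os.path.join(root, cls)):
            paths.append(os.path.join(root, cls, f))
            labels.append(cls)
    return paths, labels

paths, labels = list_images()

# 80% train+validation, 20% test (the test set is kept aside until the Test phase)
train_val, test, y_train_val, y_test = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)

# A further split of the training data provides the validation set used during training
train, val, y_train, y_val = train_test_split(
    train_val, y_train_val, test_size=0.2, stratify=y_train_val, random_state=42)

print(len(train), len(val), len(test))
```
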
![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image7.jpg){width="6.5in" height="4.763888888888889in"}

In the Studio, go to the Data acquisition tab, and in the UPLOAD DATA section, upload the files of each chosen category from your computer:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image39.png){width="6.5in" height="4.263888888888889in"}

Leave it to the Studio to automatically split the original dataset into training and test sets, and choose the label corresponding to that specific data:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image30.png){width="6.5in" height="4.263888888888889in"}

Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image11.png){width="6.5in" height="4.263888888888889in"}

The Studio allows you to explore your data, showing a complete view of all the data in your project. You can clear, inspect, or change labels by clicking on individual data items. In our case, a simple project, the data looks OK.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image44.png){width="6.5in" height="4.263888888888889in"}

### **The Impulse Design**

In this phase, we should define how to:

- Pre-process our data, which consists of resizing the individual images and determining the color depth to use (RGB or Grayscale), and

- Design a model, in this case "Transfer Learning (Images)", to fine-tune a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case).

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image23.jpg){width="6.5in" height="4.0in"}

Transfer learning with MobileNet offers a streamlined approach to model training, which is especially beneficial for resource-constrained environments and projects with limited labeled data. MobileNet, known for its lightweight architecture, is a pre-trained model that has already learned valuable features from a large dataset (ImageNet).

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image9.jpg){width="6.5in" height="1.9305555555555556in"}

By leveraging these learned features, you can train a new model for your specific task with less data and fewer computational resources, yet still achieve competitive accuracy.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image32.jpg){width="6.5in" height="2.3055555555555554in"}

This approach significantly reduces training time and computational cost, making it ideal for quick prototyping and for deployment on embedded devices where efficiency is paramount.

Go to the Impulse Design tab and create the *impulse*, defining an image size of 96x96 and squashing the images (square form, without cropping). Select the Image and Transfer Learning blocks. Save the Impulse.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image16.png){width="6.5in" height="4.263888888888889in"}

### **Image Pre-Processing**

All input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).

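As a sanity check, the short sketch below (using Pillow and NumPy, with a hypothetical local file name) mimics that pre-processing step: the 320x240 capture is squashed to 96x96 without cropping and flattened into 96 x 96 x 3 = 27,648 feature values.

```python
# Illustrative pre-processing check; 'robot_01.jpg' is a hypothetical sample file.
import numpy as np
from PIL import Image

img = Image.open("robot_01.jpg").convert("RGB")   # 320x240 QVGA capture
img = img.resize((96, 96))                        # "squash": resize without cropping
features = np.asarray(img, dtype=np.float32).flatten()
print(features.shape)                             # (27648,) -> 96 * 96 * 3 features
```
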
![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image17.png){width="6.5in" height="4.319444444444445in"}

Press [Save parameters] and then Generate all features:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image5.png){width="6.5in" height="4.263888888888889in"}

### **Model Design**

In 2017, Google introduced [[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).

MobileNet V1 and MobileNet V2 both target mobile efficiency and embedded vision applications but differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to improve performance. These features allow V2 to capture more complex patterns using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but uses a linear activation for the bottleneck layer, a design choice found to better preserve important information through the network. MobileNet V2 offers a more optimized architecture for higher accuracy and efficiency, and it will be used in this project.

Although the base MobileNet architecture is already tiny and has low latency, a specific use case or application may require the model to be even smaller and faster. To construct these smaller, less computationally expensive models, MobileNets introduce a straightforward parameter α (alpha), called the width multiplier. The role of the width multiplier α is to thin the network uniformly at each layer.

Edge Impulse Studio offers MobileNetV1 (96x96 images) and V2 (96x96 and 160x160 images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off: the higher the accuracy, the more memory (around 1.3 MB of RAM and 2.6 MB of ROM) is needed to run the model, implying more latency. The smallest footprint is obtained at the other extreme with MobileNetV1 and α=0.10 (around 53.2 KB of RAM and 101 KB of ROM).

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image27.jpg){width="6.5in" height="3.5277777777777777in"}

For this project, we will use **MobileNetV2 96x96 0.1**, which has an estimated memory cost of 265.3 KB of RAM. This model should be fine for the Nicla Vision, which has 1 MB of SRAM. On the Transfer Learning tab, select this model:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image24.png){width="6.5in" height="4.263888888888889in"}

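To make the width multiplier concrete, here is a rough Keras sketch of an equivalent backbone and classification head. This is not the Studio's internal implementation: the layer sizes mirror the dense layer and dropout values used later in this project, and `weights=None` is used because Keras only ships ImageNet weights for a few α values.

```python
# A rough Keras sketch of the chosen backbone; Edge Impulse handles this internally.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3),
    alpha=0.1,            # width multiplier: thins every layer to ~10% of the base width
    include_top=False,    # drop the original 1000-class ImageNet classifier
    weights=None)         # Keras publishes ImageNet weights only for selected alpha values
base.trainable = False    # transfer learning: keep the backbone frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(12, activation="relu"),   # final dense layer used in this project
    tf.keras.layers.Dropout(0.15),                  # 15% dropout against overfitting
    tf.keras.layers.Dense(3, activation="softmax")  # background, periquito, robot
])
model.summary()
```

With α=0.1, the convolutional channels are cut to roughly a tenth of the base width, which is what brings the RAM estimate down to the ~265 KB quoted above.
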
Another important technique to use with deep learning is **Data Augmentation**. Data augmentation is a method that can help improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images).

Under the hood, here you can see how Edge Impulse implements a data augmentation policy on your data:

```python
# Implements the data augmentation policy
# (tf, random, and math are imported by the training pipeline;
#  INPUT_SHAPE is the model's input dimensions, e.g. (96, 96, 3))
def augment_image(image, label):
    # Flips the image randomly
    image = tf.image.random_flip_left_right(image)

    # Increase the image size, then randomly crop it down to
    # the original dimensions
    resize_factor = random.uniform(1, 1.2)
    new_height = math.floor(resize_factor * INPUT_SHAPE[0])
    new_width = math.floor(resize_factor * INPUT_SHAPE[1])
    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width)
    image = tf.image.random_crop(image, size=INPUT_SHAPE)

    # Vary the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.2)

    return image, label
```

Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset.

The final layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. Here is the training result:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image31.jpg){width="6.5in" height="3.5in"}

The result is excellent, with 77 ms of latency, which should result in about 13 fps (frames per second) during inference.

### **Model Testing**

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image10.jpg){width="6.5in" height="3.8472222222222223in"}

Now you should take the data that was set aside at the start of the project and use it as input to the trained model:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image34.png){width="3.1041666666666665in" height="1.7083333333333333in"}

The result was, again, excellent.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image12.png){width="6.5in" height="4.263888888888889in"}

### **Deploying the model**

At this point, we can deploy the trained model as a .tflite file and use the OpenMV IDE to run it with MicroPython, or we can deploy it as a C/C++ or an Arduino library.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image28.jpg){width="6.5in" height="3.763888888888889in"}

**Arduino Library**

First, let's deploy it as an Arduino library:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image48.png){width="6.5in" height="4.263888888888889in"}

Install the library as a .zip file in the Arduino IDE and run the sketch nicla_vision_camera.ino, available in Examples under your library name.

> *Note that the Arduino Nicla Vision has, by default, 512 KB of RAM allocated for the M7 core and an additional 244 KB in the M4 address space. In the code, this allocation was changed to 288 KB to guarantee that the model will run on the device (`malloc_addblock((void*)0x30000000, 288 * 1024);`).*

The result was good, with 86 ms of measured latency.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image25.jpg){width="6.5in" height="3.4444444444444446in"}

Here is a short video showing the inference results: [[https://youtu.be/bZPZZJblU-o]{.underline}](https://youtu.be/bZPZZJblU-o)

**OpenMV**

It is possible to deploy the trained model to be used with OpenMV in two ways: as a library and as firmware.

When deployed as a library, three files are generated: the .tflite model, a list with the labels, and a simple MicroPython script that can run inferences using the model.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image26.png){width="6.5in" height="1.0in"}

Running this model as a .tflite file directly on the Nicla proved impossible. So, we can either sacrifice accuracy by using a smaller model or deploy the model as OpenMV firmware (FW). As firmware, Edge Impulse Studio generates the optimized model, libraries, and frameworks needed to run the inference. Let's explore this last option.

Select OpenMV Firmware on the Deploy tab and press [Build]:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image3.png){width="6.5in" height="4.263888888888889in"}

On your computer, you will find a ZIP file. Open it:

![Pasted Graphic 64.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image33.png){width="6.5in" height="2.625in"}

Use the Bootloader tool in the OpenMV IDE to load the firmware onto your board:

![Pasted Graphic 63.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image35.jpg){width="6.5in" height="3.625in"}

Select the appropriate file (.bin for the Nicla Vision):

![Pasted Graphic 65.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image8.png){width="6.5in" height="1.9722222222222223in"}

After the download is finished, press OK:

![DFU firmware update complete!.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image40.png){width="3.875in" height="5.708333333333333in"}

If a message says that the firmware is outdated, DO NOT UPGRADE. Select [NO].

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image42.png){width="4.572916666666667in" height="2.875in"}

Now, open the script **ei_image_classification.py**, which was downloaded from the Studio together with the .bin file for the Nicla.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image14.png){width="6.5in" height="4.0in"}

Run it. Pointing the camera at the objects we want to classify, the inference result will be displayed on the Serial Terminal.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image37.png){width="6.5in" height="3.736111111111111in"}

**Changing the code to add labels:**

The code provided by Edge Impulse can be modified so that, for testing purposes, we can see the inference result directly on the image displayed in the OpenMV IDE.

[[Upload the code from GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) or modify it as below:

```python
# Marcelo Rovai - NICLA Vision - Image Classification
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

try:
    # Load the built-in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()
while(True):
    clock.tick()                       # Starts tracking elapsed time.

    img = sensor.snapshot()

    # default settings just do one detection
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        # Start from the first prediction and keep the label with the highest score
        max_val = predictions_list[0][1]
        max_lbl = predictions_list[0][0]
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label with the highest probability
        if max_val < 0.5:
            max_lbl = 'uncertain'
        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space = False,
            scale=2
            )
```

Here you can see the result:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image47.jpg){width="6.5in" height="2.9444444444444446in"}

Note that the latency (136 ms) is much higher than the 86 ms we measured with the Arduino library. This is because we are using the IDE as an interface and the measurement includes the time spent waiting for the camera to be ready. If we start the clock just before the inference:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image13.jpg){width="6.5in" height="2.0972222222222223in"}

The latency drops to only 71 ms.

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image1.jpg){width="3.5520833333333335in" height="1.53125in"}

> *The OpenMV Cam runs about half as fast when connected to the IDE. The FPS should increase once it is disconnected.*

### **Post-Processing with LEDs**

When working with embedded machine learning, we want devices that continually run inference and act on the results directly in the physical world, rather than displaying them on a connected computer. To simulate this, we will define one LED to light up for each of the possible inference results.

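As a quick primer before the full script, the Nicla Vision's on-board RGB LED can be driven from MicroPython through the `pyb.LED` class, using the same numbering the code below relies on (1 = red, 2 = green, 3 = blue). A minimal check, just to confirm the LEDs respond:

```python
# Minimal LED check on the Nicla Vision (OpenMV MicroPython).
import pyb, time

ledRed, ledGre, ledBlu = pyb.LED(1), pyb.LED(2), pyb.LED(3)

for led in (ledRed, ledGre, ledBlu):
    led.on()            # light one color at a time
    time.sleep_ms(500)
    led.off()
```
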
![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image38.jpg){width="6.5in" height="3.236111111111111in"}

For that, we should [[upload the code from GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) or change the previous code to include the LEDs:

```python
# Marcelo Rovai - NICLA Vision - Image Classification with LEDs
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc, pyb

ledRed = pyb.LED(1)
ledGre = pyb.LED(2)
ledBlu = pyb.LED(3)

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

ledRed.off()
ledGre.off()
ledBlu.off()

try:
    # Load the built-in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()


def setLEDs(max_lbl):
    # Map the winning label to one LED (all off for background)
    if max_lbl == 'uncertain':
        ledRed.on()
        ledGre.off()
        ledBlu.off()

    if max_lbl == 'periquito':
        ledRed.off()
        ledGre.on()
        ledBlu.off()

    if max_lbl == 'robot':
        ledRed.off()
        ledGre.off()
        ledBlu.on()

    if max_lbl == 'background':
        ledRed.off()
        ledGre.off()
        ledBlu.off()


while(True):
    img = sensor.snapshot()
    clock.tick()                       # Starts tracking elapsed time.

    # default settings just do one detection.
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # This combines the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        # Start from the first prediction and keep the label with the highest score
        max_val = predictions_list[0][1]
        max_lbl = predictions_list[0][0]
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label and turn on the LED with the highest probability
        if max_val < 0.8:
            max_lbl = 'uncertain'

        setLEDs(max_lbl)

        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space = False,
            scale=2
            )
```

Now, each time a class gets a result above 0.8, the corresponding LED will be lit, as below:

- Red LED on: Uncertain (no class is above 0.8)

- Green LED on: Periquito > 0.8

- Blue LED on: Robot > 0.8

- All LEDs off: Background > 0.8

Here is the result:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image18.jpg){width="6.5in" height="3.6527777777777777in"}

In more detail:

![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image21.jpg){width="6.5in" height="2.0972222222222223in"}

-height="2.0972222222222223in"} - -### **Image Classification (non-official) Benchmark** - -Several development boards can be used for embedded machine learning -(tinyML), and the most common ones for Computer Vision applications -(with low energy), are the ESP32 CAM, the Seeed XIAO ESP32S3 Sense, the -Arduinos Nicla Vison, and Portenta. - -![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image19.jpg){width="6.5in" -height="4.194444444444445in"} - -Using the opportunity, the same trained model was deployed on the -ESP-CAM, the XIAO, and Portenta (in this one, the model was trained -again, using grayscaled images to be compatible with its camera. Here is -the result, deploying the models as Arduino\'s Library: - -![image.png](vertopal_7e7df0957e594c66a4c5a3e59a82f457/media/image4.jpg){width="6.5in" -height="3.4444444444444446in"} - -### **Conclusion** - -Before we finish, consider that Computer Vision is more than just image -classification. For example, you can develop Edge Machine Learning -projects around vision in several areas, such as: - -- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer - > vision algorithms to navigate and make decisions. - -- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, - > and CT scan image analysis - -- **Retail**: Automated checkout systems that identify products as - > they pass through a scanner. - -- **Security and Surveillance**: Facial recognition, anomaly - > detection, and object tracking in real-time video feeds. - -- **Augmented Reality**: Object detection and classification to - > overlay digital information in the real world. - -- **Industrial Automation**: Visual inspection of products, predictive - > maintenance, and robot and drone guidance. - -- **Agriculture**: Drone-based crop monitoring and automated - > harvesting. - -- **Natural Language Processing**: Image captioning and visual - > question answering. - -- **Gesture Recognition**: For gaming, sign language translation, and - > human-machine interaction. - -- **Content Recommendation**: Image-based recommendation systems in - > e-commerce.