From d17e8ef1679ccb808d241ca13fabaa1d62d92c94 Mon Sep 17 00:00:00 2001
From: Marcelo Rovai
Date: Fri, 6 Oct 2023 15:07:35 -0300
Subject: [PATCH 1/2] Add files via upload

---
 embedded_ml_exercise.qmd | 742 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 742 insertions(+)
 create mode 100644 embedded_ml_exercise.qmd

diff --git a/embedded_ml_exercise.qmd b/embedded_ml_exercise.qmd
new file mode 100644
index 00000000..8b9a0f8a
--- /dev/null
+++ b/embedded_ml_exercise.qmd
@@ -0,0 +1,742 @@
---
title: "4 Embedded AI - Exercise: Image Classification"
format:
  html:
    code-fold: false
execute:
  eval: false
jupyter: python3
---

### **Introduction**

As we begin our studies of embedded machine learning, or tinyML, it is impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) in our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance.

More and more, we are facing an artificial intelligence (AI) revolution in which, as Gartner states, **Edge AI** has very high impact potential, **and the time for it is now**!

![](images_4/media/image2.jpg){width="4.729166666666667in" height="4.895833333333333in"}

At the center (the "bull's-eye") of Gartner's emerging-technologies radar is *Edge Computer Vision*, and when we talk about Machine Learning (ML) applied to vision, the first thing that comes to mind is **Image Classification**, a kind of ML "Hello World"!

This exercise explores a computer vision project that uses Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we'll use a pre-trained MobileNet model and adapt it for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy.

We'll employ techniques like quantization and pruning to reduce the computational load. By the end of this tutorial, you'll have a working prototype capable of classifying images in real time, all running on a low-power embedded system based on the Arduino Nicla Vision board.

### **Computer Vision**

At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human optical system. AI, in turn, is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli.

When discussing Computer Vision projects applied to embedded devices, the most common applications that come to mind are *Image Classification* and *Object Detection*.

![](images_4/media/image15.jpg){width="6.5in" height="2.8333333333333335in"}

Both models can be implemented on tiny devices like the Arduino Nicla Vision and used in real projects. Let's start with the first one.

### **Image Classification Project**

The first step in any ML project is to define our goal. In this case, it is to detect and classify two specific objects present in an image. For this project, we will use two small toys: a *robot* and a small Brazilian parrot (named *Periquito*). We will also collect images of a *background* where those two objects are absent.
![](images_4/media/image36.jpg){width="6.5in" height="3.638888888888889in"}

### **Data Collection**

Once you have defined your Machine Learning project goal, the next and most crucial step is dataset collection. You can use the Edge Impulse Studio, the OpenMV IDE we installed, or even your phone for image capture. Here, we will use the OpenMV IDE.

**Collecting the Dataset with the OpenMV IDE**

First, create a folder on your computer where your data will be saved, for example, "data." Next, in the OpenMV IDE, go to Tools > Dataset Editor and select New Dataset to start the dataset collection:

![](images_4/media/image29.png){width="6.291666666666667in" height="4.010416666666667in"}

The IDE will ask you to open the folder where your data will be saved; choose the "data" folder you just created. Note that new icons will appear in the left panel.

![](images_4/media/image46.png){width="0.9583333333333334in" height="1.5208333333333333in"}

Using the upper icon (1), enter the first class name, for example, "periquito":

![](images_4/media/image22.png){width="3.25in" height="2.65625in"}

Run dataset_capture_script.py; clicking the bottom icon (2) will start capturing images:

![](images_4/media/image43.png){width="6.5in" height="4.041666666666667in"}

Repeat the same procedure for the other classes:

![](images_4/media/image6.jpg){width="6.5in" height="3.0972222222222223in"}

> *We suggest around 60 images from each category. Try to capture different angles, backgrounds, and light conditions.*

The stored images use a QVGA frame size (320x240) and the RGB565 color pixel format.

After capturing your dataset, close the Dataset Editor tool (Tools > Dataset Editor).

On your computer, you will end up with a dataset containing three classes: periquito, robot, and background.

![](images_4/media/image20.png){width="6.5in" height="2.2083333333333335in"}

You should then return to Edge Impulse Studio and upload the dataset to your project.
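Before you do, it can be worth a quick sanity check of the captured files on your computer. The sketch below is only an illustration, assuming one sub-folder per class inside the "data" folder with JPEG images at QVGA resolution; adjust the paths and extensions to whatever the Dataset Editor actually created on your machine.

```{python}
# Hypothetical sanity check of the images captured with the OpenMV IDE.
# Assumes data/<class_name>/*.jpg at QVGA (320x240); adjust as needed.
from pathlib import Path
from PIL import Image

DATA_DIR = Path("data")        # folder chosen in the OpenMV Dataset Editor
EXPECTED_SIZE = (320, 240)     # QVGA frames saved by dataset_capture_script.py

for class_dir in sorted(p for p in DATA_DIR.iterdir() if p.is_dir()):
    images = list(class_dir.glob("*.jpg"))
    wrong_size = [f for f in images if Image.open(f).size != EXPECTED_SIZE]
    print(f"{class_dir.name}: {len(images)} images, "
          f"{len(wrong_size)} with an unexpected size")
```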
### **Training the model with Edge Impulse Studio**

We will use the Edge Impulse Studio to train our model. Enter your account credentials at Edge Impulse and create a new project:

![](images_4/media/image45.png){width="6.5in" height="4.263888888888889in"}

> *Here, you can clone a similar project: [NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).*

### **Dataset**

Using the EI Studio (or *Studio*), we will go through four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the edge device, in this case, the NiclaV).

![](images_4/media/image41.jpg){width="6.5in" height="4.194444444444445in"}

Regarding the Dataset, it is essential to point out that our original dataset, captured with the OpenMV IDE, will be split into three parts: Training, Validation, and Test. The Test set will be separated from the beginning and set aside, to be used only in the Test phase after training. The Validation set will be used during training.

![](images_4/media/image7.jpg){width="6.5in" height="4.763888888888889in"}

In the Studio, go to the Data acquisition tab, and in the UPLOAD DATA section, upload the files of each chosen category from your computer:

![](images_4/media/image39.png){width="6.5in" height="4.263888888888889in"}

Leave it to the Studio to automatically split the original dataset into training and test sets, and choose the label corresponding to that specific data:

![](images_4/media/image30.png){width="6.5in" height="4.263888888888889in"}

Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio:

![](images_4/media/image11.png){width="6.5in" height="4.263888888888889in"}

The Studio allows you to explore your data, showing a complete view of all the data in your project. You can clear, inspect, or change labels by clicking on individual data items. In our case, a simple project, the data seems OK.

![](images_4/media/image44.png){width="6.5in" height="4.263888888888889in"}

### **The Impulse Design**

In this phase, we should define how to:

- Pre-process our data, which consists of resizing the individual images and determining the color depth to use (RGB or Grayscale), and

- Design a model, which in this case will be "Transfer Learning (Images)", fine-tuning a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case).

![](images_4/media/image23.jpg){width="6.5in" height="4.0in"}

Transfer Learning with MobileNet offers a streamlined approach to model training, which is especially beneficial for resource-constrained environments and projects with limited labeled data. MobileNet, known for its lightweight architecture, is a pre-trained model that has already learned valuable features from a large dataset (ImageNet).

![](images_4/media/image9.jpg){width="6.5in" height="1.9305555555555556in"}

By leveraging these learned features, you can train a new model for your specific task with less data and fewer computational resources and still achieve competitive accuracy.

![](images_4/media/image32.jpg){width="6.5in" height="2.3055555555555554in"}

This approach significantly reduces training time and computational cost, making it ideal for quick prototyping and deployment on embedded devices where efficiency is paramount.

Go to the Impulse Design tab and create the *impulse*, defining an image size of 96x96 and squashing the images (square shape, without cropping). Select the Image and Transfer Learning blocks. Save the Impulse.

![](images_4/media/image16.png){width="6.5in" height="4.263888888888889in"}

### **Image Pre-Processing**

All input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).

![](images_4/media/image17.png){width="6.5in" height="4.319444444444445in"}

Press [Save parameters] and then Generate all features:

![](images_4/media/image5.png){width="6.5in" height="4.263888888888889in"}
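The Studio performs this conversion for you; purely as an illustration, the equivalent pre-processing in plain Python would look like the sketch below (the file name is hypothetical).

```{python}
# Illustration only: "squash" a QVGA capture to the 96x96 RGB model input.
# 96 * 96 * 3 = 27,648 features per image.
import numpy as np
from PIL import Image

img = Image.open("data/periquito/example.jpg")   # hypothetical file name
img = img.convert("RGB").resize((96, 96))        # squash: aspect ratio not kept
features = np.asarray(img, dtype=np.float32).flatten()
print(features.shape)                            # (27648,)
```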
### **Model Design**

In 2017, Google introduced [[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html), a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).

MobileNet V1 and MobileNet V2 both aim at mobile efficiency and embedded vision applications but differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to enhance performance. These new features allow V2 to capture more complex features using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but a linear activation in the bottleneck layer, a design choice found to better preserve important information through the network. MobileNet V2 offers a more optimized architecture for higher accuracy and efficiency and will be used in this project.

Although the base MobileNet architecture is already small and has low latency, a specific use case or application may often require the model to be even smaller and faster. To construct these smaller, less computationally expensive models, MobileNet introduces a straightforward parameter, α (alpha), called the width multiplier. The role of the width multiplier α is to thin the network uniformly at each layer.

Edge Impulse Studio offers MobileNetV1 (96x96 images) and V2 (96x96 and 160x160 images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off: the higher the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) will be needed to run the model, implying more latency. The smallest footprint is obtained at the other extreme, with MobileNetV1 and α=0.10 (around 53.2K RAM and 101K ROM).

![](images_4/media/image27.jpg){width="6.5in" height="3.5277777777777777in"}

For this project, we will use **MobileNetV2 96x96 0.1**, which has an estimated memory cost of 265.3 KB of RAM. This model should be fine for the Nicla Vision, which has 1MB of SRAM. On the Transfer Learning tab, select this model:

![](images_4/media/image24.png){width="6.5in" height="4.263888888888889in"}
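To get a feel for what the width multiplier does, you can build MobileNetV2 backbones with different α values in Keras and compare their parameter counts. This is only an illustration (it is not the Studio's training code), and `weights=None` is used because pre-trained ImageNet weights are not published for every α/input-size combination.

```{python}
# Illustration of the width multiplier: smaller alpha => thinner, cheaper model.
import tensorflow as tf

for alpha in (0.1, 0.35, 1.0):
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), alpha=alpha,
        include_top=False, weights=None)
    print(f"alpha={alpha}: {backbone.count_params():,} parameters")
```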
Another helpful technique when working with Deep Learning is **Data Augmentation**. Data augmentation is a method that can help improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images).

Under the hood, here is how Edge Impulse implements a data augmentation policy on your data:

```{python}
# Implements the data augmentation policy
# (excerpt from the training code generated by Edge Impulse;
# INPUT_SHAPE is the model input shape, e.g., (96, 96, 3))
import math
import random

import tensorflow as tf


def augment_image(image, label):
    # Flips the image randomly
    image = tf.image.random_flip_left_right(image)

    # Increase the image size, then randomly crop it down to
    # the original dimensions
    resize_factor = random.uniform(1, 1.2)
    new_height = math.floor(resize_factor * INPUT_SHAPE[0])
    new_width = math.floor(resize_factor * INPUT_SHAPE[1])
    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width)
    image = tf.image.random_crop(image, size=INPUT_SHAPE)

    # Vary the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.2)

    return image, label
```

Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset.

The final dense layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. Here is the training result:

![](images_4/media/image31.jpg){width="6.5in" height="3.5in"}

The result is excellent, with 77 ms of latency, which should result in about 13 fps (frames per second) during inference (1/0.077 s ≈ 13).

### **Model Testing**

![](images_4/media/image10.jpg){width="6.5in" height="3.8472222222222223in"}

Now you should take the data set aside at the start of the project and run the trained model with it as input:

![](images_4/media/image34.png){width="3.1041666666666665in" height="1.7083333333333333in"}

The result was, again, excellent.

![](images_4/media/image12.png){width="6.5in" height="4.263888888888889in"}

### **Deploying the model**

At this point, we can deploy the trained model as a .tflite file and use the OpenMV IDE to run it with MicroPython, or we can deploy it as a C/C++ library or an Arduino library.

![](images_4/media/image28.jpg){width="6.5in" height="3.763888888888889in"}
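Edge Impulse takes care of generating the deployable (typically int8-quantized) .tflite file for you. Purely as a reference for what "quantization" means here, full-integer post-training quantization with the TensorFlow Lite converter looks roughly like the sketch below; `model` and `representative_images` are placeholders for a trained Keras model and a small batch of pre-processed 96x96x3 training images.

```{python}
# Hedged sketch of full-integer post-training quantization (not the Studio's
# internal code). `model` and `representative_images` are placeholders.
import tensorflow as tf

def representative_dataset():
    for image in representative_images[:100]:
        yield [tf.expand_dims(tf.cast(image, tf.float32), 0)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("trained_int8.tflite", "wb") as f:
    f.write(converter.convert())
```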
**Arduino Library**

First, let's deploy the model as an Arduino library:

![](images_4/media/image48.png){width="6.5in" height="4.263888888888889in"}

You should install the library as a .zip file in the Arduino IDE and run the sketch nicla_vision_camera.ino, available in Examples under your library's name.

> *Note that the Arduino Nicla Vision has, by default, 512KB of RAM allocated for the M7 core and an additional 244KB on the M4 address space. In the code, this allocation was changed to 288 kB to guarantee that the model will run on the device (malloc_addblock((void\*)0x30000000, 288 \* 1024);).*

The result was good, with 86 ms of measured latency.

![](images_4/media/image25.jpg){width="6.5in" height="3.4444444444444446in"}

Here is a short video showing the inference results: [[https://youtu.be/bZPZZJblU-o]{.underline}](https://youtu.be/bZPZZJblU-o)

**OpenMV**

It is possible to deploy the trained model to be used with OpenMV in two ways: as a library or as firmware.

When deployed as a library, three files are generated: the .tflite model, a list with the labels, and a simple MicroPython script that can run inference with the model.

![](images_4/media/image26.png){width="6.5in" height="1.0in"}

Running this model as a .tflite file directly on the Nicla was not possible. So, we can either sacrifice some accuracy by using a smaller model or deploy the model as OpenMV Firmware (FW). As firmware, Edge Impulse Studio generates the optimized model, libraries, and frameworks needed to run the inference. Let's explore this last option.

Select OpenMV Firmware on the Deploy tab and press [Build].

![](images_4/media/image3.png){width="6.5in" height="4.263888888888889in"}

On your computer, you will find a ZIP file. Open it:

![](images_4/media/image33.png){width="6.5in" height="2.625in"}

Use the Bootloader tool on the OpenMV IDE to load the FW onto your board:

![](images_4/media/image35.jpg){width="6.5in" height="3.625in"}

Select the appropriate file (.bin for the Nicla Vision):

![](images_4/media/image8.png){width="6.5in" height="1.9722222222222223in"}

After the download is finished, press OK:

![](images_4/media/image40.png){width="3.875in" height="5.708333333333333in"}

If a message says that the FW is outdated, DO NOT UPGRADE. Select [NO].

![](images_4/media/image42.png){width="4.572916666666667in" height="2.875in"}

Now, open the script **ei_image_classification.py** that was downloaded from the Studio together with the .bin file for the Nicla.

![](images_4/media/image14.png){width="6.5in" height="4.0in"}

Run it. Pointing the camera at the objects we want to classify, the inference results will be displayed on the Serial Terminal.

![](images_4/media/image37.png){width="6.5in" height="3.736111111111111in"}

**Changing the code to add labels**

The code provided by Edge Impulse can be modified so that, for testing purposes, we can see the inference result directly on the image displayed in the OpenMV IDE.

[[Upload the code from GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py) or modify it as below:

```{python}
# Marcelo Rovai - NICLA Vision - Image Classification
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

try:
    # Load the built-in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()
while(True):
    clock.tick()  # Starts tracking elapsed time.

    img = sensor.snapshot()

    # Default settings just do one detection
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # Combine the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        max_val = predictions_list[0][1]
        max_lbl = 'background'
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label with the highest probability
        if max_val < 0.5:
            max_lbl = 'uncertain'
        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space = False,
            scale=2
        )
```
Here you can see the result:

![](images_4/media/image47.jpg){width="6.5in" height="2.9444444444444446in"}

Note that the latency (136 ms) is almost double what we got directly with the Arduino sketch. This is because we are using the IDE as an interface and because the clock also counts the time spent waiting for the camera to be ready. If we start the clock just before the inference:

![](images_4/media/image13.jpg){width="6.5in" height="2.0972222222222223in"}

The latency drops to only 71 ms.

![](images_4/media/image1.jpg){width="3.5520833333333335in" height="1.53125in"}

> *The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected.*
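If you want to time only the classify call yourself, a minimal sketch using MicroPython's millisecond tick helpers could look like the one below; it assumes the same `net` and `img` objects created in the script above.

```{python}
# Minimal sketch: measure only the inference time, ignoring capture and
# IDE overhead. Assumes `net` and `img` exist as in the script above.
import time

start = time.ticks_ms()                              # millisecond counter
objs = net.classify(img)                             # single inference
elapsed = time.ticks_diff(time.ticks_ms(), start)    # wrap-safe difference
print("inference only: {} ms".format(elapsed))
```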
### **Post-Processing with LEDs**

When working with embedded machine learning, we are looking for devices that can continuously run inference and act on the results directly in the physical world, rather than displaying them on a connected computer. To simulate this, we will define one LED to light up for each of the possible inference results.

![](images_4/media/image38.jpg){width="6.5in" height="3.236111111111111in"}

For that, we should [[upload the code from GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py) or modify the previous code to include the LEDs:

```{python}
# Marcelo Rovai - NICLA Vision - Image Classification with LEDs
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc, pyb

ledRed = pyb.LED(1)
ledGre = pyb.LED(2)
ledBlu = pyb.LED(3)

sensor.reset()                         # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)    # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)      # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))       # Set 240x240 window.
sensor.skip_frames(time=2000)          # Let the camera adjust.

net = None
labels = None

ledRed.off()
ledGre.off()
ledBlu.off()

try:
    # Load the built-in model
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()


def setLEDs(max_lbl):

    if max_lbl == 'uncertain':
        ledRed.on()
        ledGre.off()
        ledBlu.off()

    if max_lbl == 'periquito':
        ledRed.off()
        ledGre.on()
        ledBlu.off()

    if max_lbl == 'robot':
        ledRed.off()
        ledGre.off()
        ledBlu.on()

    if max_lbl == 'background':
        ledRed.off()
        ledGre.off()
        ledBlu.off()


while(True):
    img = sensor.snapshot()
    clock.tick()  # Starts tracking elapsed time.

    # Default settings just do one detection.
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())
        # Combine the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        max_val = predictions_list[0][1]
        max_lbl = 'background'
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]

            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label and turn on the LED with the highest probability
        if max_val < 0.8:
            max_lbl = 'uncertain'

        setLEDs(max_lbl)

        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space = False,
            scale=2
        )
```

Now, each time a class scores above 0.8, the corresponding LED will light up as follows:

- Red LED on: Uncertain (no class is over 0.8)

- Green LED on: Periquito > 0.8

- Blue LED on: Robot > 0.8

- All LEDs off: Background > 0.8

Here is the result:

![](images_4/media/image18.jpg){width="6.5in" height="3.6527777777777777in"}

In more detail:

![](images_4/media/image21.jpg){width="6.5in" height="2.0972222222222223in"}

### **Image Classification (non-official) Benchmark**

Several development boards can be used for embedded machine learning (tinyML). The most common ones for low-power Computer Vision applications are the ESP32-CAM, the Seeed XIAO ESP32S3 Sense, and the Arduino Nicla Vision and Portenta.

![](images_4/media/image19.jpg){width="6.5in" height="4.194444444444445in"}

Taking the opportunity, the same trained model was deployed on the ESP32-CAM, the XIAO, and the Portenta (for the latter, the model was retrained using grayscale images to be compatible with its camera). Here is the result, deploying the models as Arduino libraries:

![](images_4/media/image4.jpg){width="6.5in" height="3.4444444444444446in"}

### **Conclusion**

Before we finish, consider that Computer Vision is more than just image classification. For example, you can develop Edge Machine Learning projects around vision in several areas, such as:

- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer vision algorithms to navigate and make decisions.

- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, and CT scan image analysis.

- **Retail**: Automated checkout systems that identify products as they pass through a scanner.

- **Security and Surveillance**: Facial recognition, anomaly detection, and object tracking in real-time video feeds.
- **Augmented Reality**: Object detection and classification to overlay digital information in the real world.

- **Industrial Automation**: Visual inspection of products, predictive maintenance, and robot and drone guidance.

- **Agriculture**: Drone-based crop monitoring and automated harvesting.

- **Natural Language Processing**: Image captioning and visual question answering.

- **Gesture Recognition**: For gaming, sign language translation, and human-machine interaction.

- **Content Recommendation**: Image-based recommendation systems in e-commerce.
\ No newline at end of file

From d9d07f294c539cd8ee2089ab192d75509a2c519f Mon Sep 17 00:00:00 2001
From: Marcelo Rovai
Date: Fri, 6 Oct 2023 15:09:21 -0300
Subject: [PATCH 2/2] Delete embedded_ml_exercise.qmd

---
 embedded_ml_exercise.qmd | 745 --------------------------------------
 1 file changed, 745 deletions(-)
 delete mode 100644 embedded_ml_exercise.qmd

diff --git a/ embedded_ml_exercise.qmd b/ embedded_ml_exercise.qmd
deleted file mode 100644
index d07ef188..00000000
--- a/ embedded_ml_exercise.qmd
+++ /dev/null