# CV on Nicla Vision {.unnumbered}
As we initiate our studies into embedded machine learning, or tinyML, it's impossible to overlook the transformative impact of Computer Vision (CV) and Artificial Intelligence (AI) on our lives. These two intertwined disciplines redefine what machines can perceive and accomplish, from autonomous vehicles and robotics to healthcare and surveillance.

More and more, we are facing an artificial intelligence (AI) revolution where, as stated by Gartner, **Edge AI** has a very high impact potential, and **it is happening now**!
![](images_4/media/image2.jpg){width="4.729166666666667in"
height="4.895833333333333in"}
In the \"bull-eye\" of emerging technologies, radar is the *Edge
Computer Vision*, and when we talk about Machine Learning (ML) applied
to vision, the first thing that comes to mind is **Image
Classification**, a kind of ML \"Hello World\"!
This exercise will explore a computer vision project utilizing Convolutional Neural Networks (CNNs) for real-time image classification. Leveraging TensorFlow's robust ecosystem, we'll implement a pre-trained MobileNet model and adapt it for edge deployment. The focus will be on optimizing the model to run efficiently on resource-constrained hardware without sacrificing accuracy.

We'll employ techniques like quantization and pruning to reduce the computational load. By the end of this tutorial, you'll have a working prototype capable of classifying images in real time, all running on a low-power embedded system based on the Arduino Nicla Vision board.
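
To make the idea of quantization concrete, here is a minimal sketch of post-training int8 quantization with the TensorFlow Lite converter. Edge Impulse performs an equivalent step for us later when we choose a quantized (int8) deployment; the names `model` and `rep_ds` are placeholders for a trained Keras model and a small representative dataset.

```python
# Sketch: post-training int8 quantization with the TFLite converter.
# `model` (a trained Keras model) and `rep_ds` (a tf.data.Dataset of images)
# are assumed to exist; Edge Impulse handles this step internally.
import tensorflow as tf

def representative_data_gen():
    # A few hundred samples are enough to calibrate the int8 ranges
    for image, _ in rep_ds.take(100):
        yield [tf.expand_dims(tf.cast(image, tf.float32), 0)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```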
## Computer Vision
At its core, computer vision aims to enable machines to interpret and make decisions based on visual data from the world, essentially mimicking the capability of the human optical system. Conversely, AI is a broader field encompassing machine learning, natural language processing, and robotics, among other technologies. When you bring AI algorithms into computer vision projects, you supercharge the system's ability to understand, interpret, and react to visual stimuli.
When discussing Computer Vision projects applied to embedded devices,
the most common applications that come to mind are *Image
Classification* and *Object Detection*.
![](images_4/media/image15.jpg){width="6.5in"
height="2.8333333333333335in"}
Both models can be implemented on tiny devices like the Arduino Nicla Vision and used in real projects. Let's start with the first one.
## Image Classification Project
The first step in any ML project is to define our goal. In this case, it
is to detect and classify two specific objects present in one image. For
this project, we will use two small toys: a *robot* and a small
Brazilian parrot (named *Periquito*). Also, we will collect images of a
*background* where those two objects are absent.
![](images_4/media/image36.jpg){width="6.5in"
height="3.638888888888889in"}
## Data Collection
Once you have defined your Machine Learning project goal, the next and
most crucial step is the dataset collection. You can use the Edge
Impulse Studio, the OpenMV IDE we installed, or even your phone for the
image capture. Here, we will use the OpenMV IDE for that.
**Collecting Dataset with OpenMV IDE**
First, create a folder on your computer where your data will be saved, for example, "data". Next, in the OpenMV IDE, go to Tools > Dataset Editor and select New Dataset to start the dataset collection:
![](images_4/media/image29.png){width="6.291666666666667in"
height="4.010416666666667in"}
The IDE will ask you to open the folder where your data will be saved; choose the "data" folder you just created. Note that new icons will appear on the left panel.
![](images_4/media/image46.png){width="0.9583333333333334in"
height="1.5208333333333333in"}
Using the upper icon (1), enter the first class name, for example, "periquito":
![](images_4/media/image22.png){width="3.25in"
height="2.65625in"}
Run the dataset_capture_script.py; clicking on the bottom icon (2) will start capturing images:
![](images_4/media/image43.png){width="6.5in"
height="4.041666666666667in"}
Repeat the same procedure with the other classes:
![](images_4/media/image6.jpg){width="6.5in"
height="3.0972222222222223in"}
> *We suggest around 60 images from each category. Try to capture
> different angles, backgrounds, and light conditions.*
The stored images use a QVGA frame size (320x240) and the RGB565 color pixel format.
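
If you prefer to script the capture yourself, a minimal MicroPython sketch along the lines below could save frames directly to the board's filesystem instead of going through the IDE's Dataset Editor. The folder name, file naming, and one-second interval are assumptions for illustration (the target folder must already exist on the board):

```python
# Hypothetical alternative to the IDE's Dataset Editor: save one QVGA/RGB565
# frame per second to the board's filesystem (the folder must already exist).
import sensor, time

sensor.reset()
sensor.set_pixformat(sensor.RGB565)   # same color format used in this project
sensor.set_framesize(sensor.QVGA)     # 320x240 frames
sensor.skip_frames(time=2000)         # let the camera settle

for count in range(60):               # ~60 images per class, as suggested
    img = sensor.snapshot()
    img.save("periquito/img_%03d.jpg" % count)
    time.sleep_ms(1000)               # one capture per second
```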
After capturing your dataset, close the Dataset Editor Tool at Tools > Dataset Editor.
On your computer, you will end up with a dataset that contains three classes: periquito, robot, and background.
![](images_4/media/image20.png){width="6.5in"
height="2.2083333333333335in"}
You should return to Edge Impulse Studio and upload the dataset to your
project.
## Training the model with Edge Impulse Studio
We will use the Edge Impulse Studio for training our model. Enter your
account credentials at Edge Impulse and create a new project:
![](images_4/media/image45.png){width="6.5in"
height="4.263888888888889in"}
> *Here, you can clone a similar project:*
> *[NICLA-Vision_Image_Classification](https://studio.edgeimpulse.com/public/273858/latest).*
## Dataset
Using the EI Studio (or *Studio*), we will go through four main steps to have our model ready for use on the Nicla Vision board: Dataset, Impulse, Tests, and Deploy (on the Edge Device, in this case, the NiclaV).
![](images_4/media/image41.jpg){width="6.5in"
height="4.194444444444445in"}
Regarding the Dataset, it is essential to point out that our Original Dataset, captured with the OpenMV IDE, will be split into three parts: Training, Validation, and Test. The Test Set will be split off at the beginning and set aside, to be used only in the Test phase after training. The Validation Set will be used during training.
![](images_4/media/image7.jpg){width="6.5in"
height="4.763888888888889in"}
In the Studio, go to the Data acquisition tab, and in the UPLOAD DATA section, upload the files for the chosen categories from your computer:
![](images_4/media/image39.png){width="6.5in"
height="4.263888888888889in"}
Leave it to the Studio to automatically split the original dataset into training and test sets, and choose the label corresponding to that specific data:
![](images_4/media/image30.png){width="6.5in"
height="4.263888888888889in"}
Repeat the procedure for all three classes. At the end, you should see your "raw data" in the Studio:
![](images_4/media/image11.png){width="6.5in"
height="4.263888888888889in"}
The Studio allows you to explore your data, showing a complete view of
all the data in your project. You can clear, inspect, or change labels
by clicking on individual data items. In our case, a simple project, the
data seems OK.
![](images_4/media/image44.png){width="6.5in"
height="4.263888888888889in"}
## The Impulse Design
In this phase, we should define how to:

- Pre-process our data, which consists of resizing the individual images and determining the color depth to use (RGB or Grayscale), and
- Design a Model, in this case "Transfer Learning (Images)", to fine-tune a pre-trained MobileNet V2 image classification model on our data. This method performs well even with relatively small image datasets (around 150 images in our case).
![](images_4/media/image23.jpg){width="6.5in"
height="4.0in"}
Transfer Learning with MobileNet offers a streamlined approach to model
training, which is especially beneficial for resource-constrained
environments and projects with limited labeled data. MobileNet, known
for its lightweight architecture, is a pre-trained model that has
already learned valuable features from a large dataset (ImageNet).
![](images_4/media/image9.jpg){width="6.5in"
height="1.9305555555555556in"}
By leveraging these learned features, you can train a new model for your
specific task with fewer data and computational resources yet achieve
competitive accuracy.
![](images_4/media/image32.jpg){width="6.5in"
height="2.3055555555555554in"}
This approach significantly reduces training time and computational
cost, making it ideal for quick prototyping and deployment on embedded
devices where efficiency is paramount.
Go to the Impulse Design Tab and create the *impulse*, defining an image size of 96x96 and squashing them (squared form, without cropping). Select the Image and Transfer Learning blocks. Save the Impulse.
![](images_4/media/image16.png){width="6.5in"
height="4.263888888888889in"}
### **Image Pre-Processing**
All input QVGA/RGB565 images will be converted to 27,648 features (96x96x3).
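
The arithmetic is straightforward: a "squashed" resize maps each 320x240 frame to 96x96 pixels without cropping (so the aspect ratio is not preserved), and every pixel contributes three RGB values. A quick sketch to check the numbers (not part of the Studio workflow):

```python
# Check the feature count produced by the Impulse's image block
import tensorflow as tf

frame = tf.zeros((240, 320, 3))              # one QVGA frame (height, width, RGB)
squashed = tf.image.resize(frame, (96, 96))  # "squash": resize without cropping
features = tf.reshape(squashed, [-1])        # flatten to a feature vector
print(features.shape)                        # (27648,) = 96 * 96 * 3
```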
![](images_4/media/image17.png){width="6.5in"
height="4.319444444444445in"}
Press \[Save parameters\] and Generate all features:
![](images_4/media/image5.png){width="6.5in"
height="4.263888888888889in"}
## Model Design
In 2017, Google introduced
[[MobileNetV1]{.underline}](https://research.googleblog.com/2017/06/mobilenets-open-source-models-for.html),
a family of general-purpose computer vision neural networks designed with mobile devices in mind to support classification, detection, and more. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of various use cases. In 2018, Google launched [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).
MobileNet V1 and MobileNet V2 both aim at mobile efficiency and embedded vision applications, but they differ in architectural complexity and performance. While both use depthwise separable convolutions to reduce the computational cost, MobileNet V2 introduces Inverted Residual Blocks and Linear Bottlenecks to enhance performance. These new features allow V2 to capture more complex features using fewer parameters, making it computationally more efficient and generally more accurate than its predecessor. Additionally, V2 employs a non-linear activation in the intermediate expansion layer but a linear activation for the bottleneck layer, a design choice found to better preserve important information through the network. MobileNet V2 offers a more optimized architecture for higher accuracy and efficiency and will be used in this project.
Although the base MobileNet architecture is already tiny and has low latency, a specific use case or application may often require the model to be even smaller and faster. MobileNet introduces a straightforward parameter α (alpha), called the width multiplier, to construct these smaller, less computationally expensive models. The role of the width multiplier α is to thin the network uniformly at each layer.
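
To get a feel for the effect of α outside the Studio, the sketch below builds MobileNetV2 at two different widths with Keras and compares their parameter counts. Weights are left uninitialized here only to keep the sketch self-contained; Edge Impulse supplies its own pre-trained weights for the widths it offers.

```python
# Rough sketch: the width multiplier alpha thins every layer, shrinking the
# parameter count (and therefore RAM/ROM and latency) roughly quadratically.
import tensorflow as tf

for alpha in (1.0, 0.35):
    model = tf.keras.applications.MobileNetV2(
        input_shape=(96, 96, 3), alpha=alpha, weights=None, classes=3)
    print("alpha = %.2f -> %d parameters" % (alpha, model.count_params()))
```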
Edge Impulse Studio offers MobileNetV1 (96x96 images) and V2 (96x96 and 160x160 images), with several different **α** values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and α=1.0. Of course, there is a trade-off: the higher the accuracy, the more memory (around 1.3M RAM and 2.6M ROM) is needed to run the model, implying more latency. At the other extreme, the smallest footprint is obtained with MobileNetV1 and α=0.10 (around 53.2K RAM and 101K ROM).
![](images_4/media/image27.jpg){width="6.5in"
height="3.5277777777777777in"}
For this project, we will use **MobileNetV2 96x96 0.1**, with an estimated memory cost of 265.3 KB of RAM. This model should be fine for the Nicla Vision, which has 1MB of SRAM. On the Transfer Learning Tab, select this model:
![](images_4/media/image24.png){width="6.5in"
height="4.263888888888889in"}
Another necessary technique to be used with Deep Learning is **Data Augmentation**. Data augmentation is a method that can help improve the accuracy of machine learning models by creating additional artificial data. A data augmentation system makes small, random changes to your training data during the training process (such as flipping, cropping, or rotating the images).

Under the hood, here is how Edge Impulse implements a data augmentation policy on your data:
```python
# Implements the data augmentation policy
import math
import random
import tensorflow as tf

# INPUT_SHAPE is defined elsewhere in the training script (here, (96, 96, 3))

def augment_image(image, label):
    # Flips the image randomly
    image = tf.image.random_flip_left_right(image)

    # Increase the image size, then randomly crop it down to
    # the original dimensions
    resize_factor = random.uniform(1, 1.2)
    new_height = math.floor(resize_factor * INPUT_SHAPE[0])
    new_width = math.floor(resize_factor * INPUT_SHAPE[1])
    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width)
    image = tf.image.random_crop(image, size=INPUT_SHAPE)

    # Vary the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.2)

    return image, label
```
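
A function like this is typically mapped over the training set so the random transformations are applied on the fly at each epoch. A sketch, assuming a `tf.data.Dataset` of (image, label) pairs named `train_ds`:

```python
# Apply the augmentation lazily, only to the training split
train_ds = train_ds.map(augment_image, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.shuffle(256).batch(32).prefetch(tf.data.AUTOTUNE)
```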
Exposure to these variations during training can help prevent your model from taking shortcuts by "memorizing" superficial clues in your training data, meaning it may better reflect the deep underlying patterns in your dataset.

The final layer of our model will have 12 neurons with a 15% dropout for overfitting prevention. Here is the training result:
![](images_4/media/image31.jpg){width="6.5in"
height="3.5in"}
The result is excellent, with 77 ms of latency, which should result in about 13 fps (frames per second) during inference.
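
For reference, the model the Studio trains (a frozen MobileNetV2 α=0.1 base followed by the 12-neuron dense layer and 15% dropout) corresponds roughly to the Keras sketch below. This is an approximation for illustration only: the Studio brings its own pre-trained weights for α=0.1, while `weights=None` is used here just to keep the sketch self-contained.

```python
# Approximate sketch of the transfer-learning model (not the Studio's exact code)
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.1, weights=None, include_top=False)
base.trainable = False  # transfer learning: reuse the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(12, activation='relu'),
    tf.keras.layers.Dropout(0.15),                    # helps prevent overfitting
    tf.keras.layers.Dense(3, activation='softmax'),   # background, periquito, robot
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```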
## Model Testing
![](images_4/media/image10.jpg){width="6.5in"
height="3.8472222222222223in"}
Now, you should take the data set aside at the start of the project and run the trained model using it as input:
![](images_4/media/image34.png){width="3.1041666666666665in"
height="1.7083333333333333in"}
The result was, again, excellent.
![](images_4/media/image12.png){width="6.5in"
height="4.263888888888889in"}
## Deploying the model
At this point, we can deploy the trained model as a .tflite file and use the OpenMV IDE to run it using MicroPython, or we can deploy it as a C/C++ or Arduino library.
![](images_4/media/image28.jpg){width="6.5in"
height="3.763888888888889in"}
**Arduino Library**
First, let's deploy it as an Arduino Library:
![](images_4/media/image48.png){width="6.5in"
height="4.263888888888889in"}
You should install the library as a .zip file in the Arduino IDE and run the sketch nicla_vision_camera.ino, available in Examples under your library name.
> *Note that the Arduino Nicla Vision has, by default, 512KB of RAM allocated for the M7 core and an additional 244KB on the M4 address space. In the code, this allocation was changed to 288KB to guarantee that the model will run on the device (malloc_addblock((void\*)0x30000000, 288 \* 1024);).*
The result was good, with 86ms of measured latency.
![](images_4/media/image25.jpg){width="6.5in"
height="3.4444444444444446in"}
Here is a short video showing the inference results:
[[https://youtu.be/bZPZZJblU-o]{.underline}](https://youtu.be/bZPZZJblU-o)
**OpenMV**
It is possible to deploy the trained model to be used with OpenMV in two ways: as a library and as firmware.

Three files are generated as a library: the .tflite model, a list with the labels, and a simple MicroPython script that can make inferences using the model.
![](images_4/media/image26.png){width="6.5in"
height="1.0in"}
Running this model as a .tflite file directly on the Nicla was impossible. So, we can either sacrifice accuracy by using a smaller model or deploy the model as OpenMV Firmware (FW). As FW, the Edge Impulse Studio generates the optimized models, libraries, and frameworks needed to make the inference. Let's explore this last option.
Select OpenMV Firmware on the Deploy Tab and press \[Build\].
![](images_4/media/image3.png){width="6.5in"
height="4.263888888888889in"}
On your computer, you will find a ZIP file. Open it:
![](images_4/media/image33.png){width="6.5in" height="2.625in"}
Use the Bootloader tool on the OpenMV IDE to load the FW on your board:
![](images_4/media/image35.jpg){width="6.5in" height="3.625in"}
Select the appropriate file (.bin for Nicla-Vision):
![](images_4/media/image8.png){width="6.5in" height="1.9722222222222223in"}
After the download is finished, press OK:
![DFU firmware update complete!.png](images_4/media/image40.png){width="3.875in" height="5.708333333333333in"}
If a message says that the FW is outdated, DO NOT UPGRADE. Select
\[NO\].
![](images_4/media/image42.png){width="4.572916666666667in"
height="2.875in"}
Now, open the script **ei_image_classification.py** that was downloaded from the Studio together with the .bin file for the Nicla.
![](images_4/media/image14.png){width="6.5in"
height="4.0in"}
And run it. Pointing the camera at the objects we want to classify, the inference results will be displayed on the Serial Terminal.
![](images_4/media/image37.png){width="6.5in"
height="3.736111111111111in"}
**Changing Code to add labels:**
The code provided by Edge Impulse can be modified so that we can see, for testing purposes, the inference result directly on the image displayed on the OpenMV IDE.
[[Upload the code from
GitHub,]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification.py)
or modify it as below:
```python
# Marcelo Rovai - NICLA Vision - Image Classification
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc

sensor.reset()                       # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)  # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)    # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))     # Set 240x240 window.
sensor.skip_frames(time=2000)        # Let the camera adjust.

net = None
labels = None

try:
    # Load the built-in model bundled with the Edge Impulse firmware
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()
while True:
    clock.tick()                     # Starts tracking elapsed time.
    img = sensor.snapshot()

    # Default settings just do one detection
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())

        # Combine the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        # Find the label with the highest confidence
        max_val = predictions_list[0][1]
        max_lbl = predictions_list[0][0]
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]
            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label with the highest probability
        if max_val < 0.5:
            max_lbl = 'uncertain'
        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space=False,
            scale=2
        )
```
Here you can see the result:
![](images_4/media/image47.jpg){width="6.5in"
height="2.9444444444444446in"}
Note that the latency (136 ms) is almost double what we got directly with the Arduino IDE. This is because we are using the IDE as an interface, and the measured time includes waiting for the camera to be ready. If we start the clock just before the inference:
![](images_4/media/image13.jpg){width="6.5in"
height="2.0972222222222223in"}
The latency will drop to only 71 ms.
![](images_4/media/image1.jpg){width="3.5520833333333335in"
height="1.53125in"}
> *The NiclaV runs about half as fast when connected to the IDE. The FPS should increase once disconnected.*
### **Post-Processing with LEDs**
When working with embedded machine learning, we are looking for devices that can continuously run inference and act on the results directly in the physical world, rather than displaying them on a connected computer. To simulate this, we will define one LED to light up for each of the possible inference results.
![](images_4/media/image38.jpg){width="6.5in"
height="3.236111111111111in"}
For that, we should [[upload the code from
GitHub]{.underline}](https://github.com/Mjrovai/Arduino_Nicla_Vision/blob/main/Micropython/nicla_image_classification_LED.py)
or change the last code to include the LEDs:
```python
# Marcelo Rovai - NICLA Vision - Image Classification with LEDs
# Adapted from Edge Impulse - OpenMV Image Classification Example
# @24Aug23

import sensor, image, time, os, tf, uos, gc, pyb

ledRed = pyb.LED(1)
ledGre = pyb.LED(2)
ledBlu = pyb.LED(3)

sensor.reset()                       # Reset and initialize the sensor.
sensor.set_pixformat(sensor.RGB565)  # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)    # Set frame size to QVGA (320x240)
sensor.set_windowing((240, 240))     # Set 240x240 window.
sensor.skip_frames(time=2000)        # Let the camera adjust.

net = None
labels = None

ledRed.off()
ledGre.off()
ledBlu.off()

try:
    # Load the built-in model bundled with the Edge Impulse firmware
    labels, net = tf.load_builtin_model('trained')
except Exception as e:
    raise Exception(e)

clock = time.clock()


def setLEDs(max_lbl):
    # One LED per class; all LEDs off for 'background'
    if max_lbl == 'uncertain':
        ledRed.on()
        ledGre.off()
        ledBlu.off()

    if max_lbl == 'periquito':
        ledRed.off()
        ledGre.on()
        ledBlu.off()

    if max_lbl == 'robot':
        ledRed.off()
        ledGre.off()
        ledBlu.on()

    if max_lbl == 'background':
        ledRed.off()
        ledGre.off()
        ledBlu.off()


while True:
    img = sensor.snapshot()
    clock.tick()                     # Starts tracking elapsed time.

    # Default settings just do one detection.
    for obj in net.classify(img,
                            min_scale=1.0,
                            scale_mul=0.8,
                            x_overlap=0.5,
                            y_overlap=0.5):
        fps = clock.fps()
        lat = clock.avg()

        print("**********\nPrediction:")
        img.draw_rectangle(obj.rect())

        # Combine the labels and confidence values into a list of tuples
        predictions_list = list(zip(labels, obj.output()))

        # Find the label with the highest confidence
        max_val = predictions_list[0][1]
        max_lbl = predictions_list[0][0]
        for i in range(len(predictions_list)):
            val = predictions_list[i][1]
            lbl = predictions_list[i][0]
            if val > max_val:
                max_val = val
                max_lbl = lbl

        # Print the label and turn on the LED with the highest probability
        if max_val < 0.8:
            max_lbl = 'uncertain'
        setLEDs(max_lbl)

        print("{} with a prob of {:.2f}".format(max_lbl, max_val))
        print("FPS: {:.2f} fps ==> latency: {:.0f} ms".format(fps, lat))

        # Draw the label with the highest probability on the image viewer
        img.draw_string(
            10, 10,
            max_lbl + "\n{:.2f}".format(max_val),
            mono_space=False,
            scale=2
        )
```
Now, each time a class gets a result above 0.8, the corresponding LED will light up as follows:

- Red LED on: Uncertain (no class is above 0.8)
- Green LED on: Periquito > 0.8
- Blue LED on: Robot > 0.8
- All LEDs off: Background > 0.8
Here is the result:
![](images_4/media/image18.jpg){width="6.5in"
height="3.6527777777777777in"}
In more detail:
![](images_4/media/image21.jpg){width="6.5in"
height="2.0972222222222223in"}
### **Image Classification (non-official) Benchmark**
Several development boards can be used for embedded machine learning (tinyML), and the most common ones for low-energy Computer Vision applications are the ESP32-CAM, the Seeed XIAO ESP32S3 Sense, and the Arduino Nicla Vision and Portenta.
![](images_4/media/image19.jpg){width="6.5in"
height="4.194444444444445in"}
Taking the opportunity, the same trained model was deployed on the ESP32-CAM, the XIAO, and the Portenta (for the Portenta, the model was trained again using grayscale images to be compatible with its camera). Here is the result of deploying the models as Arduino libraries:
![](images_4/media/image4.jpg){width="6.5in"
height="3.4444444444444446in"}
## Conclusion
Before we finish, consider that Computer Vision is more than just image
classification. For example, you can develop Edge Machine Learning
projects around vision in several areas, such as:
- **Autonomous Vehicles**: Use sensor fusion, lidar data, and computer vision algorithms to navigate and make decisions.
- **Healthcare**: Automated diagnosis of diseases through MRI, X-ray, and CT scan image analysis.
- **Retail**: Automated checkout systems that identify products as they pass through a scanner.
- **Security and Surveillance**: Facial recognition, anomaly detection, and object tracking in real-time video feeds.
- **Augmented Reality**: Object detection and classification to overlay digital information in the real world.
- **Industrial Automation**: Visual inspection of products, predictive maintenance, and robot and drone guidance.
- **Agriculture**: Drone-based crop monitoring and automated harvesting.
- **Natural Language Processing**: Image captioning and visual question answering.
- **Gesture Recognition**: For gaming, sign language translation, and human-machine interaction.
- **Content Recommendation**: Image-based recommendation systems in e-commerce.