Commit 27051ae: 54 - Add model evaluation experiment for perception (#71)
MaxJa4 authored Nov 20, 2023 (parent e844735)

`doc/06_perception/experiments/model_evaluation/README.md` (+198 lines)
# Model Evaluation Experiment

This experiment was conducted to evaluate different kinds of object detection models for use in the CARLA simulator.

## Installation

1. Create a virtual environment with `python -m venv venv` while inside the project's folder.
2. Install the requirements with `pip install -r requirements.txt`.
   1. If you run into issues, check the [Tensorflow-Docs](https://www.tensorflow.org/hub/installation), [PyTorch-Docs](https://pytorch.org/get-started/locally/) or [YOLO-Docs](https://docs.ultralytics.com/quickstart/) for installation instructions.
3. For the Pylot models (using Tensorflow), also run the following commands ([source](https://www.tensorflow.org/hub/tutorials/tf2_object_detection)):

```shell
# Shallow-clone the TensorFlow models repository
git clone --depth 1 https://github.com/tensorflow/models
# Compile the object detection protobufs and install the API
sudo apt install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
cp object_detection/packages/tf2/setup.py .
python -m pip install .
```

For the YOLO-NAS models, also run `pip install super-gradients==3.1.1`. It is recommended to run these models in a separate venv, e.g. created with `python -m venv venv_yolo`, to avoid mixing package versions; see the sketch below.
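A minimal sketch of that separate environment setup (assuming a Linux shell; the venv name is just an example):

```shell
# Hypothetical isolated venv for the YOLO-NAS models
python -m venv venv_yolo
source venv_yolo/bin/activate
pip install super-gradients==3.1.1
```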

## Usage

Run `yolo.py`, `pt.py` or `pylot.py` to evaluate the respective models; each script saves its results automatically.
`globals.py` only holds common constants shared by the scripts, such as the image folder and the test image set.
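For example, with the venv active (as in `pt.py`, annotated images are written to a `result` subfolder of the configured image directory):

```shell
source venv/bin/activate
python pt.py  # runs all three PyTorch models over the eight test images
```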

## Results

For all models, 8 images captured directly from the CARLA RGB camera were used, depicting different traffic scenarios.

*Legend for tables below*: ++, +, /, -, -- (best to worst)

### Pylot Models (Tensorflow)

The following models were evaluated (sorted descending by recognition performance):

1. faster-rcnn
2. ssd-mobilenet-v1-fpn
3. ssd-mobilenet-fpn-640
4. ssd-mobilenet-v1
5. ssdlite-mobilenet-v2

Images with bounding boxes: [Google Drive](https://drive.google.com/drive/folders/1Jk0KNf1-inO1LN8YWoX0nd796w44zp7_?usp=sharing)

#### Summary

| Model                 | Cyclists | Traffic lights | Cars | Noise | Speed |
|-----------------------|----------|----------------|------|-------|-------|
| faster-rcnn | -- | -- | ++ | ++ | -- |
| ssd-mobilenet-v1-fpn | -- | -- | + | ++ | - |
| ssd-mobilenet-fpn-640 | -- | -- | - | ++ | + |
| ssd-mobilenet-v1 | -- | -- | - | ++ | ++ |
| ssdlite-mobilenet-v2 | -- | -- | -- | ++ | ++ |

#### Recognition

Overall, `faster-rcnn` performed the best of these 5 Pylot models. It recognized most cars reliably. The only downside: pretty much nothing else was detected, such as cyclists, traffic lights or construction sites.

`ssd-mobilenet-v1-fpn` recognized significantly less than `faster-rcnn`, but still more than the rest of the Pylot models.

The difference between `ssd-mobilenet-fpn-640` and `ssd-mobilenet-v1` isn't very noticeable; both are rather mediocre.

`ssdlite-mobilenet-v2` only recognized the most obvious cars and tended to group multiple cars into one bounding box.

#### Computation speed

These values are meant for comparison between the models, not as representative performance indicators in general.
Only the inference time was measured.

| Model | Time | FPS |
|---------------------------|-------|------|
| ssd-mobilenet-v1 | ~4ms | 250 |
| ssdlite-mobilenet-v2 | ~6ms | ~166 |
| ssd-mobilenet-fpn-640 | ~9ms | ~111 |
| ssd-mobilenet-v1-fpn | ~10ms | 100 |
| faster-rcnn | ~26ms | ~38 |
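
The FPS column is simply the reciprocal of the measured inference time; the same conversion applies to all speed tables in this document:

```python
# FPS derived from the measured per-frame inference time (milliseconds)
def fps(inference_time_ms: float) -> float:
    return 1000.0 / inference_time_ms

print(round(fps(26)))  # ~38 FPS for faster-rcnn
```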

### PyTorch Models

The following models were evaluated (sorted descending by recognition performance):

1. fasterrcnn_resnet50_fpn_v2
2. fasterrcnn_mobilenet_v3_large_320_fpn
3. retinanet_resnet50_fpn_v2

Images with bounding boxes: [Google Drive](https://drive.google.com/drive/folders/1AnTCIx351_aXWTcdGusIR4kTUOcomcy-?usp=sharing)

#### Summary

| Model | Cyclists | Traffic lights | Cars | Noise | Speed |
|---------------------------------------|----------|----------------|------|-------|-------|
| fasterrcnn_resnet50_fpn_v2 | ++ | ++ | ++ | - | -- |
| fasterrcnn_mobilenet_v3_large_320_fpn | ++ | + | ++ | / | ++ |
| retinanet_resnet50_fpn_v2 | ++ | ++ | ++ | -- | - |

#### Recognition

Although all models performed very well at recognizing all kinds of objects, `retinanet_resnet50_fpn_v2` in particular showed a **lot** of noise (see images [above](#pytorch-models)).
Filtering the results with a higher minimum detection score could perhaps reduce this noise for all models while keeping the good recognition performance, as sketched below.
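
A minimal sketch of such score-based filtering, assuming the result dict format that torchvision detection models return (`boxes`, `scores` and `labels` tensors); the threshold is a hypothetical starting point:

```python
MIN_SCORE = 0.5  # hypothetical threshold; would need tuning per model

def filter_by_score(result: dict, min_score: float = MIN_SCORE) -> dict:
    # Boolean mask over all detections; the same indexing works for
    # the 'boxes', 'scores' and 'labels' tensors alike
    keep = result['scores'] >= min_score
    return {key: value[keep] for key, value in result.items()}
```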

#### Computation speed

These values are meant for comparison between the models, not as representative performance indicators in general.
Only the inference time was measured.

| Model | Time | FPS |
|---------------------------------------|-------|------|
| fasterrcnn_mobilenet_v3_large_320_fpn | ~6ms | ~166 |
| retinanet_resnet50_fpn_v2 | ~36ms | ~27 |
| fasterrcnn_resnet50_fpn_v2 | ~45ms | ~22 |

### YOLOv8

The YOLOv8 versions are different sizes of the same model; the YOLO-NAS and RT-DETR variants are evaluated alongside them.
The following models were evaluated (sorted descending by recognition performance):

1. yolo-rtdetr-x
2. yolo-rtdetr-l
3. yolo-nas-l
4. yolo-nas-m
5. yolo-nas-s
6. yolov8x / yolov8x-seg
7. yolov8l
8. yolov8m
9. yolov8s
10. yolov8n

Images with bounding boxes: [Google Drive](https://drive.google.com/drive/folders/1u6T0Q3kd9FqjiBWMqzlT-3-fglMqlkBB?usp=sharing)

#### Summary

| Model | Cyclists | Traffic lights | Cars | Noise | Speed |
|---------------|----------|----------------|------|-------|-------|
| yolo-rtdetr-x | ++ | ++ | ++ | + | - |
| yolo-rtdetr-l | ++ | ++ | ++ | + | - |
| yolo-nas-l | ++ | ++ | ++ | ++ | + |
| yolo-nas-m | ++ | ++ | ++ | ++ | + |
| yolo-nas-s | ++ | ++ | ++ | ++ | + |
| yolov8x/-seg | ++ | ++ | ++ | ++ | + |
| yolov8l | ++ | ++ | ++ | ++ | + |
| yolov8m | ++ | ++ | ++ | + | ++ |
| yolov8s | + | + | ++ | ++ | ++ |
| yolov8n | + | - | + | ++ | ++ |

#### Recognition

All model versions performed very well. Only the smallest version (`v8n`) missed some cars. The `v8s` version was already visibly better, although the `v8x`, `v8l` and `v8m` versions recognized more details, e.g. instead of just a person, they saw a person and a bicycle underneath.

The same can be said for traffic lights: `v8x`, `v8l` and `v8m` saw them from a larger distance, while `v8n` and `v8s` needed to be closer.

The `YOLO-NAS` family of models performs similarly to the best `v8` versions, but with higher confidence scores.

The `RT-DETR` models recognized slightly more detail and with higher confidence, but at the same time produced more noise and detected irrelevant objects (which can be filtered out, though).

Across all versions, almost no noise (random wrong or duplicate predictions) was present without tweaking any values; only `v8m` showed some noise.
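
For reference, a minimal inference sketch with the `ultralytics` API; this is a simplified stand-in, not the exact `yolo.py` used in this experiment:

```python
# Hypothetical minimal YOLOv8 inference; weights are downloaded on first use
from ultralytics import YOLO
import cv2

model = YOLO('yolov8l.pt')
results = model('1619.png')    # single-image inference
annotated = results[0].plot()  # BGR ndarray with boxes and labels drawn
cv2.imwrite('1619_yolov8l.jpg', annotated)
```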

#### Computation speed

These values are meant for comparison between the models, not as representative performance indicators in general.
Only the inference time was measured.

| Model | Time | FPS |
|---------------|--------|------|
| yolov8n | ~2ms | 500 |
| yolov8s | ~2ms | 500 |
| yolov8m | ~3ms | ~333 |
| yolov8l | ~4ms | 250 |
| yolov8x/-seg | ~6/7ms | ~166 |
| yolo-nas-l | ~6ms | ~166 |
| yolo-nas-m | ~6ms | ~166 |
| yolo-nas-s | ~7ms | ~142 |
| yolo-rtdetr-l | ~13ms | ~77 |
| yolo-rtdetr-x | ~16ms | ~62 |

## Conclusion

Comparing all models, YOLOv8 is currently by far the winner: it has the best speed, the lowest noise, the most detail and reliable recognition, while also being the easiest to use and configure (compare the .py files yourself).

Since the `v8m` version is sometimes too sensitive and the `v8x` version is the largest and slowest, the `v8l` version is a good middle ground if performance is important.

If the best detection results matter most, the `v8x` version and the `nas` family should be analyzed further with more images and situations.

For segmentation, `sam` and `fast-sam` were also tested. `SAM` needs multiple seconds per inference and, like `fast-sam`, is not suitable for CARLA at all, as both segment the entire image, e.g. individual windows of a car or building.

| ![1619_TF_faster-rcnn.jpg](asset-copies/1619_TF_faster-rcnn.jpg) |
|:--:|
| ^ *Pylot - Faster RCNN (26ms)* ^ |
| ![1619_PT_fasterrcnn_resnet50_fpn_v2.jpg](asset-copies/1619_PT_fasterrcnn_resnet50_fpn_v2.jpg) |
| ^ *Pytorch - Faster RCNN Resnet50 FPN V2 (45ms)* ^ |
| ![1619_yolov8x.jpg](asset-copies/1619_yolov8x.jpg) |
| ^ *YOLOv8x (6ms)* ^ |
| ![1619_yolov8x_seg.jpg](asset-copies/1619_yolov8x_seg.jpg) |
| ^ *YOLOv8x-Seg (7ms)* ^ |
| ![1619_yolo_nas_l.jpg](asset-copies/1619_yolo_nas_l.jpg) |
| ^ *YOLO-nas-l (7ms)* ^ |
| ![1619_yolo_rtdetr_x.jpg](asset-copies/1619_yolo_rtdetr_x.jpg) |
| ^ *YOLO-rtdetr-x (16ms)* ^ |
`doc/06_perception/experiments/model_evaluation/globals.py` (+12 lines)

```python
# Shared constants for the evaluation scripts (yolo.py, pt.py, pylot.py)
IMAGE_BASE_FOLDER = '/home/maxi/paf23/code/output/12-dev/rgb/center'

# Eight CARLA RGB camera captures depicting different traffic scenarios
IMAGES_FOR_TEST = {
    'start': '1600.png',
    'intersection': '1619.png',
    'traffic_light': '1626.png',
    'traffic': '1660.png',
    'bicycle_far': '1663.png',
    'bicycle_close': '1668.png',
    'construction_sign_far': '2658.png',
    'construction_sign_close': '2769.png'
}
```
`doc/06_perception/experiments/model_evaluation/pt.py` (+90 lines)

```python
'''
Docs: https://pytorch.org/vision/stable/models.html#object-detection
'''

import os
from pathlib import Path
from time import perf_counter

import matplotlib.pyplot as plt
import torch
import torchvision
from PIL import Image
from torchvision import transforms
from torchvision.models.detection.faster_rcnn import (
    FasterRCNN_MobileNet_V3_Large_320_FPN_Weights,
    FasterRCNN_ResNet50_FPN_V2_Weights,
)
from torchvision.models.detection.retinanet import RetinaNet_ResNet50_FPN_V2_Weights
from torchvision.transforms.functional import to_pil_image
from torchvision.utils import draw_bounding_boxes

from globals import IMAGE_BASE_FOLDER, IMAGES_FOR_TEST

ALL_MODELS = {
    'fasterrcnn_mobilenet_v3_large_320_fpn': FasterRCNN_MobileNet_V3_Large_320_FPN_Weights,
    'fasterrcnn_resnet50_fpn_v2': FasterRCNN_ResNet50_FPN_V2_Weights,
    'retinanet_resnet50_fpn_v2': RetinaNet_ResNet50_FPN_V2_Weights,
}


def load_model(model_name):
    print('Selected model: ' + model_name)
    print('Loading model...', end='')
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    weights = ALL_MODELS[model_name].DEFAULT
    model = torchvision.models.detection.__dict__[model_name](weights=weights).to(device)
    model.eval()
    return model, weights, device


def load_image(image_path, model_weights, device):
    img = Image.open(image_path)
    img = img.convert('RGB')
    img = transforms.Compose([transforms.PILToTensor()])(img)
    img = model_weights.transforms()(img)  # model-specific preprocessing
    img = img.unsqueeze_(0)  # add batch dimension
    img = img.to(device)

    return img


first_gen = True

for m in ALL_MODELS:
    model, weights, device = load_model(m)

    for p in IMAGES_FOR_TEST:
        image_path = os.path.join(IMAGE_BASE_FOLDER, IMAGES_FOR_TEST[p])
        image_np = load_image(image_path, weights, device)

        if first_gen:
            # the very first inference carries one-time setup costs
            # (e.g. CUDA initialization), so it is excluded from the timing
            print('Running warmup inference...')
            model(image_np)
            first_gen = False

        print(f'Running inference for {p}... ')

        start_time = perf_counter()

        # running inference; only this call is timed
        results = model(image_np)

        elapsed_time = perf_counter() - start_time

        # all torchvision detection models return a list with one dict per
        # image, containing at least 'boxes', 'scores' and 'labels';
        # some models add further entries (see the documentation)
        result = results[0]

        # draw_bounding_boxes expects a uint8 image in [0, 255]
        image_np_with_detections = torch.tensor(image_np * 255, dtype=torch.uint8)
        boxes = result['boxes']
        scores = result['scores']
        labels = [weights.meta['categories'][i] for i in result['labels']]

        box = draw_bounding_boxes(image_np_with_detections[0], boxes, labels, colors='red', width=2)
        box_img = to_pil_image(box)

        file_name = Path(image_path).stem

        plt.figure(figsize=(32, 18))
        plt.title(f'PyTorch - {m} - {p} - {elapsed_time*1000:.0f}ms', fontsize=30)
        plt.imshow(box_img)
        plt.savefig(f'{IMAGE_BASE_FOLDER}/result/{file_name}_PT_{m}.jpg')
        plt.close()
```