[Object Detection] Add YOLOv11 Architecture and Presets #1952

DavidLandup0 · 2024-10-23T05:09:17Z

Draft PR for transparency.

Done

Basic components
CIoU Loss
ImageObjectDetector Task
ImageObjectDetectorPreprocessor
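For reference, CIoU (Complete IoU) loss extends plain IoU loss with a center-distance penalty and an aspect-ratio consistency term: CIoU = IoU - ρ²/c² - αv, and the loss is 1 - CIoU. Here is a plain-Python sketch for a single pair of boxes in (x1, y1, x2, y2) format, assuming the standard formulation. It is not the Keras implementation in this PR, and `ciou_loss` is a hypothetical name.

```python
import math


def ciou_loss(pred, target, eps=1e-7):
    """Complete-IoU loss for one pair of (x1, y1, x2, y2) boxes: 1 - CIoU."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Plain IoU from the intersection and union areas.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # rho^2: squared distance between the two box centers.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 + (
        (py1 + py2) / 2 - (ty1 + ty2) / 2
    ) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw**2 + ch**2 + eps

    # v measures aspect-ratio mismatch; alpha scales it relative to the overlap.
    v = (4 / math.pi**2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike plain IoU loss, this still produces a useful signal when boxes do not overlap at all, because the center-distance term keeps growing as the boxes move apart.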

Planned

[ ] YOLOv11 Architecture
[ ] Object detection workflow
[ ] Weight conversion script
[ ] Utils for bounding boxes
[ ] Presets (n...xl)
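The bounding-box utils are still planned. As a rough illustration of the kind of helpers they might include, here is a minimal sketch that converts between the center format (cx, cy, w, h) used by YOLO-style label files and the corner format (x1, y1, x2, y2), and clips boxes to the image bounds. The function names are hypothetical and do not refer to an existing KerasHub API.

```python
def xywh_to_xyxy(box):
    """Convert a (center_x, center_y, width, height) box to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def xyxy_to_xywh(box):
    """Convert a (x1, y1, x2, y2) box to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def clip_xyxy(box, image_width, image_height):
    """Clamp a corner-format box so it lies inside an image of the given size."""
    x1, y1, x2, y2 = box
    return (
        min(max(x1, 0), image_width),
        min(max(y1, 0), image_height),
        min(max(x2, 0), image_width),
        min(max(y2, 0), image_height),
    )
```

A real implementation would operate on batched tensors and likely also handle normalized vs. pixel coordinates.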

Out of Scope

Instance segmentation
Pose estimation
Oriented Object Detection (i.e. rotated bounding boxes)

These will be exported as separate tasks (e.g. ImagePoseEstimator, ImageInstanceSegmentor, and their respective preprocessors) in separate PRs.

API Considerations

There will be a lot of reuse between YOLOv11 OD, YOLOv11 Pose, etc. Some functions, such as non-max suppression, can be wrapped into generic public layers and reused between object detectors. We could benefit from refactoring these into general utils in KerasHub (currently, they belong to individual models, as in the case of RetinaNet).
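Greedy non-max suppression is a natural candidate for such a shared layer. Below is a minimal plain-Python sketch of the algorithm, assuming (x1, y1, x2, y2) boxes and one score per box: sort by score, then keep a box only if its IoU with every already-kept box is at or below the threshold. The function and parameter names are illustrative, not an existing KerasHub API; a real layer would work on batched tensors and per-class scores.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes, scores, iou_threshold=0.5, score_threshold=0.0, max_detections=100
):
    """Greedy NMS; returns indices of kept boxes in descending score order."""
    # Drop low-confidence boxes, then visit the rest from highest to lowest score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        if len(keep) >= max_detections:
            break
        # A box survives only if no higher-scoring kept box overlaps it too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Because nothing in this logic is YOLO-specific, the same routine could serve RetinaNet and YOLO heads alike, which is the argument for lifting it out of individual models.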

Some YOLO models share largely the same architecture but rely on a different config. For example, enabling v11 will enable v8 as well. These differences can be handled through presets. We could turn YOLOv11 into a generic YOLO class, which is configurable through presets and layers. This lets us support multiple versions, but also easily port and publish YOLOv{N} and subsequent versions in the future with minimal code changes (i.e. a layer or two + config).
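One way a generic YOLO class could derive the n...x sizes from presets is a per-preset depth multiplier, width multiplier, and channel cap, assuming the scaling scheme used by Ultralytics-style configs. In this sketch, `YOLOConfig`, `PRESETS`, and the helper functions are hypothetical, and the numeric values are illustrative placeholders that have not been checked against the official configs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class YOLOConfig:
    version: str
    depth_multiple: float  # scales how many times each block is repeated
    width_multiple: float  # scales the number of channels per layer
    max_channels: int  # caps base channels before width scaling


# Hypothetical preset registry; the numbers are illustrative placeholders.
PRESETS = {
    "yolo_v8_n": YOLOConfig("v8", 0.33, 0.25, 1024),
    "yolo_v11_n": YOLOConfig("v11", 0.50, 0.25, 1024),
    "yolo_v11_x": YOLOConfig("v11", 1.00, 1.50, 512),
}


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return int(math.ceil(x / divisor) * divisor)


def scaled_channels(base_channels, config):
    """Channel count for a layer after applying the cap and width multiplier."""
    return make_divisible(min(base_channels, config.max_channels) * config.width_multiple)


def scaled_repeats(base_repeats, config):
    """Number of block repeats after applying the depth multiplier (at least 1)."""
    return max(round(base_repeats * config.depth_multiple), 1)
```

With this design, adding a new size is a new registry entry, and adding a new version only needs new code where its blocks actually differ.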

/cc @divyashreepathihalli @mattdangerw @fchollet for API discussions and considerations.


mattdangerw · 2024-11-21T01:22:39Z

Haven't gone over the code yet, but re the API questions...

We could turn YOLOv11 into a generic YOLO class, which is configurable through presets and layers. This lets us support multiple versions, but also easily port and publish YOLOv{N} and subsequent versions in the future with minimal code changes (i.e. a layer or two + config).

I think this is probably the right call? If we think it will reduce the overall amount of code without becoming a spaghetti of if statements, it sounds like a worthy cleanup.
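One common way to keep version-specific differences from turning into if/elif chains is a small registry keyed by version. In this sketch, the registry, decorator, and builder names are hypothetical, and the builders return placeholder dicts instead of Keras layers. C2f and C3k2 are the blocks YOLOv8 and YOLOv11 use, respectively.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a YOLO version to the builder for its specific block.
_BLOCK_BUILDERS: Dict[str, Callable[[int], dict]] = {}


def register_block(version):
    """Decorator that registers a block builder for the given version string."""

    def decorator(fn):
        _BLOCK_BUILDERS[version] = fn
        return fn

    return decorator


@register_block("v8")
def build_v8_block(channels):
    # Placeholder standing in for a real C2f layer.
    return {"type": "C2f", "channels": channels}


@register_block("v11")
def build_v11_block(channels):
    # Placeholder standing in for a real C3k2 layer.
    return {"type": "C3k2", "channels": channels}


def build_block(version, channels):
    """Look up the version's builder instead of branching on the version."""
    try:
        builder = _BLOCK_BUILDERS[version]
    except KeyError:
        raise ValueError(
            f"Unsupported YOLO version {version!r}; known: {sorted(_BLOCK_BUILDERS)}"
        ) from None
    return builder(channels)
```

Supporting a new version then means registering one more builder, while the shared backbone and head code stay untouched.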

Some functions, such as non-max suppression, can be wrapped into generic public layers and reused between object detectors.

In general, if we are seeing the same layers being used across different models, and we can write a common one that covers both cases, that's a good time to consider pulling things out as a layer. Adding common routines as a layer will raise the requirements a bit (solid testing, good docs with examples, etc.), but that's not an issue, just something to keep in mind.

Random and not originating on this PR, but ImageObjectDetector feels a little redundant to me. It's not like there's a text object detector. Can we just name this task ObjectDetector and keep our code a little more concise? @divyashreepathihalli WDYT?

DavidLandup0 · 2024-11-21T12:43:58Z

Thanks for weighing in @mattdangerw!

If we think it will reduce the overall amount of code without becoming a spaghetti of if statements, it sounds like a worthy cleanup.

The original repo is written to allow for this customizability, so in principle it shouldn't be too hard to do here. However, the original repo has a lot of proprietary structures and code that we don't want to port, so we'll have to trim a lot of that away.

Random and not originating on this PR, but ImageObjectDetector feels a little redundant to me. It's not like there's a text object detector. Can we just name this task ObjectDetector and keep our code a little more concise?

Agreed. It was named ImageObjectDetector to keep consistent with ImageSegmenter, even though we don't have TextSegmenters either. Unless we refactor that, consistency would probably be preferable?

mattdangerw · 2024-11-22T22:59:20Z

Agreed. It was named ImageObjectDetector to keep consistent with ImageSegmenter, even though we don't have TextSegmenters either. Unless we refactor that, consistency would probably be preferable?

Yeah, let's keep consistent for now! Leave as is for this PR.

add ciou loss, object detector task, preprocessor and basic blocks of yolov11

a814a9d

DavidLandup0 marked this pull request as draft October 23, 2024 05:09

DavidLandup0 added 7 commits October 23, 2024 15:57

add more layers, refactor to clearer names, run api_gen

b1d7f68

add attention blocks, fast spatial pyramid pooling, etc

a9eca0e

fix shapes

59f1cd2

fix shapes

82734cd

refactor names

f311ee2

reshape to b h w c

bd802fc

update layers

ed03108

mattdangerw self-requested a review November 21, 2024 01:13
