Modify custom head for distance estimation with YOLOv8 #17585

geun0196 · 2024-11-17T05:38:02Z

geun0196
Nov 17, 2024

I want to modify the YOLOv8 model to predict the distance from the camera's position to the detected object.
I saw on the discussions page https://github.com/orgs/ultralytics/discussions/7956 that distance is being estimated using the aspect ratio of the image's width and height.
However, I want the model to directly output the distance as a predicted value, rather than calculating it based on the aspect ratio.

It will be done with 2D Bounding Boxes, and I plan to add the distance value to the last column of the existing label (txt file).
For example: class, confidence, xmin, ymin, xmax, ymax, distance.
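
As a rough illustration of the label idea above, the sketch below parses one label line with a trailing distance column. It assumes the standard YOLO detection label layout (`class x_center y_center width height`, normalized to the image size, with no confidence value) plus one extra distance field appended at the end. The names `DistanceLabel` and `parse_label_line`, and the choice of metres as the unit, are hypothetical and not part of the ultralytics package.

```python
from typing import NamedTuple, Tuple


class DistanceLabel(NamedTuple):
    """One object annotation with an extra distance field (hypothetical layout)."""
    cls: int
    box: Tuple[float, float, float, float]  # x_center, y_center, width, height (normalized)
    distance: float                          # assumed to be in metres


def parse_label_line(line: str) -> DistanceLabel:
    """Parse 'class x_center y_center width height distance' from a label .txt line."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    cls = int(parts[0])
    x, y, w, h, dist = (float(p) for p in parts[1:])
    return DistanceLabel(cls, (x, y, w, h), dist)


label = parse_label_line("0 0.5 0.5 0.2 0.4 12.5")
print(label.cls, label.box, label.distance)
```

The stock label loader expects five columns for detection, so a sixth column would likely need to be handled in a custom dataset class rather than passed through unchanged.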

I believe I need to modify the Trainer, DataLoader, and custom Loss, but I'm wondering if there is anyone with similar experience.
Is my approach for modifying the model correct? I would appreciate any help or advice on this.

UltralyticsAssistant · 2024-11-17T05:38:32Z

UltralyticsAssistant
Nov 17, 2024
Maintainer

👋 Hello @geun0196, thank you for your interest in Ultralytics 🚀! It sounds like an exciting project you're working on. To get the best advice and insights, I recommend checking out the Docs where you can find comprehensive guides and examples.

If this is a ❓ Question about modifying YOLOv8, please make sure to provide all relevant information that could help troubleshoot or guide you better. Your approach seems interesting, and sharing specific implementation details might also assist in refining your strategy.

In case you're dealing with a 🐛 Bug while modifying the model, providing a minimum reproducible example would be particularly helpful to identify any issues.

We also invite you to join our vibrant Ultralytics community for discussions or advice. You can connect through Discord 🎧 for real-time support, engage in in-depth discussions on Discourse, or explore insights shared by others on our Subreddit.

Upgrade

Make sure you are working with the latest version of the ultralytics package, as updates might have resolved some issues or added useful features:

pip install -U ultralytics

Environments

YOLO models can be deployed in diverse environments. Check out various platforms with necessary dependencies pre-installed:

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all Ultralytics CI tests are currently passing. Rest assured that an Ultralytics engineer will also be here soon to provide further assistance. 👍

0 replies

glenn-jocher · 2024-11-17T10:11:37Z

glenn-jocher
Nov 17, 2024
Maintainer

Thank you for your detailed question about modifying YOLOv8 for distance estimation. You can extend the DetectionTrainer to include distance prediction by following these steps:

1. Modify your dataset's labels to include the distance value as an additional column.
2. Extend the DetectionTrainer class to handle the additional distance output.

Here's a minimal example:

from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.nn.tasks import DetectionModel


class DistanceModel(DetectionModel):
    def init_criterion(self):
        # The loss is built by the model, not the trainer.
        # Return a custom criterion here that adds a distance loss term.
        return super().init_criterion()


class DistanceTrainer(DetectionTrainer):
    def get_model(self, cfg=None, weights=None, verbose=True):
        # Build the custom model; the distance prediction head itself
        # still has to be added to the model architecture.
        model = DistanceModel(cfg, nc=self.data["nc"], verbose=verbose)
        if weights:
            model.load(weights)
        return model

    def preprocess_batch(self, batch):
        # Handle the additional distance label in your batch
        batch = super().preprocess_batch(batch)
        return batch

For detailed implementation guidance, please refer to Advanced Customization in our documentation. You'll need to modify the model architecture to output the additional distance value and adjust the loss function accordingly.
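
To make "output the additional distance value" a little more concrete, here is a minimal sketch of how one per-anchor prediction vector could be split if a single distance channel were appended. It relies on the stock YOLOv8 Detect head producing `4 * reg_max` box-distribution channels followed by `nc` class channels; placing one extra distance channel last is purely an assumption for illustration, and `split_prediction` is a hypothetical helper, not an ultralytics function.

```python
from typing import List, Sequence, Tuple


def split_prediction(
    channels: Sequence[float], nc: int, reg_max: int = 16
) -> Tuple[List[float], List[float], float]:
    """Split one anchor's raw outputs into box, class, and distance parts.

    Assumed layout: [4 * reg_max box channels | nc class channels | 1 distance channel].
    """
    n_box = 4 * reg_max
    expected = n_box + nc + 1
    if len(channels) != expected:
        raise ValueError(f"expected {expected} channels, got {len(channels)}")
    box = list(channels[:n_box])
    cls = list(channels[n_box:n_box + nc])
    dist = channels[-1]
    return box, cls, dist


# Dummy vector for a 3-class model: 64 box channels + 3 class channels + 1 distance
box, cls, dist = split_prediction(list(range(68)), nc=3)
print(len(box), len(cls), dist)
```

In a real head the same split would be applied along the channel dimension of a tensor, and the distance channel would need its own convolution branch and training target.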

0 replies

geun0196 · 2024-11-18T04:11:11Z

geun0196
Nov 18, 2024
Author

Thank you for your response.

To modify the trainer, would it be sufficient to only adjust the model's head and loss function?
In the dataloader code, since images and labels are loaded as one-to-one pairs (jpg - txt), there shouldn't be much need for further modification, right? (Although additional parsing for the distance column might still be necessary...)

3 replies

glenn-jocher Nov 18, 2024
Maintainer

@geun0196 yes, your understanding is correct. The main modifications needed would be:

1. Extend the model's detection head to output an additional value for distance prediction.
2. Modify the loss function to incorporate distance prediction loss.
3. Add distance parsing in the data loading pipeline.

The basic dataloader structure can remain largely unchanged since you're maintaining the one-to-one image-label relationship, just adding the distance value to each label. You can implement this by subclassing the DetectionTrainer class and overriding the necessary methods as shown in the Advanced Customization guide.

For the loss function, you'll likely want to add a regression loss term (like MSE or L1) for the distance prediction alongside the existing detection losses. This can be done by modifying the init_criterion() method in your custom model class.
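
The arithmetic of adding a regression term can be sketched in plain Python as below. This only illustrates how an L1 or MSE distance term might be weighted and summed with the existing detection loss; the function names and the `dist_weight` parameter are hypothetical, and a real implementation would operate on torch tensors for matched (foreground) predictions inside the model's criterion.

```python
from typing import Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth distances."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth distances."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(
    det_loss: float,
    pred_dist: Sequence[float],
    target_dist: Sequence[float],
    dist_weight: float = 1.0,
    kind: str = "l1",
) -> float:
    """Detection loss plus a weighted distance regression term."""
    if len(pred_dist) != len(target_dist) or not pred_dist:
        raise ValueError("pred_dist and target_dist must be non-empty and equal length")
    reg = l1_loss(pred_dist, target_dist) if kind == "l1" else mse_loss(pred_dist, target_dist)
    return det_loss + dist_weight * reg


# Two matched objects: predicted 10 m and 20 m, ground truth 12 m and 19 m
print(total_loss(2.0, [10.0, 20.0], [12.0, 19.0], dist_weight=0.5))
```

The weight matters in practice: distances in metres can be much larger than the other loss terms, so normalizing the targets or choosing a small weight may be needed to keep the detection losses from being dominated.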

geun0196 Nov 18, 2024
Author

@glenn-jocher I will try it and share any errors I run into, so that it can help others in the future!

glenn-jocher Nov 18, 2024
Maintainer

Thank you, @geun0196! If you encounter any issues, feel free to share the error details here for assistance.

geun0196 · 2024-11-22T05:11:37Z

geun0196
Nov 22, 2024
Author

I'm sorry, my initial thought was wrong. Instead of adding the distance value to the label at the end, the right approach seems to be using the (RGB + Depth) information as a 4-channel input. The output would then be the same as the existing YOLO, but with the addition of depth (distance) information.

Additionally, when using RGB-D data for distance estimation in YOLO, I’d appreciate recommendations on whether it’s better to input RGB and Depth maps separately or to concatenate them as a 4-channel input.

Would there be any issues if I proceed in this direction, as suggested by @glenn-jocher?

1 reply

glenn-jocher Nov 22, 2024
Maintainer

@geun0196 using RGB-D data as a 4-channel input for YOLO models can enhance distance estimation capabilities. Concatenating RGB and Depth maps into a single input is a valid approach, but ensure your model architecture supports 4-channel input. Testing both separate and concatenated inputs can help determine the optimal configuration for your specific use case. For further guidance, it might be helpful to explore the community discussions or documentation related to RGB-D processing.
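
As a minimal sketch of the concatenated option, the NumPy snippet below stacks an RGB image and an aligned depth map into a single 4-channel, channels-first array. It assumes the depth map is already registered to the RGB image and expressed in metres; `make_rgbd_input` and the `max_depth_m` clipping range are hypothetical choices for illustration, and the model's first convolution would still have to accept four input channels.

```python
import numpy as np


def make_rgbd_input(rgb: np.ndarray, depth_m: np.ndarray, max_depth_m: float = 10.0) -> np.ndarray:
    """Combine an (H, W, 3) uint8 RGB image and an (H, W) depth map into a (4, H, W) float32 array."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if rgb.shape[:2] != depth_m.shape:
        raise ValueError("rgb and depth must share the same height and width")
    rgb_norm = rgb.astype(np.float32) / 255.0                       # scale colors to [0, 1]
    depth_norm = np.clip(depth_m.astype(np.float32) / max_depth_m, 0.0, 1.0)  # scale depth to [0, 1]
    rgbd = np.concatenate([rgb_norm, depth_norm[..., None]], axis=-1)  # (H, W, 4)
    return np.transpose(rgbd, (2, 0, 1))                            # channels first: (4, H, W)


rgb = np.zeros((4, 6, 3), dtype=np.uint8)
depth = np.full((4, 6), 5.0)
x = make_rgbd_input(rgb, depth)
print(x.shape, x.dtype)
```

Scaling depth to the same [0, 1] range as the color channels keeps the fourth channel from dominating the early layers; feeding RGB and depth through separate branches would instead require a two-stream backbone.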
