YoLo-WholeBody with depth estimation #446

Open
vpenades opened this issue Feb 10, 2025 · 2 comments
Comments

@vpenades

Issue Type

Feature Request

OS

Windows

OS architecture

x86_64

Programming Language

Other

Framework

ONNX

Model name and Weights/Checkpoints URL

YoLo-WholeBody-XXX

Description

I've noticed you've been training a lot of YoLo-WholeBody models, and some of them look fantastic!

I was wondering whether any of these models support estimated depth for the bone joints, so that the model outputs 3D points instead of 2D points?

Obviously the depth can only be estimated, because it's impossible to know the actual depth of a pixel without a proper depth map, but I've seen a few models that do a reasonably good job of generating Z values, which helps a great deal with things like using whole-body poses for avatar animation.

I understand this is subject to the availability of training images with proper depth data, but I take it these days such data is already freely available?

Relevant Log Output

URL or source code for simple inference testing code

No response

@PINTO0309
Owner

PINTO0309 commented Feb 10, 2025

Rather than estimating depth in this model, this could be achieved by using a depth camera, or simply by combining it with an ONNX depth-estimation model such as DepthAnythingV2.

However, the important point is that this model (WholebodyXXX) only estimates the visible area. The concept is fundamentally different from the Pose Estimation architecture.
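A minimal sketch of the combination suggested above, assuming the 2D keypoints and a per-pixel depth map for the same frame have already been produced by two separate models. The function name `lift_keypoints_to_3d` and its `window` parameter are hypothetical and not part of this repository. The depth map may be relative rather than metric, depending on the depth model, so the resulting Z values are only as meaningful as that map.

```python
import numpy as np

def lift_keypoints_to_3d(keypoints_xy, depth_map, window=1):
    """Attach a depth value to each 2D keypoint by sampling a depth map.

    keypoints_xy: (N, 2) pixel coordinates (x, y) in the depth map's resolution.
    depth_map:    (H, W) per-pixel depth (relative or metric, model dependent).
    window:       half-size of the square patch whose median is used (hypothetical knob).
    Returns an (N, 3) array of (x, y, z); z is NaN for keypoints outside the image.
    """
    keypoints_xy = np.asarray(keypoints_xy, dtype=np.float32).reshape(-1, 2)
    h, w = depth_map.shape
    out = np.full((len(keypoints_xy), 3), np.nan, dtype=np.float32)
    out[:, :2] = keypoints_xy
    for i, (x, y) in enumerate(keypoints_xy):
        if not (np.isfinite(x) and np.isfinite(y)):
            continue  # keypoint was not detected
        col, row = int(round(float(x))), int(round(float(y)))
        if not (0 <= col < w and 0 <= row < h):
            continue  # keypoint falls outside the depth map
        # Median over a small patch is less sensitive to single-pixel noise.
        r0, r1 = max(row - window, 0), min(row + window + 1, h)
        c0, c1 = max(col - window, 0), min(col + window + 1, w)
        out[i, 2] = np.median(depth_map[r0:r1, c0:c1])
    return out
```

Because the model only estimates visible areas, a sampled depth value describes the visible surface at that pixel. This sketch cannot recover depth for occluded joints.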

@vpenades
Author

vpenades commented Feb 11, 2025

Yes, we have depth cameras, but they're quite expensive. We've also considered using DepthAnythingV2, as you suggested.

The problem is that, unlike MediaPipe, neither solution can run at interactive speeds on an average smartphone.

We'll keep looking, thanks for your reply!
