Task [pose estimation]: implement the general pose estimation API and utilities, plus integrate some initial models (e.g., mediapipe, DeepLabCut) #173

brukew · 2024-10-10T16:17:03Z

Senselab Pose Estimation

Goal:
Integrate robust pose estimation workflows within Senselab

ViTPose performs best on infants; MediaPipe and DeepLabCut have low accuracy there

Support Plan

Start with MediaPipe
- Easy to implement, lightweight (in terms of computation and speed)
Follow up with:
- OpenPose: Best performance across humans and animals (depending on the specific model); better than MediaPipe, but more computationally intensive
- ViTPose: Best performance across humans and animals (depending on the specific model)
- DeepLabCut: High performance on animal pose tracking

Workflow

Upload media
Receive custom output
Perform further analysis using the output

Version Planning

V1:
- Create pose estimations for images and videos
  - Custom datatypes for output
- Support for MediaPipe
- Visualization support
V2:
- Expand support to OpenPose
V3:
- Real-time pose detection across supported models (?)

Inputs/Outputs

Inputs:
- Image/Video file
- Configurations (e.g., MediaPipe example)
- Model
Outputs:
- PoseImage
  - Keypoints representing the pose skeleton
  - Visualization methods
- PoseVideo
  - Keypoints representing the pose skeleton per video frame
  - Visualization methods
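
As a rough sketch of what the output types listed above could look like: `PoseImage` and `PoseVideo` are the names from the plan, while the `Keypoint` type, the field names, and the `get`/`trajectory` helpers are assumptions made for illustration, not the Senselab API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Keypoint:
    """One named joint in pixel coordinates (hypothetical helper type)."""

    name: str
    x: float
    y: float
    confidence: float = 1.0


@dataclass
class PoseImage:
    """Pose estimate for a single image."""

    model: str
    keypoints: List[Keypoint] = field(default_factory=list)

    def get(self, name: str) -> Optional[Keypoint]:
        """Return the keypoint with this name, or None if the model did not produce it."""
        for kp in self.keypoints:
            if kp.name == name:
                return kp
        return None


@dataclass
class PoseVideo:
    """Pose estimates for a video: one PoseImage per frame."""

    model: str
    frames: List[PoseImage] = field(default_factory=list)

    def trajectory(self, name: str) -> List[Optional[Keypoint]]:
        """Follow one joint across all frames (None where it is missing)."""
        return [frame.get(name) for frame in self.frames]


# Usage with made-up coordinates.
frame0 = PoseImage("mediapipe", [Keypoint("nose", 10.0, 20.0, 0.9)])
frame1 = PoseImage("mediapipe", [Keypoint("nose", 12.0, 21.0, 0.8)])
video = PoseVideo("mediapipe", [frame0, frame1])
nose_path = video.trajectory("nose")
```

Keeping the per-frame type identical to the image type means any visualization method written for `PoseImage` can be reused frame by frame for `PoseVideo`.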

At each step, proper documentation and tests are expected. Tutorials will also be implemented for each workflow (across models and modalities).

github-actions · 2024-10-10T16:17:26Z

👋 Welcome to Senselab!

Thank you for your interest and contribution. Senselab is a comprehensive Python package designed to process behavioral data, including voice and speech patterns, with a focus on reproducibility and robust methodologies. Your issue will be reviewed soon. Stay tuned!

fabiocat93 · 2024-10-10T20:51:06Z

hi @brukew , sounds nice. Do you mind sharing some more detailed insights on your plan?

For example,

What is the interface that you plan to build? What is the input? What is the output?
What are the required quality checks that will determine the acceptability of your PRs? These can be unit tests, docstrings, tutorials, ...
What other tools do you plan to integrate? How did you end up starting with DeepLabCut?

I would recommend clarifying all these aspects before starting coding.
Also, I do recommend thinking about some utility functions for plotting human pose (alone and overlaid on the original picture) and a utility data structure for the human pose, so that you can save and process the results of the different models in the same way.

brukew · 2024-10-11T15:13:33Z

thanks @fabiocat93, sounds good! Will update the issue as I make the plan.

fabiocat93 · 2024-10-16T18:59:31Z

Thank you @brukew . This is good. Here are a couple of comments:

My understanding is that different models capture different joints during their pose estimation. How do you feel about implementing a senselab data structure for the human skeleton so that (i) all models will end up producing such a data structure as a response and (ii) comparisons between models are facilitated?
As a follow-up, how about implementing some utilities for visualizing the senselab skeleton? It can be drawn on the original image or on a black/white background. This will surely help with testing, demos, model comparison, ...
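
To make the visualization idea concrete, here is a minimal, dependency-free sketch that rasterizes a skeleton onto a black canvas. The function name `draw_skeleton`, the joint names, and the bone list are hypothetical; a real utility would more likely build on matplotlib or OpenCV.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) in pixels


def draw_skeleton(
    keypoints: Dict[str, Point],
    bones: Sequence[Tuple[str, str]],
    width: int,
    height: int,
) -> List[List[int]]:
    """Rasterize a skeleton onto a black canvas (0 = background, 255 = skeleton)."""
    canvas = [[0] * width for _ in range(height)]

    def plot(x: int, y: int) -> None:
        # Ignore anything that falls outside the canvas.
        if 0 <= x < width and 0 <= y < height:
            canvas[y][x] = 255

    for start, end in bones:
        # Skip bones whose joints this model did not detect.
        if start not in keypoints or end not in keypoints:
            continue
        (x0, y0), (x1, y1) = keypoints[start], keypoints[end]
        # Simple linear interpolation between the two joints.
        steps = max(abs(x1 - x0), abs(y1 - y0), 1)
        for i in range(steps + 1):
            plot(round(x0 + (x1 - x0) * i / steps), round(y0 + (y1 - y0) * i / steps))

    for x, y in keypoints.values():
        plot(x, y)
    return canvas


# Made-up example: the wrist is missing, so only one bone is drawn.
canvas = draw_skeleton(
    {"left_shoulder": (1, 1), "left_elbow": (5, 1)},
    [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")],
    width=8,
    height=4,
)
```

Starting from a copy of the original image instead of an all-zero canvas would give the overlaid variant with the same drawing loop.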

As a minor note, feel free to reply in a thread instead of simply editing the original text of the issue. this way, we can keep track of all the reasoning process (this is mostly helpful to me, since my memory is not that great! thanks!)

fabiocat93 · 2024-10-16T19:26:43Z

Also, here is a good document overviewing pose estimation as a task: https://medium.com/augmented-startups/top-9-pose-estimation-models-of-2022-70d00b11db43

Consider that you are not necessarily required to implement everything yourself. People have been working in the domain for a while and you can use their models and their utility functions. For instance, here is a related project you should look at:

MMPose (https://github.com/open-mmlab/mmpose?tab=readme-ov-file)

Here are some more models I have tried recently and could be good to add at some point in the future:

YOLO (https://docs.ultralytics.com/tasks/pose/)
Sports2D (https://github.com/davidpagnon/Sports2D)
AlphaPose (https://github.com/MVIG-SJTU/AlphaPose)
RTMPose (https://github.com/open-mmlab/mmpose/tree/main/projects/rtmpose)

satra · 2024-10-16T19:27:03Z

there is also SLEAP and DANNCE and TULIP from here https://www.tdunnlab.org/

brukew · 2024-10-16T20:43:07Z

@fabiocat93 Yes, both of your points make sense. The number of joints differs greatly between each model - I assume we want all information retained though so the pose skeleton object will just vary in size/content per model. I will look into how to ensure this is done best - maybe just hardcoding matches between the key points of each model and labeling it the same.
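
One way the hardcoded matching could look, as a sketch only: a lookup table per model renames matched joints to shared labels, and unmatched joints are kept separately so that no information is lost. The `NAME_MAPS` table and the `harmonize` function are hypothetical, only a few joints are listed, and the labels would need checking against each model's documentation.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hand-written lookup tables: model-specific label -> shared label.
# Deliberately incomplete; the entries are illustrative assumptions.
NAME_MAPS: Dict[str, Dict[str, str]] = {
    "mediapipe": {
        "NOSE": "nose",
        "LEFT_SHOULDER": "left_shoulder",
        "LEFT_WRIST": "left_wrist",
    },
    "coco": {
        "nose": "nose",
        "left_shoulder": "left_shoulder",
        "left_wrist": "left_wrist",
    },
}


def harmonize(model: str, raw: Dict[str, Point]) -> Tuple[Dict[str, Point], Dict[str, Point]]:
    """Split raw keypoints into (shared, model_specific) using the lookup table."""
    mapping = NAME_MAPS[model]
    shared: Dict[str, Point] = {}
    extra: Dict[str, Point] = {}
    for name, point in raw.items():
        if name in mapping:
            shared[mapping[name]] = point
        else:
            # Keep joints with no shared equivalent instead of dropping them.
            extra[name] = point
    return shared, extra


# Made-up coordinates: one joint with a shared label, one without.
shared, extra = harmonize("mediapipe", {"NOSE": (0.5, 0.2), "LEFT_PINKY": (0.1, 0.7)})
```

Comparisons between models would then use only the `shared` dictionary, while `extra` preserves everything a richer model produced.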

How should I approach using existing toolkits? If I just want a minor visualization utility function for example, would I need to add the whole toolkit as a requirement and import it? or would it be fine to just take the code from their github?

fabiocat93 · 2024-10-16T21:20:02Z

> How should I approach using existing toolkits? If I just want a minor visualization utility function for example, would I need to add the whole toolkit as a requirement and import it? or would it be fine to just take the code from their github?

Good question. If you can isolate the specific function you need, you can simply copy it and cite the source, mentioning their LICENSE. Obviously, this assumes that their license allows doing that.

brukew · 2024-10-17T15:47:02Z

Also, I have been thinking about this in the evaluation sense—not necessarily training or fine-tuning a model. This could definitely be added in the future, but am I right in prioritizing evaluation?

fabiocat93 · 2024-11-14T22:20:15Z

Hey @brukew, do you mind giving me a quick update? I know you keep Satra in the loop with your weekly reports, but I'd love to stay up to speed, too! Thanks!

brukew · 2024-11-15T17:56:43Z

Hey, yeah. Finished up with an implementation for mediapipe - will push a pr for it soon. Having issues with dependencies when running poetry install - if you are available sometime today, would be nice to clear that up.

fabiocat93 · 2024-12-23T17:09:23Z

hi guys @brukew, how is it going with this? I will extend your deadline (from Dec 13 to Jan 14). Please, let me know if you face any blockers

brukew added this to senselab Oct 10, 2024

brukew self-assigned this Oct 10, 2024

brukew converted this from a draft issue Oct 10, 2024

fabiocat93 added the enhancement New feature or request label Oct 10, 2024

fabiocat93 changed the title ~~Task [pose estimation]: integrate DeepLabCut~~ Task [pose estimation]: implement the general pose estimation API and utilities, plus integrate some initial models (e.g., mediapipe, DeepLabCut) Oct 16, 2024

brukew linked a pull request Nov 19, 2024 that will close this issue

Media Pipe Pose Estimation + Visualization #203

Open

3 tasks
