
Media Pipe Pose Estimation + Visualization #203

Merged: 26 commits, Jan 22, 2025

Conversation

@brukew brukew (Contributor) commented Nov 19, 2024

Description

Implemented MediaPipe pose estimation. Given an image path, it returns a PoseSkeleton object with landmarks for each individual in the image.

Related Issue(s)

https://github.com/orgs/sensein/projects/45/views/3?pane=issue&itemId=82951656&issue=sensein%7Csenselab%7C173

Motivation and Context

This is the initial structure for pose estimation, which is a valuable signal for behavior analysis. I will expand it to more models and functionality, and generalize it further as I go.

How Has This Been Tested?

I tested with different kinds of images and with attempts to access invalid properties of the PoseSkeleton object. I added unit tests for the new functions and also manually verified that the visualization is correct.

Screenshots (if appropriate):

Types of changes

Created PoseSkeleton object that contains pose information for individuals in an image. Currently supports MediaPipe pose estimation + visualization functionality.

Checklist:

  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My code follows the code style of this project.

@brukew brukew changed the title 173 task pose estimation Media Pipe Pose Estimation + Visualization Nov 19, 2024
@github-actions github-actions bot left a comment

🚀 First Pull Request 🎉

Welcome to Senselab, and thank you for submitting your first pull request! We’re thrilled to have your contribution. Our team will review it as soon as possible. Stay engaged, and let’s make behavioral data analysis even more powerful together!

@brukew brukew requested a review from fabiocat93 November 19, 2024 21:37
@fabiocat93 (Collaborator)

Thank you, @brukew, for the updates. I have a few suggestions and points to address based on our previous discussions:

  1. Reorganize Code Structure
    Please re-organize your code by separating data structures from functionalities (what we refer to as tasks in Senselab).

    • The skeleton should be treated as a data structure. It should define the skeleton and optionally include utilities for visualization. Alternatively, visualization can be handled as a standalone task.
    • Pose estimation should be a task (generating the skeleton as an output). All human pose estimation models (e.g., MediaPipe, AlphaPose) should conform to a consistent data structure.
    • To encourage generalizability, I suggest integrating a second pose estimation tool, such as YOLO due to its simplicity.
  2. Model Inclusion
    The current approach of including a model within the source code (e.g., src/senselab/video/tasks/pose_estimation/models/pose_landmarker.task) makes the package unnecessarily heavy. Instead, please ensure models are downloaded as needed. You can take inspiration from this example.

  3. Documentation
    Please add a dedicated documentation page:

    • Explain human pose estimation as a task, its purpose, and supported models.
    • For instance, you can reference this documentation for MediaPipe.
    • Feel free to draw inspiration from the existing audio task documentation (though some sections are incomplete).
  4. Tutorial
    Create a Jupyter Notebook tutorial to demonstrate:

    • How to use the interface.
    • What functionalities are available.
      Add this under a video folder in the tutorial/ directory.
  5. Failing Tests
    I noticed two tests are failing:

    • test_valid_image_single_person
      • AssertionError: "Input and output image shapes should match."
    • test_visualization_single_person
      • ValueError: "Input image must contain three-channel BGR data."

    Please double-check these tests to ensure they pass.

@fabiocat93 fabiocat93 (Collaborator) left a comment

@brukew, I have left comments with some required changes.

@fabiocat93 fabiocat93 added enhancement New feature or request release minor Minor release to-test labels Nov 20, 2024
@fabiocat93 fabiocat93 marked this pull request as draft November 20, 2024 02:38
@brukew brukew (Contributor, Author) commented Nov 20, 2024

Nice, thank you for the feedback @fabiocat93. I will address your comments and ask questions as I go.

@brukew brukew force-pushed the 173-task-pose-estimation branch from e1a7696 to 269f0a6 Compare November 24, 2024 21:21
@fabiocat93 (Collaborator)

hi @brukew , did you have any time to work further on this?

@brukew brukew (Contributor, Author) commented Dec 29, 2024

hey @fabiocat93, slowed down on development towards the end of the semester. I will continue this month.

@brukew brukew (Contributor, Author) commented Jan 8, 2025

@fabiocat93 lmk what you think about having separate estimator classes for different models. also, I kept the landmark names as they are originally listed but wondering if I should homogenize them at all.

@fabiocat93 (Collaborator)

> @fabiocat93 lmk what you think about having separate estimator classes for different models. also, I kept the landmark names as they are originally listed but wondering if I should homogenize them at all.

Good question. You can have multiple estimators, each being a child of the same abstract class and producing a skeleton that follows the same universal structure.
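The design described above can be sketched as follows. This is a minimal illustration, not the merged senselab API; the class, method, and key names are assumptions.

```python
from abc import ABC, abstractmethod

# One abstract estimator; each model-specific estimator subclasses it
# and returns a skeleton in the same universal structure.
class PoseEstimator(ABC):
    @abstractmethod
    def estimate(self, image_path: str) -> dict:
        """Return a skeleton in the shared senselab structure."""

class MediaPipePoseEstimator(PoseEstimator):
    def estimate(self, image_path: str) -> dict:
        # MediaPipe inference would run here; its output is normalized
        # into the shared structure before being returned.
        return {"model": "mediapipe", "individuals": []}

class YOLOPoseEstimator(PoseEstimator):
    def estimate(self, image_path: str) -> dict:
        # YOLO inference would run here, normalized the same way.
        return {"model": "yolo", "individuals": []}
```

Because both subclasses emit the same structure, downstream code (visualization, analysis) can stay model-agnostic.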

@brukew brukew (Contributor, Author) commented Jan 12, 2025

Addressed model download by downloading models to video/tasks/pose_estimation/models on first use. Let me know if there is a better alternative that is more in line with senselab conventions.

@brukew brukew (Contributor, Author) commented Jan 12, 2025

Also wondering how I can ensure the tutorial works (and its cells run) when the changes haven't been pushed to the main branch yet, @fabiocat93. Should I wait until after the merge?

@fabiocat93 (Collaborator)

> addressed model download by downloading models to video/tasks/pose_estimation/models upon use. Let me know if there is a better alternative to this that is more in line with senselab use.

Good for now. But you raised a good point: we will need to implement a customizable cache folder for the package (with a sensible default value, of course).
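The customizable cache folder could look something like this. The environment variable name `SENSELAB_CACHE_DIR` and the default path are illustrative assumptions, not an implemented setting.

```python
import os
from pathlib import Path

def get_cache_dir() -> Path:
    """Return the senselab cache folder: an environment variable
    overrides the default under the user's home cache directory."""
    default = Path.home() / ".cache" / "senselab"
    return Path(os.environ.get("SENSELAB_CACHE_DIR", str(default)))
```

Downloaded models would then land under `get_cache_dir()` instead of inside the installed package tree.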

@fabiocat93 (Collaborator)

> also wondering how I can ensure tutorial works (and cells are run in tutorial) if changes haven't been pushed to main branch @fabiocat93. should I wait until after?

When testing the tutorial locally, you can install the package directly from a specific branch instead of relying on pip install senselab. This lets you verify that the tutorial runs correctly with the latest changes before they are merged into the main branch.

@brukew brukew marked this pull request as ready for review January 13, 2025 21:41
@brukew brukew requested a review from fabiocat93 January 13, 2025 21:42
@codecov-commenter codecov-commenter commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 93.67089% with 20 lines in your changes missing coverage. Please review.

Project coverage is 65.46%. Comparing base (113721a) to head (8458609).
Report is 88 commits behind head on main.

Files with missing lines                               Patch %   Lines
src/senselab/video/data_structures/pose.py              88.46%   6 Missing ⚠️
...c/senselab/video/tasks/pose_estimation/estimate.py   94.73%   4 Missing ⚠️
...selab/video/tasks/pose_estimation/visualization.py   85.71%   4 Missing ⚠️
src/senselab/video/tasks/pose_estimation/api.py         88.00%   3 Missing ⚠️
src/senselab/video/tasks/pose_estimation/utils.py       94.28%   2 Missing ⚠️
src/tests/video/tasks/pose_estimation_test.py           98.70%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   60.24%   65.46%   +5.21%     
==========================================
  Files         113      128      +15     
  Lines        4017     4572     +555     
==========================================
+ Hits         2420     2993     +573     
+ Misses       1597     1579      -18     


@fabiocat93 fabiocat93 (Collaborator) left a comment

@brukew : great work on this! I have a few suggestions and questions for consideration:

  • I noticed the visualization method in the tutorial doesn't actually visualize the image but returns the array representation. Don’t you think it would make more sense for a visualization method (with the word visualize in the name) to display the image directly? Returning the array (and optionally saving it) could still be included as additional functionality.
  • Is there a specific reason for having both IndividualPose and ImagePose? Would it make sense to simplify by treating IndividualPose as a special case of ImagePose where there’s only one subject? It might reduce redundancy and improve maintainability.
  • I noticed there’s no mapping of the skeletons into a more general senseLab skeleton yet, and the visualization method is method-specific rather than generic for senselab. Was there a challenge in unifying this? Having two different skeletons and implementations feels less maintainable. Is there a particular reason for this approach?

Let me know if you’d like to discuss any of these points. I’m happy to help! Also, let me know if you prefer to fix the minor questions and merge and keep the rest of the changes for a future PR or if you want to keep it more ordered and work on this until everything is ready.
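The first suggestion above could be sketched as follows: the method named `visualize` displays the image directly, while returning (and optionally saving) the array remains separate functionality. All names here are illustrative assumptions, not the merged senselab API.

```python
from typing import Optional
import numpy as np

class AnnotatedImage:
    def __init__(self, image: np.ndarray):
        self._image = image

    def to_array(self) -> np.ndarray:
        """Return the annotated image as an array (the original behaviour)."""
        return self._image

    def visualize(self, save_path: Optional[str] = None) -> None:
        """Display the annotated image directly; optionally save it too."""
        import matplotlib.pyplot as plt  # imported lazily for headless use
        plt.imshow(self._image)
        plt.axis("off")
        if save_path is not None:
            plt.savefig(save_path, bbox_inches="tight")
        plt.show()
```

With this split, a method whose name says "visualize" actually shows something, and array access stays available for programmatic use.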

@fabiocat93 (Collaborator)

A quick note for the entire group about installation: the number of dependencies, especially for video, makes installation quite heavy. Moving forward, we might want to split the package into modules like senselab[audio], senselab[video], and senselab[text]. This would simplify installation based on the user's needs.

@brukew brukew (Contributor, Author) commented Jan 17, 2025

Hey @fabiocat93, thanks for the comments - I will make all changes before merging. Here are my thoughts:

  1. Yes, that makes sense. I will do that.

  2. I think it gives us greater power for anything we want to do later. For example, if a video has different people over time, we could combine face analysis and pose estimation to aggregate the poses of an individual across frames which would be easier with IndividualPose object (with tweaks). It's also just intuitive to have a separate object for each individual detected. I could take it out for now though.

  3. I can do this. My thinking was that users could have access to whatever the given model output is and they can work with it how they want to, but I figure that defeats the point of having these in Senselab. So I guess I'll make a Senselab keypoint mapping and attempt to not lose any information across the models - so it includes all possible key points across models. What do you think about that?
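The keypoint mapping described in point 3 could look like this: each model's native landmark names are renamed into a shared senselab vocabulary that is the union of keypoints across models, so no information is lost. The specific names and mappings below are illustrative assumptions.

```python
# Per-model mappings into the shared vocabulary (illustrative entries).
MEDIAPIPE_TO_SENSELAB = {
    "left_shoulder": "left_shoulder",  # already matches the shared name
    "left_heel": "left_heel",
}

YOLO_TO_SENSELAB = {
    "l_shoulder": "left_shoulder",  # normalize abbreviated names
    "l_hip": "left_hip",
}

def to_senselab(landmarks: dict, mapping: dict) -> dict:
    """Rename model-specific landmark names to the shared vocabulary;
    unmapped names pass through unchanged so nothing is dropped."""
    return {mapping.get(name, name): xy for name, xy in landmarks.items()}
```

Keeping unmapped names as-is means model-specific keypoints (e.g. ones only MediaPipe provides) survive the translation.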

@fabiocat93 (Collaborator)

> I think it gives us greater power for anything we want to do later. For example, if a video has different people over time, we could combine face analysis and pose estimation to aggregate the poses of an individual across frames which would be easier with IndividualPose object (with tweaks). It's also just intuitive to have a separate object for each individual detected. I could take it out for now though.

You convinced me with this. But how about always returning an ImagePose object (which may include as many IndividualPose objects as the number of people detected)?

> I can do this. My thinking was that users could have access to whatever the given model output is and they can work with it how they want to, but I figure that defeats the point of having these in Senselab. So I guess I'll make a Senselab keypoint mapping and attempt to not lose any information across the models - so it includes all possible key points across models. What do you think about that?

Sounds great! And please let me know if you need or want to brainstorm it further before implementing it.
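The structure agreed on above (estimation always returns an ImagePose, which holds one IndividualPose per person detected) could be sketched like this. Field and method names are illustrative assumptions, not the merged senselab data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IndividualPose:
    """Pose of one detected person: landmark name -> (x, y)."""
    individual_index: int
    landmarks: Dict[str, Tuple[float, float]]

@dataclass
class ImagePose:
    """Always returned by estimation; empty if no one was detected."""
    individuals: List[IndividualPose] = field(default_factory=list)

    def get_individual(self, index: int) -> IndividualPose:
        """Access one detected person; raises IndexError if absent."""
        return self.individuals[index]
```

Callers then always handle a single return type, while per-person data stays available for downstream aggregation across frames.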

@brukew brukew (Contributor, Author) commented Jan 21, 2025

@fabiocat93 Addressed the visualization and keypoint mapping comments. The unit tests kept failing on GitHub whenever a plot was created; my workaround was adding a boolean parameter that controls plotting, but let me know if you'd prefer I handle it another way.
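The workaround described above can be sketched as a `plot` flag: unit tests (e.g. on headless CI runners) skip opening a figure while still exercising the drawing logic. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def visualize_pose(image: np.ndarray, plot: bool = True) -> np.ndarray:
    """Annotate an image with pose landmarks; display only if `plot`."""
    annotated = image.copy()  # landmark drawing would happen here
    if plot:
        import matplotlib.pyplot as plt  # imported only when displaying
        plt.imshow(annotated)
        plt.show()
    return annotated
```

In CI, tests would call `visualize_pose(img, plot=False)` so no figure backend is required; an alternative is forcing matplotlib's non-interactive `Agg` backend in the test setup.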

@fabiocat93 fabiocat93 (Collaborator) left a comment

Looks good to me. Nice job @brukew! Looking forward to seeing more models integrated and some cool analyses based on this!!

@fabiocat93 fabiocat93 merged commit aa72723 into main Jan 22, 2025
12 checks passed