
Media Pipe Pose Estimation + Visualization #203

Merged: 26 commits, Jan 22, 2025

Conversation

@brukew brukew (Contributor) commented Nov 19, 2024

Description

Implemented MediaPipe pose estimation. Given an image path, it returns a PoseSkeleton object with landmarks for each individual in the image.

Related Issue(s)

https://github.com/orgs/sensein/projects/45/views/3?pane=issue&itemId=82951656&issue=sensein%7Csenselab%7C173

Motivation and Context

This is the initial structure for pose estimation, which is a valuable signal for behavior analysis. I will expand it to more models and functionality, and generalize it further as I go.

How Has This Been Tested?

I tested with different kinds of images and with attempts to access invalid properties of the PoseSkeleton object. I added unit tests for the new functions and also manually verified that the visualization is correct.

Screenshots (if appropriate):

Types of changes

Created PoseSkeleton object that contains pose information for individuals in an image. Currently supports MediaPipe pose estimation + visualization functionality.

Checklist:

  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • My code follows the code style of this project.

@brukew brukew changed the title 173 task pose estimation Media Pipe Pose Estimation + Visualization Nov 19, 2024
@github-actions github-actions bot left a comment

🚀 First Pull Request 🎉

Welcome to Senselab, and thank you for submitting your first pull request! We’re thrilled to have your contribution. Our team will review it as soon as possible. Stay engaged, and let’s make behavioral data analysis even more powerful together!

@brukew brukew requested a review from fabiocat93 November 19, 2024 21:37
@fabiocat93 (Collaborator)

Thank you, @brukew, for the updates. I have a few suggestions and points to address based on our previous discussions:

  1. Reorganize Code Structure
    Please re-organize your code by separating data structures from functionalities (what we refer to as tasks in Senselab).

    • The skeleton should be treated as a data structure. It should define the skeleton and optionally include utilities for visualization. Alternatively, visualization can be handled as a standalone task.
    • Pose estimation should be a task (generating the skeleton as an output). All human pose estimation models (e.g., MediaPipe, AlphaPose) should conform to a consistent data structure.
    • To encourage generalizability, I suggest integrating a second pose estimation tool, such as YOLO due to its simplicity.
  2. Model Inclusion
    The current approach of including a model within the source code (e.g., src/senselab/video/tasks/pose_estimation/models/pose_landmarker.task) makes the package unnecessarily heavy. Instead, please ensure models are downloaded as needed. You can take inspiration from this example.

  3. Documentation
    Please add a dedicated documentation page:

    • Explain human pose estimation as a task, its purpose, and supported models.
    • For instance, you can reference this documentation for MediaPipe.
    • Feel free to draw inspiration from the existing audio task documentation (though some sections are incomplete).
  4. Tutorial
    Create a Jupyter Notebook tutorial to demonstrate:

    • How to use the interface.
    • What functionalities are available.
      Add this under a video folder in the tutorial/ directory.
  5. Failing Tests
    I noticed two tests are failing:

    • test_valid_image_single_person
      • AssertionError: "Input and output image shapes should match."
    • test_visualization_single_person
      • ValueError: "Input image must contain three-channel BGR data."

    Please double-check these tests to ensure they pass.

@fabiocat93 fabiocat93 (Collaborator) left a comment

@brukew, I have left comments with some required changes.

@fabiocat93 fabiocat93 added enhancement New feature or request release minor Minor release to-test labels Nov 20, 2024
@fabiocat93 fabiocat93 marked this pull request as draft November 20, 2024 02:38
@brukew brukew (Contributor, Author) commented Nov 20, 2024

Nice, thank you for the feedback @fabiocat93. I will address your comments and ask questions as I go.

@brukew brukew force-pushed the 173-task-pose-estimation branch from e1a7696 to 269f0a6 Compare November 24, 2024 21:21
@fabiocat93 (Collaborator)

hi @brukew , did you have any time to work further on this?

@brukew brukew (Contributor, Author) commented Dec 29, 2024

hey @fabiocat93, slowed down on development towards the end of the semester. I will continue this month.

@brukew brukew (Contributor, Author) commented Jan 8, 2025

@fabiocat93 lmk what you think about having separate estimator classes for different models. also, I kept the landmark names as they are originally listed but wondering if I should homogenize them at all.

@fabiocat93 (Collaborator)

> @fabiocat93 lmk what you think about having separate estimator classes for different models. also, I kept the landmark names as they are originally listed but wondering if I should homogenize them at all.

Good question. You can have multiple estimators, each being a child of the same abstract class and producing a skeleton that follows the same universal structure.
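The design described above can be sketched as follows. This is a minimal illustration, not the merged senselab API; the class, method, and key names are assumptions.

```python
from abc import ABC, abstractmethod

# One abstract estimator; each model-specific estimator subclasses it
# and returns a skeleton in the same universal structure.
class PoseEstimator(ABC):
    @abstractmethod
    def estimate(self, image_path: str) -> dict:
        """Return a skeleton in the shared senselab structure."""

class MediaPipePoseEstimator(PoseEstimator):
    def estimate(self, image_path: str) -> dict:
        # MediaPipe inference would run here; its output is normalized
        # into the shared structure before being returned.
        return {"model": "mediapipe", "individuals": []}

class YOLOPoseEstimator(PoseEstimator):
    def estimate(self, image_path: str) -> dict:
        # YOLO inference would run here, normalized the same way.
        return {"model": "yolo", "individuals": []}
```

Because both subclasses emit the same structure, downstream code (visualization, analysis) can stay model-agnostic.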

@brukew brukew (Contributor, Author) commented Jan 12, 2025

Addressed model download by downloading models to video/tasks/pose_estimation/models on first use. Let me know if there is a better alternative that is more in line with senselab conventions.

@brukew brukew (Contributor, Author) commented Jan 12, 2025

Also wondering how I can ensure the tutorial works (and its cells run) when the changes haven't been pushed to the main branch yet, @fabiocat93. Should I wait until after the merge?

@fabiocat93 (Collaborator)

> addressed model download by downloading models to video/tasks/pose_estimation/models upon use. Let me know if there is a better alternative to this that is more in line with senselab use.

Good for now. But you raised a good point: we will need to implement a customizable cache folder for the package (with a sensible default value, of course).
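The customizable cache folder could look something like this. The environment variable name `SENSELAB_CACHE_DIR` and the default path are illustrative assumptions, not an implemented setting.

```python
import os
from pathlib import Path

def get_cache_dir() -> Path:
    """Return the senselab cache folder: an environment variable
    overrides the default under the user's home cache directory."""
    default = Path.home() / ".cache" / "senselab"
    return Path(os.environ.get("SENSELAB_CACHE_DIR", str(default)))
```

Downloaded models would then land under `get_cache_dir()` instead of inside the installed package tree.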

@fabiocat93 (Collaborator)

> also wondering how I can ensure tutorial works (and cells are run in tutorial) if changes haven't been pushed to main branch @fabiocat93. should I wait until after?

When testing the tutorial locally, you can install the package directly from a specific branch instead of relying on pip install senselab. This lets you verify that the tutorial runs correctly with the latest changes before they are merged into the main branch.

@brukew brukew marked this pull request as ready for review January 13, 2025 21:41
@brukew brukew requested a review from fabiocat93 January 13, 2025 21:42
@codecov-commenter codecov-commenter commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 93.67089% with 20 lines in your changes missing coverage. Please review.

Project coverage is 65.46%. Comparing base (113721a) to head (8458609).
Report is 88 commits behind head on main.

Files with missing lines                               Patch %   Lines
src/senselab/video/data_structures/pose.py              88.46%   6 Missing ⚠️
...c/senselab/video/tasks/pose_estimation/estimate.py   94.73%   4 Missing ⚠️
...selab/video/tasks/pose_estimation/visualization.py   85.71%   4 Missing ⚠️
src/senselab/video/tasks/pose_estimation/api.py         88.00%   3 Missing ⚠️
src/senselab/video/tasks/pose_estimation/utils.py       94.28%   2 Missing ⚠️
src/tests/video/tasks/pose_estimation_test.py           98.70%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
+ Coverage   60.24%   65.46%   +5.21%     
==========================================
  Files         113      128      +15     
  Lines        4017     4572     +555     
==========================================
+ Hits         2420     2993     +573     
+ Misses       1597     1579      -18     


@fabiocat93 fabiocat93 (Collaborator) left a comment

@brukew : great work on this! I have a few suggestions and questions for consideration:

  • I noticed the visualization method in the tutorial doesn't actually visualize the image but returns the array representation. Don’t you think it would make more sense for a visualization method (with the word visualize in the name) to display the image directly? Returning the array (and optionally saving it) could still be included as additional functionality.
  • Is there a specific reason for having both IndividualPose and ImagePose? Would it make sense to simplify by treating IndividualPose as a special case of ImagePose where there’s only one subject? It might reduce redundancy and improve maintainability.
  • I noticed there’s no mapping of the skeletons into a more general senseLab skeleton yet, and the visualization method is method-specific rather than generic for senselab. Was there a challenge in unifying this? Having two different skeletons and implementations feels less maintainable. Is there a particular reason for this approach?

Let me know if you’d like to discuss any of these points. I’m happy to help! Also, let me know if you prefer to fix the minor questions and merge and keep the rest of the changes for a future PR or if you want to keep it more ordered and work on this until everything is ready.
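The first suggestion above could be sketched as follows: the method named `visualize` displays the image directly, while returning (and optionally saving) the array remains separate functionality. All names here are illustrative assumptions, not the merged senselab API.

```python
from typing import Optional
import numpy as np

class AnnotatedImage:
    def __init__(self, image: np.ndarray):
        self._image = image

    def to_array(self) -> np.ndarray:
        """Return the annotated image as an array (the original behaviour)."""
        return self._image

    def visualize(self, save_path: Optional[str] = None) -> None:
        """Display the annotated image directly; optionally save it too."""
        import matplotlib.pyplot as plt  # imported lazily for headless use
        plt.imshow(self._image)
        plt.axis("off")
        if save_path is not None:
            plt.savefig(save_path, bbox_inches="tight")
        plt.show()
```

With this split, a method whose name says "visualize" actually shows something, and array access stays available for programmatic use.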

@fabiocat93 (Collaborator)

A quick note for the entire group about installation: the number of dependencies, especially for video, makes installation quite heavy. Moving forward, we might want to split the package into modules like senselab[audio], senselab[video], and senselab[text]. This would simplify installation based on the user's needs.

@brukew brukew (Contributor, Author) commented Jan 17, 2025

Hey @fabiocat93, thanks for the comments - I will make all changes before merging. Here are my thoughts:

  1. Yes, that makes sense. I will do that.

  2. I think it gives us greater power for anything we want to do later. For example, if a video has different people over time, we could combine face analysis and pose estimation to aggregate the poses of an individual across frames which would be easier with IndividualPose object (with tweaks). It's also just intuitive to have a separate object for each individual detected. I could take it out for now though.

  3. I can do this. My thinking was that users could have access to whatever the given model output is and they can work with it how they want to, but I figure that defeats the point of having these in Senselab. So I guess I'll make a Senselab keypoint mapping and attempt to not lose any information across the models - so it includes all possible key points across models. What do you think about that?
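The keypoint mapping described in point 3 could look like this: each model's native landmark names are renamed into a shared senselab vocabulary that is the union of keypoints across models, so no information is lost. The specific names and mappings below are illustrative assumptions.

```python
# Per-model mappings into the shared vocabulary (illustrative entries).
MEDIAPIPE_TO_SENSELAB = {
    "left_shoulder": "left_shoulder",  # already matches the shared name
    "left_heel": "left_heel",
}

YOLO_TO_SENSELAB = {
    "l_shoulder": "left_shoulder",  # normalize abbreviated names
    "l_hip": "left_hip",
}

def to_senselab(landmarks: dict, mapping: dict) -> dict:
    """Rename model-specific landmark names to the shared vocabulary;
    unmapped names pass through unchanged so nothing is dropped."""
    return {mapping.get(name, name): xy for name, xy in landmarks.items()}
```

Keeping unmapped names as-is means model-specific keypoints (e.g. ones only MediaPipe provides) survive the translation.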

@fabiocat93 (Collaborator)

> I think it gives us greater power for anything we want to do later. For example, if a video has different people over time, we could combine face analysis and pose estimation to aggregate the poses of an individual across frames which would be easier with IndividualPose object (with tweaks). It's also just intuitive to have a separate object for each individual detected. I could take it out for now though.

You convinced me with this. But how about always returning an ImagePose object (which may include as many IndividualPose objects as the number of people detected)?

> I can do this. My thinking was that users could have access to whatever the given model output is and they can work with it how they want to, but I figure that defeats the point of having these in Senselab. So I guess I'll make a Senselab keypoint mapping and attempt to not lose any information across the models - so it includes all possible key points across models. What do you think about that?

Sounds great! And please let me know if you need or want to brainstorm it further before implementing it.
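The structure agreed on above (estimation always returns an ImagePose, which holds one IndividualPose per person detected) could be sketched like this. Field and method names are illustrative assumptions, not the merged senselab data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class IndividualPose:
    """Pose of one detected person: landmark name -> (x, y)."""
    individual_index: int
    landmarks: Dict[str, Tuple[float, float]]

@dataclass
class ImagePose:
    """Always returned by estimation; empty if no one was detected."""
    individuals: List[IndividualPose] = field(default_factory=list)

    def get_individual(self, index: int) -> IndividualPose:
        """Access one detected person; raises IndexError if absent."""
        return self.individuals[index]
```

Callers then always handle a single return type, while per-person data stays available for downstream aggregation across frames.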

@brukew brukew (Contributor, Author) commented Jan 21, 2025

@fabiocat93 Addressed the visualization and keypoint mapping comments. The unit tests kept failing on GitHub whenever a plot was created; my workaround was adding a boolean parameter that controls plotting, but let me know if you'd prefer I handle it another way.
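The workaround described above can be sketched as a `plot` flag: unit tests (e.g. on headless CI runners) skip opening a figure while still exercising the drawing logic. Function and parameter names are illustrative assumptions.

```python
import numpy as np

def visualize_pose(image: np.ndarray, plot: bool = True) -> np.ndarray:
    """Annotate an image with pose landmarks; display only if `plot`."""
    annotated = image.copy()  # landmark drawing would happen here
    if plot:
        import matplotlib.pyplot as plt  # imported only when displaying
        plt.imshow(annotated)
        plt.show()
    return annotated
```

In CI, tests would call `visualize_pose(img, plot=False)` so no figure backend is required; an alternative is forcing matplotlib's non-interactive `Agg` backend in the test setup.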

@fabiocat93 fabiocat93 (Collaborator) left a comment

Looks good to me. Nice job @brukew! Looking forward to seeing more models integrated and some cool analyses based on this!!

@fabiocat93 fabiocat93 merged commit aa72723 into main Jan 22, 2025
12 checks passed