
Ground truth data and resnet18 output prediction coordinates not clear #8

Open
ccaccavella opened this issue Jan 16, 2025 · 0 comments

ccaccavella commented Jan 16, 2025

Hi all,

Thanks for the great work! I have a few questions:

1. ResNet18 Model Output Interpretation:

  • Normalization: Are the 12 output values from the ResNet18 model (the output of predict3_npz) normalized? If so, could you provide details of the normalization?

  • Coordinate System and Camera Model: The paper mentions the use of a pinhole camera model. However, I have observed instances of negative depth values in the output. Could you clarify the coordinate system employed and how the camera model is defined? Specifically, how should one interpret the translation vector, and what does a negative depth signify in this context?

  • 3D to 2D Projection: I would like to project the 3D coordinates predicted by the ResNet18 model (the first 3 output values) onto the 2D image plane to visualize the hand's location. Could you provide guidance on the correct methodology for this projection? A sketch of what I am currently attempting is included after this list.

2. Ground Truth Data Format:

  • In the real_eval_data/regular/gt_events/...txt files, each of the 150 data points per file comprises 15 values. My understanding is that the ground truth should consist of 12 values. Could you clarify what these 15 values represent?
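
For reference, here is a minimal sketch of what I am currently attempting for the projection in point 1. The intrinsics fx, fy, cx, cy are placeholders (I do not know the dataset's calibration), and I am assuming the first three output values are an (X, Y, Z) translation in the camera frame, which may well be incorrect given the negative depths mentioned above:

```python
import numpy as np

def project_pinhole(point_3d, fx, fy, cx, cy):
    """Project a 3D point in the camera frame onto the image plane (pinhole model)."""
    x, y, z = point_3d
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# Placeholder intrinsics -- I do not know the real calibration values for this dataset.
fx, fy, cx, cy = 200.0, 200.0, 160.0, 120.0

# First 3 values of the ResNet18 output, assumed (possibly wrongly) to be X, Y, Z in the camera frame.
hand_xyz = np.array([0.05, -0.02, 0.30])

u, v = project_pinhole(hand_xyz, fx, fy, cx, cy)
print(f"projected pixel: ({u:.1f}, {v:.1f})")
```

If the outputs are normalized or expressed in a different frame, this obviously breaks down, which is why I am asking about the exact convention.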

Thanks for the help!
