
Ground truth data and resnet18 output prediction coordinates not clear #8

Open
ccaccavella opened this issue Jan 16, 2025 · 0 comments

ccaccavella commented Jan 16, 2025

Hi all,

Thanks for the great work! I have a few questions:

1. ResNet18 Model Output Interpretation:

  • Normalization: Are the 12 output values from the ResNet18 model (the output of predict3_npz) normalized? If so, could you provide details of the normalization?

  • Coordinate System and Camera Model: The paper mentions the use of a pinhole camera model. However, I have observed instances of negative depth values in the output. Could you clarify the coordinate system employed and how the camera model is defined? Specifically, how should one interpret the translation vector, and what does a negative depth signify in this context?

  • 3D to 2D Projection: I would like to project the 3D coordinates predicted by the ResNet18 model (the first 3 output values) onto the 2D image plane to visualize the hand's location. Could you provide guidance on the correct methodology for this projection? A sketch of what I am currently attempting is included after this list.

2. Ground Truth Data Format:

  • In the real_eval_data/regular/gt_events/...txt files, each of the 150 data points per file comprises 15 values. My understanding is that the ground truth should consist of 12 values. Could you clarify what these 15 values represent?
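
For reference, here is a minimal sketch of what I am currently attempting for the projection in point 1. The intrinsics fx, fy, cx, cy are placeholders (I do not know the dataset's calibration), and I am assuming the first three output values are an (X, Y, Z) translation in the camera frame, which may well be incorrect given the negative depths mentioned above:

```python
import numpy as np

def project_pinhole(point_3d, fx, fy, cx, cy):
    """Project a 3D point in the camera frame onto the image plane (pinhole model)."""
    x, y, z = point_3d
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v

# Placeholder intrinsics -- I do not know the real calibration values for this dataset.
fx, fy, cx, cy = 200.0, 200.0, 160.0, 120.0

# First 3 values of the ResNet18 output, assumed (possibly wrongly) to be X, Y, Z in the camera frame.
hand_xyz = np.array([0.05, -0.02, 0.30])

u, v = project_pinhole(hand_xyz, fx, fy, cx, cy)
print(f"projected pixel: ({u:.1f}, {v:.1f})")
```

If the outputs are normalized or expressed in a different frame, this obviously breaks down, which is why I am asking about the exact convention.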

Thanks for the help!
