Thank you for your outstanding work! I have a question regarding the training process. When generating the conditioning signal, I understand that the point cloud is obtained from a single frame. However, during rendering, is the camera pose derived from running dust3r on a single frame, or from a clip of 25 frames? If it's the latter, could there be any discrepancies between the pose predicted from 25 frames during rendering and the one predicted from a single frame during inference?
Thank you for your help and for the excellent work you’ve done!
During training, the camera poses are derived from all 25 frames. During inference, the reference camera pose is not predicted; instead, it is fixed at (r, 0, 0) in the world coordinate system, and the subsequent camera poses are specified by the users, so there should be no discrepancies.
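To make the fixed-reference convention concrete, here is a minimal sketch of how one might construct that reference camera pose: a camera placed at (r, 0, 0) in world coordinates, oriented toward the origin. The look-at construction, the up-vector choice, and the camera-to-world matrix layout are all assumptions for illustration, not necessarily the exact conventions used in this codebase.

```python
import numpy as np

def look_at_pose(position, target=np.zeros(3), up=np.array([0.0, 1.0, 0.0])):
    """Build a 4x4 camera-to-world pose for a camera at `position`
    looking toward `target`.

    NOTE: axis conventions (up direction, forward sign, column order)
    are illustrative assumptions and may differ from the repo's own.
    """
    forward = target - position
    forward = forward / np.linalg.norm(forward)   # viewing direction
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)            # re-orthogonalized up

    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = true_up
    pose[:3, 2] = forward
    pose[:3, 3] = position                        # camera center in world
    return pose

r = 2.0  # hypothetical radius; the actual value of r depends on the scene scale
ref_pose = look_at_pose(np.array([r, 0.0, 0.0]))
```

Subsequent user-specified poses would then be expressed in this same world frame, which is why no pose prediction (and hence no train/inference discrepancy) arises for the reference view.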