
question about projection #16

Open
muses0229 opened this issue Jan 21, 2024 · 16 comments

Comments

@muses0229

muses0229 commented Jan 21, 2024

Hi, I want to project the world_src_pts onto the 2D image plane. I use the projection() function from your previous work SHERF, but the projected 2D points seem wrong. Could you help me solve this? The following is my code:

src_uv = projection(world_src_pts.reshape(bs, -1, 3), camera_R, camera_T, camera_K) # [bs, N, 6890, 3]
src_uv = src_uv.view(-1, *src_uv.shape[2:])

Here camera_K is the camera intrinsic matrix, camera_R is the ['R'] in smpl_param, and camera_T is the ['Th'] in smpl_param.
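For reference, my understanding is that the projection should be a standard pinhole projection, roughly like the sketch below (just my own sketch with placeholder shapes, not the exact SHERF projection() code):

import torch

def project_points(world_pts, R, T, K):
    # world_pts: [bs, N, 3]; R: [bs, 3, 3]; T: [bs, 3]; K: [bs, 3, 3]
    cam_pts = torch.matmul(world_pts, R.transpose(1, 2)) + T.reshape(-1, 1, 3)  # world -> camera
    img_pts = torch.matmul(cam_pts, K.transpose(1, 2))                          # apply intrinsics
    uv = img_pts[..., :2] / (img_pts[..., 2:] + 1e-5)                           # perspective divide
    return uv  # [bs, N, 2] pixel coordinates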

@muses0229
Author

This is the world_src_pts:
[screenshot of the world_src_pts values]

@muses0229
Author

These are my projected 2D points:
[screenshot of the src_uv values]

@muses0229
Author

Then I tried adding a translation of 400 to the 2D coordinates and obtained the following result:
[screenshot of the src_uv + 400 values]

@skhu101
Owner

skhu101 commented Jan 22, 2024

Hi, thanks for your interest in our work. You can put the following code in train.py to get the projection results.

import imageio
import numpy as np
import torch

# Build the camera extrinsics [R | T] on the GPU.
RT = torch.cat([torch.tensor(viewpoint_cam.R.transpose()), torch.tensor(viewpoint_cam.T).reshape(3, 1)], -1)[None, None].cuda()
# SMPL vertices in world space, repeated per view. [bs, view_num, N, 3]
xyz = torch.repeat_interleave(torch.tensor(viewpoint_cam.world_vertex)[None, None].cuda(), repeats=RT.shape[1], dim=1)
# World -> camera coordinates.
xyz = torch.matmul(RT[:, :, None, :, :3].float(), xyz[..., None].float()) + RT[:, :, None, :, 3:].float()
# Apply the intrinsics K, then the perspective divide, to get pixel coordinates.
xyz = torch.matmul(torch.tensor(viewpoint_cam.K)[None, None][:, :, None].float().cuda(), xyz)[..., 0]
xy = xyz[..., :2] / (xyz[..., 2:] + 1e-5)
src_uv = xy.view(-1, *xy.shape[2:])

# Mark the projected vertices on the ground-truth image and save it.
test_image = gt_image.clone().permute(1, 2, 0)
test_image[src_uv[0, :, 1].type(torch.LongTensor), src_uv[0, :, 0].type(torch.LongTensor)] = 1
imageio.imwrite('vertex_img.png', (255 * test_image).cpu().numpy().astype(np.uint8))

@muses0229
Author

Thanks a lot! I'd also like to know which coordinate system means3D is defined in: world space or SMPL space?

@skhu101
Owner

skhu101 commented Jan 23, 2024

In the world coordinate system.

@muses0229
Author

Thanks. I found that the evaluation metrics are calculated over the entire image, but I want to calculate PSNR, SSIM, and LPIPS only within the human body mask. Could you help me with this?

@skhu101
Owner

skhu101 commented Jan 23, 2024

Hi, you can calculate PSNR, SSIM, and LPIPS with the help of a bounding box mask.
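For example, a minimal sketch of cropping to the bounding box before computing the metrics (assuming rendering and gt_image are [3, H, W] tensors and bound_mask[0] is an [H, W] mask as in render.py; the psnr/ssim/lpips calls are placeholders for whichever implementations you already use):

import torch

def crop_to_bbox(img, mask):
    # img: [3, H, W]; mask: [H, W], non-zero inside the bounding box region
    ys, xs = torch.where(mask > 0)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return img[:, y0:y1, x0:x1]

pred_crop = crop_to_bbox(rendering, bound_mask[0])
gt_crop = crop_to_bbox(gt_image, bound_mask[0])
# psnr_val = psnr(pred_crop, gt_crop)
# ssim_val = ssim(pred_crop, gt_crop)
# lpips_val = lpips_fn(pred_crop[None], gt_crop[None])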

@muses0229
Author

Thanks for replying. I'd like to calculate PSNR, SSIM, and LPIPS inside the human mask (denoted as bkgd_mask in your code). I found this code in render.py:

rendering.permute(1,2,0)[bound_mask[0]==0] = 0 if background.sum().item() == 0 else 1

Can I replace bound_mask with bkgd_mask to achieve this?
By the way, I'm a little confused about the code. In my opinion, it seems like GauHuman still calculates the metrics over the whole image, since the zeroed-out pixels are not removed. I think we need to flatten the image and mask and keep only the pixels inside the mask, and then compute the metrics under the mask, as in the sketch below.
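To illustrate what I mean, a rough sketch of a masked PSNR (assuming pred/gt are [3, H, W] tensors with values in [0, 1] and mask is the [H, W] human mask); SSIM and LPIPS need spatial structure, so for those a bounding-box crop is probably still needed:

import torch

def masked_psnr(pred, gt, mask):
    # keep only the pixels inside the human mask, then compute PSNR on them
    m = mask.reshape(-1) > 0
    pred_flat = pred.reshape(3, -1)[:, m]
    gt_flat = gt.reshape(3, -1)[:, m]
    mse = ((pred_flat - gt_flat) ** 2).mean()
    return 10 * torch.log10(1.0 / mse)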

@skhu101
Owner

skhu101 commented Jan 25, 2024

Thanks for your question. In 3D human reconstruction, we learn both the 3D human part and the background part. Following the routine of HumanNeRF papers, we can calculate the metrics either on the whole image or on the image cropped by a bounding box mask.

@muses0229
Author

Thanks a lot! Additionally, I'd like to ask about the requirements on training views and the number of training images in GauHuman. I found that GauHuman selects training view [4] and samples a total of 100 images, one every 5 frames. In order to compare with other methods, I modified the training setting to use training view [0] and took 570 continuous images for training, but the results were very poor. Why did the results drop so badly?

@skhu101
Owner

skhu101 commented Apr 3, 2024

Hi, we follow the setting of instant-nvr for performance comparison. Is the performance drop consistent for both the ZJU_MoCap and MonoCap data sets?

@skhu101 skhu101 closed this as completed Apr 12, 2024
@skhu101 skhu101 reopened this Apr 12, 2024
@JiatengLiu

Hi @skhu101 @yejr0229, continuing with the problem mentioned above in #16 (comment): if means3D is defined in world space, why do we transform means3D from SMPL space to world space, as in the code below?
world_src_pts = torch.matmul(smpl_src_pts, R_inv) + Th
Doesn't that mean the transformed means3D is defined in world space, while the means3D before the transform is defined in SMPL space?

@skhu101
Owner

skhu101 commented Oct 28, 2024

Hi, means3D is defined in world space. We transform the canonical SMPL pose from world space to SMPL space and then transform the target SMPL pose from SMPL space to the world space.
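As a rough sketch of the two directions (using the row-vector convention of the line quoted above, with R the smpl_param rotation matrix, R_inv = torch.inverse(R), and Th the translation; not the exact repo code):

# SMPL space -> world space (the line quoted above):
world_src_pts = torch.matmul(smpl_src_pts, R_inv) + Th
# world space -> SMPL space (its inverse):
smpl_src_pts = torch.matmul(world_src_pts - Th, R)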

@JiatengLiu

JiatengLiu commented Oct 28, 2024

Hi, could you tell me where you implement the transformation of means3D from world space to SMPL space? Besides, I want to perform inverse LBS on the posed means3D; how can I get the blend weight (bweight) of each posed means3D?

@skhu101
Owner

skhu101 commented Oct 29, 2024

Hi, you can refer to this function to find the transformation details and bweight of each posed means3D.

def coarse_deform_c2source(self, query_pts, params, t_params, t_vertices, lbs_weights=None, correct_Rs=None, return_transl=False):
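In spirit, the blend weights are obtained by looking up the LBS weights of the nearest SMPL vertex for each query point, and inverse LBS then applies the inverted per-point blended bone transform. A simplified sketch with placeholder names, not the exact coarse_deform_c2source code:

import torch

def query_lbs_weights(query_pts, smpl_vertices, smpl_lbs_weights):
    # query_pts: [N, 3]; smpl_vertices: [6890, 3]; smpl_lbs_weights: [6890, 24]
    dist = torch.cdist(query_pts, smpl_vertices)   # [N, 6890] pairwise distances
    nearest = dist.argmin(dim=1)                   # nearest SMPL vertex per query point
    return smpl_lbs_weights[nearest]               # [N, 24] blend weights

def inverse_lbs(posed_pts, bweights, joint_transforms):
    # posed_pts: [N, 3]; joint_transforms: [24, 4, 4] bone transforms in posed space
    T = torch.einsum('nk,kij->nij', bweights, joint_transforms)               # blended transform per point
    homo = torch.cat([posed_pts, torch.ones_like(posed_pts[:, :1])], dim=1)   # [N, 4] homogeneous coords
    canon = torch.einsum('nij,nj->ni', torch.inverse(T), homo)                # posed -> canonical
    return canon[:, :3]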
