Training results are very poor. #4

xungeer29 · 2023-11-06T07:34:45Z

Thank you very much for making your training code public.
I trained the model with your default config file, changing only batch_size to 256. The final results were 81+ mm MPJPE and 88+ mm MPVPE.
I don't know why the results are so poor.

[11/04 02:22:49] Training INFO: [Epoch 49/50][Batch 300/357][lr 0.000000][loss_seg: 0.0541][loss_dense: 0.0002][loss_lovasz: 0.0125][loss_joint_left_uv_0: 0.0055][loss_joint_right_uv_0: 0.0054][loss_mesh_left_uv_0: 0.0071][loss_mesh_right_uv_0: 0.0075][loss_joint_left_xyz_0: 0.0066][loss_joint_right_xyz_0: 0.0064][loss_mesh_left_xyz_0: 0.0080][loss_mesh_right_xyz_0: 0.0082][loss_edge_left_0: 0.0135][loss_edge_right_0: 0.0138][loss_normal_left_0: 0.0294][loss_normal_right_0: 0.0300][loss_offset_0: 0.0033][loss_joint_left_uv_1: 0.0052][loss_joint_right_uv_1: 0.0052][loss_mesh_left_uv_1: 0.0069][loss_mesh_right_uv_1: 0.0068][loss_joint_left_xyz_1: 0.0065][loss_joint_right_xyz_1: 0.0063][loss_mesh_left_xyz_1: 0.0079][loss_mesh_right_xyz_1: 0.0078][loss_edge_left_1: 0.0135][loss_edge_right_1: 0.0137][loss_normal_left_1: 0.0292][loss_normal_right_1: 0.0291][loss_offset_1: 0.0031][loss_joint_left_uv_2: 0.0051][loss_joint_right_uv_2: 0.0050][loss_mesh_left_uv_2: 0.0068][loss_mesh_right_uv_2: 0.0067][loss_joint_left_xyz_2: 0.0065][loss_joint_right_xyz_2: 0.0063][loss_mesh_left_xyz_2: 0.0079][loss_mesh_right_xyz_2: 0.0078][loss_edge_left_2: 0.0135][loss_edge_right_2: 0.0136][loss_normal_left_2: 0.0291][loss_normal_right_2: 0.0291][loss_offset_2: 0.0031]
[11/04 02:24:36] Training INFO: Save checkpoint to ./checkpoints/DIR/checkpoint/latest.pth
[11/04 02:30:25] Training INFO: MPJPE_0: left 80.77075399604499 mm, right 81.86647319326214 mm, AVG 81.31861359465356 mm
[11/04 02:30:25] Training INFO: MPVPE_0: left 87.17533539907605 mm, right 89.05994439241933 mm, AVG 88.11763989574769 mm
[11/04 02:30:25] Training INFO: MPJPE_1: left 80.8181789283659 mm, right 82.71075034258412 mm, AVG 81.76446463547501 mm
[11/04 02:30:25] Training INFO: MPVPE_1: left 87.29970373359382 mm, right 89.37457580776776 mm, AVG 88.33713977068078 mm
[11/04 02:30:25] Training INFO: MPJPE_2: left 81.22921638629016 mm, right 83.13988862084408 mm, AVG 82.18455250356712 mm
[11/04 02:30:25] Training INFO: MPVPE_2: left 87.71276979469785 mm, right 89.84211043399922 mm, AVG 88.77744011434854 mm
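For reference, MPJPE and MPVPE are the mean Euclidean distances between predicted and ground-truth 3D joints and mesh vertices, respectively. The AVG field in the log above equals the plain mean of the left- and right-hand values (e.g. (80.7708 + 81.8665) / 2 ≈ 81.3186). Below is a minimal NumPy sketch of that computation. It assumes predictions and ground truth are `(N, K, 3)` arrays in metres. The function names are hypothetical, and this is not the repository's evaluation code.

```python
import numpy as np

def mean_point_error_mm(pred, gt):
    """Mean Euclidean distance over samples and points, converted from metres to mm.

    pred, gt: arrays of shape (N, K, 3); K = joints (MPJPE) or vertices (MPVPE).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Per-point L2 distance -> shape (N, K), then average over everything.
    return float(np.linalg.norm(pred - gt, axis=-1).mean() * 1000.0)

def two_hand_report(left_pred, left_gt, right_pred, right_gt):
    """Per-hand error plus the simple left/right average (the AVG field)."""
    left = mean_point_error_mm(left_pred, left_gt)
    right = mean_point_error_mm(right_pred, right_gt)
    return {"left": left, "right": right, "avg": (left + right) / 2.0}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(4, 21, 3))
    # Shift every predicted point by 1 cm along x, so the error should be about 10 mm.
    pred = gt + np.array([0.01, 0.0, 0.0])
    print(two_hand_report(pred, gt, pred, gt))
```

The same helper serves both metrics, because MPJPE and MPVPE differ only in which point set (joints or vertices) is passed in.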


walsvid · 2023-11-14T07:04:47Z

Hi @xungeer29. Could you reproduce the metrics mentioned in the readme using the released checkpoint? My suggestion is to first ensure that the results of the official model inference can be reproduced before making any modifications. Additionally, according to the linear scaling rule, when the batch size changes, the learning rate also needs to be adjusted accordingly to achieve similar performance.
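The linear scaling rule multiplies the base learning rate by the ratio of the new batch size to the batch size the config was tuned for. It is often paired with a short linear warmup. Below is a minimal sketch. `base_lr`, `base_batch_size` and `warmup_iters` are hypothetical placeholders to be read from the actual config, because the repository's default values are not stated in this thread.

```python
def scaled_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the batch size."""
    return base_lr * new_batch_size / base_batch_size

def warmup_lr(target_lr: float, it: int, warmup_iters: int) -> float:
    """Ramp linearly up to target_lr over the first warmup_iters iterations."""
    if warmup_iters <= 0 or it >= warmup_iters:
        return target_lr
    return target_lr * (it + 1) / warmup_iters

if __name__ == "__main__":
    # Hypothetical values: a config tuned for batch size 32 at lr 1e-4, run at batch size 256.
    lr = scaled_lr(1e-4, 32, 256)  # 8x the base learning rate
    print(lr)
    print([warmup_lr(lr, i, 4) for i in range(5)])
```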

xungeer29 · 2023-11-14T09:16:07Z

I can reproduce the metrics mentioned in the readme using the released checkpoint, as shown below

joint mean error:
    left: 10.745436884462833 mm, right: 9.604906663298607 mm
    all: 10.17517177388072 mm
vert mean error:
    left: 10.490822605788708 mm, right: 9.404349140822887 mm
    all: 9.947585873305798 mm
pixel joint mean error:
    left: 6.331959247589111 mm, right: 5.808093070983887 mm
    all: 6.070026397705078 mm
pixel vert mean error:
    left: 6.235781669616699 mm, right: 5.725203037261963 mm
    all: 5.98049259185791 mm
root error: 28.982944786548615 mm
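With the released checkpoint, the errors are around 10 mm, and a separate root error (about 29 mm) is also printed. The training-time evaluation, by contrast, reports about 81 mm. Hand-mesh benchmarks commonly compute MPJPE/MPVPE after aligning prediction and ground truth at a root joint, and report the absolute root offset separately. Whether both evaluation paths in this repository follow that convention is an assumption worth checking: comparing a root-aligned metric with an unaligned one could produce a gap of this kind. Below is a minimal NumPy sketch. It assumes `(N, K, 3)` joint arrays in metres and a hypothetical root joint at index 0. All names are illustrative.

```python
import numpy as np

ROOT_IDX = 0  # hypothetical: index of the root joint (e.g. the wrist)

def root_error_mm(pred_joints, gt_joints, root_idx=ROOT_IDX):
    """Mean Euclidean distance between predicted and ground-truth root joints, in mm."""
    pred_joints, gt_joints = np.asarray(pred_joints), np.asarray(gt_joints)
    d = pred_joints[:, root_idx] - gt_joints[:, root_idx]  # shape (N, 3)
    return float(np.linalg.norm(d, axis=-1).mean() * 1000.0)

def root_relative_mpjpe_mm(pred_joints, gt_joints, root_idx=ROOT_IDX):
    """MPJPE after translating each skeleton so its root joint sits at the origin."""
    pred_joints, gt_joints = np.asarray(pred_joints), np.asarray(gt_joints)
    pred_rel = pred_joints - pred_joints[:, root_idx:root_idx + 1]
    gt_rel = gt_joints - gt_joints[:, root_idx:root_idx + 1]
    return float(np.linalg.norm(pred_rel - gt_rel, axis=-1).mean() * 1000.0)

def absolute_mpjpe_mm(pred_joints, gt_joints):
    """MPJPE with no alignment, so any global translation error is included."""
    pred_joints, gt_joints = np.asarray(pred_joints), np.asarray(gt_joints)
    return float(np.linalg.norm(pred_joints - gt_joints, axis=-1).mean() * 1000.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(scale=0.05, size=(8, 21, 3))
    # A correctly shaped hand placed 3 cm away from the ground-truth position.
    pred = gt + np.array([0.03, 0.0, 0.0])
    print("root error:", root_error_mm(pred, gt))
    print("root-relative MPJPE:", root_relative_mpjpe_mm(pred, gt))
    print("absolute MPJPE:", absolute_mpjpe_mm(pred, gt))
```

In this toy case, the root-relative error is essentially zero while the absolute error equals the 30 mm root offset. The same prediction can therefore look good or bad depending on which alignment the evaluation applies.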

The LR has only a small effect on the results; it cannot by itself make the network fail to converge completely.
I also tried scaling the LR linearly with the batch size, but it made no difference.

luckyday2022 · 2023-12-04T10:25:59Z

Have you solved this problem?

xungeer29 · 2023-12-04T11:03:07Z

No.

luckyday2022 · 2023-12-04T11:41:25Z

Is there something wrong with the test code?

xungeer29 · 2023-12-04T11:50:59Z

Testing with the released checkpoint gives correct results. Have you run into the same problem?

luckyday2022 · 2023-12-04T14:18:44Z

Have you trained the original model, i.e. with the default config and without changing batch_size to 256? What were the results?
