Experiments on ActivityNet-Captions #44

Open

minjoong507 opened this issue Sep 3, 2024 · 0 comments
Hi there,

Thank you for sharing your work.

I ran temporal grounding experiments with TimeChat on ActivityNet-Captions. Using the released checkpoint, I obtained the following results: R@1 IoU=0.3: 10.06 | R@1 IoU=0.5: 4.64 | R@1 IoU=0.7: 2.04 | mIoU: 6.87
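For clarity, this is roughly how I compute these numbers (a minimal sketch; the helper names are mine, but temporal IoU, R@1 at a threshold, and mIoU follow the standard definitions used in moment retrieval):

```python
from typing import List, Tuple

def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """IoU between two [start, end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def evaluate(preds: List[Tuple[float, float]],
             gts: List[Tuple[float, float]],
             thresholds=(0.3, 0.5, 0.7)):
    """R@1 at each IoU threshold (%) and mIoU (%) over paired predictions."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    recalls = {t: 100.0 * sum(iou >= t for iou in ious) / len(ious)
               for t in thresholds}
    miou = 100.0 * sum(ious) / len(ious)
    return recalls, miou
```

(Queries where no timestamp could be parsed from the model's output are scored as IoU = 0.)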

These numbers are lower than I expected; TimeChat appears to perform worse than earlier Video-LLMs that were never trained on temporal grounding, such as Video-ChatGPT and Video-LLaVA.

Furthermore, I attempted to fine-tune TimeChat on ActivityNet-Captions starting from the released checkpoint, but it still did not perform very well: R@1 IoU=0.3: 49.23 | R@1 IoU=0.5: 13.71 | R@1 IoU=0.7: 1.62 | mIoU: 30.55. I followed the configuration instructions found here and changed anno_dir to point at the ActivityNet-Captions annotations and videos.
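In case it helps pinpoint a mistake on my side, this is roughly how I flatten the ActivityNet-Captions annotations into per-query grounding samples before pointing anno_dir at them (a sketch: the input `{video_id: {"duration", "timestamps", "sentences"}}` layout is the dataset's published JSON format, but the output fields are just what my loader expects and may not match TimeChat's instruction format exactly):

```python
import json

def convert_anet_captions(src_path: str, dst_path: str) -> None:
    """Flatten ActivityNet-Captions JSON into one grounding sample per query."""
    with open(src_path) as f:
        raw = json.load(f)  # {video_id: {"duration", "timestamps", "sentences"}}
    samples = []
    for vid, ann in raw.items():
        for (start, end), sent in zip(ann["timestamps"], ann["sentences"]):
            samples.append({
                "video": f"{vid}.mp4",  # assumes video files are named by id
                "duration": ann["duration"],
                "query": sent.strip(),
                "timestamp": [round(start, 1), round(end, 1)],
            })
    with open(dst_path, "w") as f:
        json.dump(samples, f, indent=2)
```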

Have you experimented with temporal grounding using TimeChat on ActivityNet-Captions? Could you give me some guidance on how to improve these results?
