Experiments on ActivityNet-Captions #44

Open

minjoong507 opened this issue Sep 3, 2024 · 0 comments
Hi there,

Thank you for sharing your work.

I ran temporal grounding experiments with TimeChat on ActivityNet-Captions. Using the released checkpoint, I obtained the following results: R@1 IoU=0.3: 10.06 | R@1 IoU=0.5: 4.64 | R@1 IoU=0.7: 2.04 | mIoU: 6.87
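For clarity, this is roughly how I compute these numbers (a minimal sketch; the helper names are mine, but temporal IoU, R@1 at a threshold, and mIoU follow the standard definitions used in moment retrieval):

```python
from typing import List, Tuple

def temporal_iou(pred: Tuple[float, float], gt: Tuple[float, float]) -> float:
    """IoU between two [start, end] segments in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def evaluate(preds: List[Tuple[float, float]],
             gts: List[Tuple[float, float]],
             thresholds=(0.3, 0.5, 0.7)):
    """R@1 at each IoU threshold (%) and mIoU (%) over paired predictions."""
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    recalls = {t: 100.0 * sum(iou >= t for iou in ious) / len(ious)
               for t in thresholds}
    miou = 100.0 * sum(ious) / len(ious)
    return recalls, miou
```

(Queries where no timestamp could be parsed from the model's output are scored as IoU = 0.)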

These numbers are lower than I expected; TimeChat appears to perform worse than earlier Video-LLMs that were never trained on temporal grounding, such as Video-ChatGPT and Video-LLaVA.

Furthermore, I attempted to fine-tune TimeChat on ActivityNet-Captions starting from the released checkpoint, but it still did not perform very well: R@1 IoU=0.3: 49.23 | R@1 IoU=0.5: 13.71 | R@1 IoU=0.7: 1.62 | mIoU: 30.55. I followed the configuration instructions found here and changed anno_dir to point at the ActivityNet-Captions annotations and videos.
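In case it helps pinpoint a mistake on my side, this is roughly how I flatten the ActivityNet-Captions annotations into per-query grounding samples before pointing anno_dir at them (a sketch: the input `{video_id: {"duration", "timestamps", "sentences"}}` layout is the dataset's published JSON format, but the output fields are just what my loader expects and may not match TimeChat's instruction format exactly):

```python
import json

def convert_anet_captions(src_path: str, dst_path: str) -> None:
    """Flatten ActivityNet-Captions JSON into one grounding sample per query."""
    with open(src_path) as f:
        raw = json.load(f)  # {video_id: {"duration", "timestamps", "sentences"}}
    samples = []
    for vid, ann in raw.items():
        for (start, end), sent in zip(ann["timestamps"], ann["sentences"]):
            samples.append({
                "video": f"{vid}.mp4",  # assumes video files are named by id
                "duration": ann["duration"],
                "query": sent.strip(),
                "timestamp": [round(start, 1), round(end, 1)],
            })
    with open(dst_path, "w") as f:
        json.dump(samples, f, indent=2)
```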

Have you experimented with temporal grounding using TimeChat on ActivityNet-Captions? Could you give me some guidance on how to improve these results?
