You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hey,
I am a student researcher at the James Siliibard Brown Center for AI at SDSU. I am really interested in the paper that you wrote, "Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models.". I got a few questions regarding the evaluation metrics that you use.
First, you give a score between 1 and 5 using a GPT pipeline. Is it like you input the predicted response and the actual response in the gpt model and have a system prompt saying that give a score to the predicted response based on Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, and Consistency. Also, if you do this, how do you do this for the whole dataset? I don't see that you mentioned exactly how the evaluation works.
Also, there is a percentage score associated with the dataset I am referring to in Table 2. I don't understand how you calculate that accuracy for each dataset. Like, what's the exact structure or calculations you use?
Overall I like your paper and approach; I will appreciate it if you can answer the above questions.
Thank you,
Sanchit Singh
Student Researcher at James Sillibard Brown Center for AI at SDSU
The text was updated successfully, but these errors were encountered:
Hey,
I am a student researcher at the James Siliibard Brown Center for AI at SDSU. I am really interested in the paper that you wrote, "Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models.". I got a few questions regarding the evaluation metrics that you use.
First, you give a score between 1 and 5 using a GPT pipeline. Is it like you input the predicted response and the actual response in the gpt model and have a system prompt saying that give a score to the predicted response based on Correctness of Information, Detail Orientation, Contextual Understanding, Temporal Understanding, and Consistency. Also, if you do this, how do you do this for the whole dataset? I don't see that you mentioned exactly how the evaluation works.
Also, there is a percentage score associated with the dataset I am referring to in Table 2. I don't understand how you calculate that accuracy for each dataset. Like, what's the exact structure or calculations you use?
Overall I like your paper and approach; I will appreciate it if you can answer the above questions.
Thank you,
Sanchit Singh
Student Researcher at James Sillibard Brown Center for AI at SDSU
The text was updated successfully, but these errors were encountered: