Clarification Regarding CoM Samples and Grounding Tool in CogCoM #28

zhaochen0110 opened this issue Nov 14, 2024 · 1 comment

@zhaochen0110
Hi,

Interesting work, but I have a question regarding the grounding process used for the CoM samples.

In the paper, you mention using a grounding tool to tag the data. However, when I tried applying the GroundingDINO model during the CoM construction process, I had difficulty getting accurate detections, particularly for the small dots inside polyline (line) charts; the kind of zero-shot call I attempted is sketched below.
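
For reference, this is roughly the inference I tried (a minimal sketch using GroundingDINO's standard inference utilities; the paths, text prompt, and thresholds are placeholders I picked, not values from the paper):

```python
# Minimal zero-shot GroundingDINO sketch on a chart image.
# The config/checkpoint paths, image path, text prompt, and thresholds are
# placeholders chosen for illustration, not values from the CogCoM paper.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # config from the GroundingDINO repo
    "weights/groundingdino_swint_ogc.pth",              # pretrained SwinT checkpoint
)

image_source, image = load_image("charts/example_line_chart.png")

# GroundingDINO separates candidate phrases with " . " in the caption.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="data point . dot . marker",
    box_threshold=0.35,
    text_threshold=0.25,
)

# boxes come back as normalized cxcywh tensors; in my runs, the small
# markers on polylines were often missed or merged at these thresholds.
print(phrases)
print(boxes)
```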

My question is: were the bounding boxes and other results produced during the grounding step manually annotated, or were they obtained by fine-tuning the GroundingDINO model on the chart data?

Any clarification on this process would be greatly appreciated!

Thanks!

@qijimrc (Collaborator) commented Nov 27, 2024

Thank you very much for your interest in our work.

We manually annotated the CoM reasoning data sourced from artificial graphical images (e.g., ChartQA, MathVista), because we found it difficult to use tools like GroundingDINO to label boxes in these images. For each data sample, the manual annotation includes the reasoning process together with its visual evidence (boxes, lines, and OCR results), and it follows the same paradigm and data structure as the automated generation pipeline used for natural images, except that the lines are drawn by hand.
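
To make the shape of such a sample concrete, a hypothetical manually annotated chart sample might look like the sketch below (every field name and value is an illustrative guess, not the repository's actual schema):

```python
# Hypothetical sketch of one manually annotated CoM sample for a chart image.
# All field names and values here are illustrative; the real CogCoM data
# schema may differ.
sample = {
    "image": "chartqa/line_chart_0042.png",
    "question": "What was the value of the 'Sales' series in 2019?",
    "steps": [
        {   # locate the relevant dot on the polyline (hand-annotated box)
            "manipulation": "grounding",
            "evidence": {"box_xyxy": [412, 188, 436, 210]},
        },
        {   # auxiliary line drawn by hand down to the x-axis; this step has
            # no counterpart in the automated natural-image pipeline
            "manipulation": "draw_line",
            "evidence": {"line": [[424, 199], [424, 480]]},
        },
        {   # read the tick label under the drawn line
            "manipulation": "ocr",
            "evidence": {"text": "2019"},
        },
    ],
    "answer": "37.5",
}
```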

Using tools to obtain useful information from artificial images is challenging, and we are uncertain whether DINO-X can effectively address this issue. We would be happy to discuss any problems related to this topic in the future.
