[ASAP] Some questions on datasets and raw data. #2

Open
revoluzionario opened this issue Apr 21, 2024 · 1 comment

@revoluzionario

In the statistics you provided, I see that MKG-W and MKG-Y each have 15000 entities, but the raw datasets contain only 9203 and 14481 folders, respectively. Is this expected, or does it indicate a problem?
Second, I don't understand what the "image" column means. When I checked the raw datasets, I found 27589 images in MKG-W and 44386 images in MKG-Y.

Also, the number of texts does not match the count I got by searching for "dbpedia.net" with "Find and Replace", though the difference is negligible.

I hope to receive your answer soon!

@quqxui
Owner

quqxui commented Apr 29, 2024

Thank you for your interest in our work.

Firstly, the MKG-W and MKG-Y datasets each consist of 15,000 entities, which is the number of nodes in the knowledge graph. However, because some data was unavailable, a portion of the entities lack image and text data, resulting in unequal numbers of images and texts in the datasets.
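
To check this coverage yourself, a small script along the following lines may help. It is only a sketch under assumptions that are not confirmed in this thread: that the raw data has one sub-folder per entity, that images sit directly inside each folder, and that they use the listed file extensions. The function name `image_coverage` and the `IMAGE_SUFFIXES` set are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the actual raw data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def image_coverage(root, num_entities):
    """Count entity folders and image files under `root`.

    Assumes one sub-folder per entity (a hypothetical layout).
    """
    folders = [p for p in Path(root).iterdir() if p.is_dir()]
    total_images = 0
    folders_with_images = 0
    for folder in folders:
        n = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
        total_images += n
        if n > 0:
            folders_with_images += 1
    return {
        "entities": num_entities,
        "folders": len(folders),
        "images": total_images,
        # Entities in the graph that have no image at all.
        "entities_without_images": num_entities - folders_with_images,
    }
```

Calling `image_coverage("path/to/raw/images", 15000)` on a local copy would report how many of the 15,000 entities have no image folder; the path here is a placeholder.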

In our published datasets, we provide three images for each entity. However, it's important to note that our method does not learn from multiple images per entity. As mentioned in Section 4.3.1 of our paper, we randomly select one image as the representation for each entity.

Therefore, the raw datasets for MKG-W and MKG-Y contain more images than what we used in our paper.
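
For readers who want to reproduce the one-image-per-entity selection, here is a minimal sketch. It is not the code used for the paper; it assumes one sub-folder per entity with the image files directly inside, and the function name `pick_one_image_per_entity` and the `seed` parameter are hypothetical.

```python
import random
from pathlib import Path


def pick_one_image_per_entity(root, seed=0):
    """Return {entity_folder_name: one randomly chosen image path}.

    Assumes one sub-folder per entity (a hypothetical layout).
    Entities whose folder is empty are skipped.
    """
    rng = random.Random(seed)  # fixed seed makes the choice reproducible
    chosen = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Sort so the result does not depend on directory listing order.
        images = sorted(f for f in folder.iterdir() if f.is_file())
        if images:
            chosen[folder.name] = rng.choice(images)
    return chosen
```

With three images per entity, the resulting dictionary holds roughly one third as many images as the raw folder tree, which matches the explanation above of why the raw image counts exceed the number of images used.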

If you have any further questions or need clarification on any aspect, please let me know.
