Problems in reproducing subgraph retrieval #6

Open
yuancu opened this issue Apr 1, 2023 · 6 comments

@yuancu

yuancu commented Apr 1, 2023

Hi Zhen! I'm trying to reproduce your subgraph retrieval method for other datasets, but I encountered several problems.

  1. I am confused about how you obtained the facts, i.e. SPO.txt. You mentioned CLOCQ in another issue. If I understand correctly, did you use CLOCQ to retrieve all facts for each grounded entity?
  2. In the paper, an important step is injecting connectivity. In your implementation, however, the steps for generating the connectivity file are (AFAIK) not included. Did you also use CLOCQ's shortest path API to get the shortest path between each pair of question nodes?
  3. A follow-up to point 2: it assumes that at least two entities are grounded from the question. So if only one entity or none is linked in the question, the method doesn't work at all, am I correct?

I'd really appreciate it if you could share how you did it. Thank you in advance :)

@zhenjia2017
Owner

Hi, I will publish the code for the problems you mentioned ASAP.

@zhenjia2017
Owner

1) I use ELQ + TagMe to get the NERD entities for each question, and then I use CLOCQ (https://github.com/PhilippChr/CLOCQ) to retrieve relevant facts for each question.
2) I use CLOCQ to check whether the NERD entities are connected within one hop or two hops. If they are connected, I use the "connect" function of CLOCQ to retrieve all paths, and then I use cosine similarity to choose the best paths for a question. (I will provide the code.)
3) Yes: if there is only one NERD entity, it cannot be grouped into a pair, so there is no need to find the best path, because all facts share that one NERD entity and should therefore always be connected.
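
The pairing, connectivity check, and path selection described in 2) and 3) can be sketched roughly as follows. This is only an illustration under stated assumptions: it does not call CLOCQ, the toy facts and candidate paths are hand-written stand-ins for what the connectivity check and the "connect" function would return, the helper names (`entity_pairs`, `hops`, `best_path`, `cosine`) are hypothetical, and a plain bag-of-words cosine similarity stands in for whatever text representation the actual scripts use.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase alphanumeric word tokens of a string."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity between two strings (assumed scoring)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def entity_pairs(nerd_entities):
    """All unordered pairs of NERD entities; empty if fewer than two are linked."""
    return list(combinations(sorted(set(nerd_entities)), 2))


def hops(a, b, facts):
    """Return 1 if a and b share a fact, 2 if they share a neighbor, else None."""
    def neighbors(e):
        return {x for s, _, o in facts if e in (s, o) for x in (s, o)} - {e}
    if b in neighbors(a):
        return 1
    if neighbors(a) & neighbors(b):
        return 2
    return None


def best_path(question, candidate_paths):
    """Pick the path whose verbalization is most similar to the question."""
    return max(candidate_paths, key=lambda p: cosine(question, " ".join(p)))


# Toy (subject, predicate, object) facts; illustrative data, not CLOCQ output.
TOY_FACTS = [
    ("Barack Obama", "position held", "President of the United States"),
    ("President of the United States", "country", "United States"),
    ("Barack Obama", "place of birth", "Honolulu"),
    ("Honolulu", "country", "United States"),
]
# Hand-written candidate paths between the two entities (labels only).
PATHS = [
    ["Barack Obama", "position held", "President of the United States",
     "country", "United States"],
    ["Barack Obama", "place of birth", "Honolulu", "country", "United States"],
]
question = "Who was the president of the United States when Barack Obama was born?"

for a, b in entity_pairs(["Barack Obama", "United States"]):
    if hops(a, b, TOY_FACTS) is not None:
        print(a, "<->", b, ":", best_path(question, PATHS))
```

With a single NERD entity, `entity_pairs` returns an empty list, so the loop body never runs and no path selection takes place, which matches point 3).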

@yuancu
Author

yuancu commented Apr 4, 2023

So if I understand you correctly, you used the GET /api/search_space API to retrieve the question-related facts, instead of searching for neighbors of the grounded entities with GET /api/neighborhood? In that case the entity linking step and the fact retrieval step are independent of each other.

If so, isn't it possible that the retrieved facts have nothing to do with the linked entities? This goes against the common approach, where facts are retrieved by searching for connections to the linked entities.

An imaginary example:

  • question: How old is Barack Obama's daughter
  • retrieved facts: [Old Henry - daughter - Anna]
  • linked entity: Barack_Obama

Also, for this example, the statement that there is no need to find the best path, because all facts share one NERD entity and are always connected, does not hold: the facts are not connected to the NERD entity.

@zhenjia2017
Owner

EXAQT does not use the search_space API to retrieve facts. It uses the neighborhood API to retrieve facts for the NERD entities.
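
To illustrate why facts retrieved from an entity's neighborhood always touch that entity, here is a minimal sketch over a toy in-memory triple list. It does not call CLOCQ: `TOY_KB` and `neighborhood` are hypothetical stand-ins for the knowledge base and the neighborhood API, and the triples are illustrative data only.

```python
# Toy knowledge base of (subject, predicate, object) triples; illustrative only.
TOY_KB = [
    ("Barack Obama", "child", "Malia Obama"),
    ("Barack Obama", "child", "Sasha Obama"),
    ("Malia Obama", "date of birth", "1998-07-04"),
    ("Old Henry", "daughter", "Anna"),
]


def neighborhood(entity, kb):
    """Return every fact in which the entity is the subject or object (one hop)."""
    return [fact for fact in kb if entity in (fact[0], fact[2])]


# Facts for the linked entity from the example question in this thread.
facts = neighborhood("Barack Obama", TOY_KB)
for fact in facts:
    print(fact)
```

Because retrieval starts from the linked entity, an unrelated fact such as (Old Henry, daughter, Anna) is never returned, and every retrieved fact shares the seed entity.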

@yuancu
Author

yuancu commented Apr 8, 2023

Thank you, this solves my problem :)

@zhenjia2017
Owner

zhenjia2017 commented Apr 8, 2023

I added three scripts to the answer_graph folder: (1) "seed_path_extractor.py" for retrieving the best path between seed entities, (2) "get_CLOCQ_Wikidata_SPOs.py" for retrieving facts from CLOCQ and converting them to SPO format, and (3) "get_fact_for_question" for generating spo.txt for each question in the benchmark.
