Problems in reproducing subgraph retrieval #6

yuancu · 2023-04-01T15:21:09Z

Hi Zhen! I'm trying to reproduce your subgraph retrieval method for other datasets, but I encountered several problems.

1) I am confused about how you got the facts, i.e. SPO.txt. You mentioned CLOCQ in another issue. If I understand you correctly, did you use CLOCQ to retrieve all facts of each grounded entity?
2) In the paper, an important step is injecting connectivity. However, AFAIK, your implementation does not include the steps for generating the connectivity file. Did you also use CLOCQ's shortest path API to get the shortest path between each pair of question nodes?
3) A further question regarding point 2: it only works if you ground at least two entities from the question. So if you link only one entity (or none) in the question, the method does not work at all, am I correct?

I'd really appreciate it if you could share how you did it. Thank you in advance :)


zhenjia2017 · 2023-04-04T03:33:00Z

Hi, I will publish the code for the problems you mentioned as soon as possible.

zhenjia2017 · 2023-04-04T08:32:19Z

1) I use ELQ + TagMe to get the NERD entities for each question, and then I use CLOCQ (https://github.com/PhilippChr/CLOCQ) to retrieve relevant facts for each question.
2) I use CLOCQ to check whether the NERD entities are connected within one or two hops. If they are connected, I use CLOCQ's "connect" function to retrieve all paths between them, and then I use cosine similarity to choose the best paths for the question. (I will provide the code.)
3) If there is only one NERD entity, it cannot be grouped into a pair, so yes: there is no need to find the best path, because all facts share that one NERD entity and are therefore always connected.
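A minimal sketch of steps 2) and 3) above, under loud assumptions: the knowledge graph is a hypothetical in-memory list of (subject, predicate, object) label triples standing in for CLOCQ, the `one_hop`/`two_hop` helpers only approximate what CLOCQ's connectivity check and "connect" function return, and questions and paths are embedded as plain bag-of-words vectors (the embedding actually used for the cosine similarity is not stated in this thread).

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy KB of (subject, predicate, object) label triples; it only
# stands in for CLOCQ and is not CLOCQ's real data model.
KB = [
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Barack Obama", "child", "Malia Obama"),
    ("Michelle Obama", "child", "Malia Obama"),
    ("Barack Obama", "educated at", "Harvard Law School"),
    ("Michelle Obama", "educated at", "Harvard Law School"),
]


def one_hop(kb, a, b):
    """Paths made of a single fact linking a and b (in either direction)."""
    return [[t] for t in kb if {t[0], t[2]} == {a, b}]


def two_hop(kb, a, b):
    """Paths a - x - b through exactly one intermediate node x."""
    paths = []
    for t1 in kb:
        if a not in (t1[0], t1[2]):
            continue
        mid = t1[2] if t1[0] == a else t1[0]
        if mid in (a, b):
            continue  # that would be a one-hop path or a self-loop
        paths.extend([t1, t2] for t2 in kb if {t2[0], t2[2]} == {mid, b})
    return paths


def bow(text):
    """Bag-of-words vector: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def best_paths(kb, question, entities):
    """For every pair of NERD entities connected within two hops, keep the
    candidate path whose text is most cosine-similar to the question."""
    q = bow(question)
    best = {}
    for a, b in combinations(entities, 2):
        candidates = one_hop(kb, a, b) + two_hop(kb, a, b)
        if not candidates:
            continue  # not connected within two hops: nothing to inject
        best[(a, b)] = max(
            candidates,
            key=lambda path: cosine(q, bow(" ".join(" ".join(t) for t in path))),
        )
    return best


question = "Which law school educated Barack Obama and Michelle Obama?"
print(best_paths(KB, question, ["Barack Obama", "Michelle Obama"]))
print(best_paths(KB, question, ["Barack Obama"]))  # {}: a single entity forms no pair
```

With this toy data the pair (Barack Obama, Michelle Obama) has one one-hop path (spouse) and two two-hop paths (via Malia Obama and via Harvard Law School); for the question above, the Harvard Law School path scores highest. A single entity yields no pairs, which matches point 3).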

yuancu · 2023-04-04T08:49:08Z

> 1) I use ELQ + TagMe to get the nerd entities for each question, and then I use CLOCQ (https://github.com/PhilippChr/CLOCQ) to retrieve relevant facts for each question.

So, if I understand you correctly, you used the GET /api/search_space API described here to retrieve the question-related facts, instead of searching for the neighbors of the grounded entities with GET /api/neighborhood? In that case, the entity linking step and the fact retrieval step are independent of each other.

If so, isn't it possible that the retrieved facts have nothing to do with the linked entities? That would go against the common approach, in which facts are retrieved by searching for connections to the linked entities.

An imaginary example:

question: How old is Barack Obama's daughter
retrieved facts: [Old Henry - daughter - Anna]
linked entity: Barack_Obama

Also, in this example, the statement "there is no need to find the best path because all facts share one nerd entity and they should be always connected" does not hold: the retrieved facts are not connected to the NERD entity.
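The concern can be made concrete with a small check (a sketch; `split_by_anchor` is a hypothetical helper, and facts are simplified to (subject, predicate, object) triples): a retrieved fact is anchored to the question's linked entities only if one of them appears as its subject or object.

```python
def split_by_anchor(facts, linked_entities):
    """Split (subject, predicate, object) facts into those that mention a
    linked entity as subject or object and those that do not."""
    linked = set(linked_entities)
    anchored = [f for f in facts if f[0] in linked or f[2] in linked]
    unanchored = [f for f in facts if f[0] not in linked and f[2] not in linked]
    return anchored, unanchored


# The imaginary example above: the only retrieved fact never mentions Barack_Obama.
anchored, unanchored = split_by_anchor([("Old Henry", "daughter", "Anna")], ["Barack_Obama"])
print(anchored)    # []
print(unanchored)  # [('Old Henry', 'daughter', 'Anna')]
```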

zhenjia2017 · 2023-04-08T06:33:55Z

EXAQT does not use the search_space API to retrieve facts. It uses the neighborhood API to retrieve the facts of the NERD entities.
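A minimal sketch of neighborhood-based retrieval as described here, so that every retrieved fact mentions at least one linked entity. `toy_neighborhood` is a hypothetical stand-in for CLOCQ's neighborhood API (not its real client call), and facts are simplified to (subject, predicate, object) triples without qualifiers.

```python
def toy_neighborhood(kb, entity):
    """Hypothetical stand-in for a neighborhood lookup: every fact in which the
    entity appears as subject or object."""
    return [t for t in kb if entity in (t[0], t[2])]


def retrieve_facts(kb, nerd_entities, neighborhood=toy_neighborhood):
    """Union of the neighborhoods of all NERD entities, deduplicated while
    keeping first-seen order."""
    seen = set()
    facts = []
    for entity in nerd_entities:
        for fact in neighborhood(kb, entity):
            if fact not in seen:
                seen.add(fact)
                facts.append(fact)
    return facts


kb = [
    ("Barack Obama", "child", "Malia Obama"),
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Malia Obama", "date of birth", "1998-07-04"),
    ("Old Henry", "daughter", "Anna"),
]
facts = retrieve_facts(kb, ["Barack Obama", "Michelle Obama"])
print(facts)
# [('Barack Obama', 'child', 'Malia Obama'), ('Barack Obama', 'spouse', 'Michelle Obama')]
```

The spouse fact appears in both neighborhoods but is kept once, and facts that touch neither entity (such as the Old Henry fact) are never retrieved.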

yuancu · 2023-04-08T13:04:47Z

Thank you, this solves my problem :)

zhenjia2017 · 2023-04-08T22:40:02Z

I added three scripts to the answer_graph folder: (1) "seed_path_extractor.py" for retrieving the best path between seed entities; (2) "get_CLOCQ_Wikidata_SPOs.py" for retrieving facts from CLOCQ and converting them to SPO format; (3) "get_fact_for_question" for generating spo.txt for each question in the benchmark.
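For readers reimplementing (2) elsewhere, here is a sketch of turning retrieved facts into SPO-style lines. The input layout (a flat list `[subject, predicate, object, qualifier_predicate, qualifier_object, ...]`), the `||` separator, and the statement-id scheme are all assumptions for illustration; the actual spo.txt format is whatever the scripts in the answer_graph folder produce.

```python
def fact_to_spo_lines(fact, statement_id, sep="||"):
    """Convert one flat fact into SPO-style lines (hypothetical layout).

    The main triple becomes `statement_id||subject||predicate||object`; each
    qualifier becomes `statement_id||qualifier_predicate||qualifier_object`, so
    it can be joined back to its main triple through the shared statement id.
    """
    if len(fact) < 3 or (len(fact) - 3) % 2:
        raise ValueError(f"expected subject, predicate, object plus qualifier pairs: {fact!r}")
    subject, predicate, obj = fact[:3]
    lines = [sep.join([statement_id, subject, predicate, obj])]
    qualifiers = fact[3:]
    for q_pred, q_obj in zip(qualifiers[0::2], qualifiers[1::2]):
        lines.append(sep.join([statement_id, q_pred, q_obj]))
    return lines


def spo_lines_for_question(question_id, facts):
    """All SPO lines for one question; statement ids are `<question_id>-<index>`."""
    lines = []
    for i, fact in enumerate(facts):
        lines.extend(fact_to_spo_lines(fact, f"{question_id}-{i}"))
    return lines


facts = [
    ["Barack Obama", "spouse", "Michelle Obama", "start time", "1992"],
    ["Barack Obama", "child", "Malia Obama"],
]
print("\n".join(spo_lines_for_question("q1", facts)))
# q1-0||Barack Obama||spouse||Michelle Obama
# q1-0||start time||1992
# q1-1||Barack Obama||child||Malia Obama
```

Writing these joined lines to one file per question would roughly correspond to what (3) describes, though the real file layout may differ.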
