Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DS-9] wikidata online end point #409

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

SevenDaysDA
Copy link
Collaborator

@SevenDaysDA SevenDaysDA commented Jan 26, 2023

What does this PR do?

This PR is adding w WikiData-KG-Endpoint to the already existing KG-API. The main difference to a conventional KG, like "Conceptnet", is that we are using a middleware to connect to the Wikidata online Query service, which uses SPARQL code to retrieve the necessary information from Wikidata. Loading an offline version of WD into our Elasticsearch-cluster is not feasible.

This PR contains the main request functionalities, which are necessary for the working with KGs, except the Get-Subgraph request. This is part of a different PR.

Changes:

*datastore-api/app/core/wikidata.py - new class for connecting to WikiData Query-service. The top class is handling the queries and has the necessary methods for searching edges/nodes. At the bottom you will find the routing for the middleware

*datastore-api/tests/conftest.py - Modifications on conftest.py were necessary to test the endpoint. Presaved answers and connections to the elasticsearch-cluster are handled here.

*datastore-api/tests/test_kgs.py - All new requests have accordingly tests. All requests are added to the KG-tests.

Before submitting / marking as 'ready to review'

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@SevenDaysDA SevenDaysDA requested a review from kwang2049 January 26, 2023 16:47
@SevenDaysDA SevenDaysDA temporarily deployed to test January 27, 2023 06:10 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 29, 2023 14:13 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 30, 2023 13:48 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 30, 2023 18:02 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 30, 2023 18:19 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 30, 2023 18:35 — with GitHub Actions Inactive
@SevenDaysDA SevenDaysDA temporarily deployed to test January 31, 2023 07:07 — with GitHub Actions Inactive
Copy link
Collaborator

@kwang2049 kwang2049 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes:

  1. There are a lot of unrelated changes. Please remove them from this PR and create a new PR for them.
  2. Subgraph methods should be also supported for WikiData, which have not been added yet in this PR.
  3. The API callings should be placed inside each router method, i.e. move these
    try:
    if path == f'/datastores/kg/{WikiData.kg_name}/nodes/query_by_name':
    response = await wikidata.get_entity_by_names(names = await request.json())
    return JSONResponse(status_code=200, content=response)
    elif path == f'/datastores/kg/{WikiData.kg_name}/nodes/query_by_ids':
    return JSONResponse(status_code=200, content=await wikidata.get_entity_by_ids(ids = await request.json()))
    elif path == f'/datastores/kg/{WikiData.kg_name}/edges/query_by_ids':
    response = await wikidata.get_edges_for_id_pairs(entity_pair_list = await request.json())
    return JSONResponse(status_code=200, content=response)
    # This should only be used to search certain entities, saved as :
    # /datastores/kg/{WikiData.kg_name}/{entity_id}
    elif path.startswith(f'/datastores/kg/{WikiData.kg_name}/'):
    response = await wikidata.get_entity_by_ids([path.split("/")[-1]])
    return JSONResponse(status_code=200, content=response)
    elif path == f'/datastores/kg/{WikiData.kg_name}' :
    response = "Wikidata API is alive"
    return JSONResponse(status_code=200, content=response)

    into the corresponding routers/kgs.py methods. One can refer to the Bing search PR for this fea: add bing search datastore and tests #369
  4. This branch is out of date. Please pull the master branch

@@ -427,14 +427,14 @@ async def get_node_by_name_msearch(self, kg_name, names):
index = f'{kg_name}{self.datastore_suffix}'
body = []
for name in names:
body.append({'index': index})
body.append({'index': index})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to this file connector.py are unrelated to this PR indent. Please remove them and create a new PR if needed

Comment on lines -40 to +44
async def new_binding(self, request: Request, item_value: str, item_type: str):
user_id = await self._decode_user_id(request)
async def new_binding(self, request: Request, item_value: str, item_type: str, default_user_id=None):
if default_user_id is not None:
user_id = default_user_id
else:
user_id = await self._decode_user_id(request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? It seems that default_user_id is never used

raise HTTPException(status_code=500, detail=str(e))

# This should not be reached
return await call_next(request)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No newline at the end of file

Suggested change
return await call_next(request)
return await call_next(request)

Comment on lines 48 to 49
return BingSearch() No newline at end of file
return BingSearch()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change not needed

"weight": None,
"in_id": None,
"out_id": None,
"id": self._WikiData__without_link(res['item']['value'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is _WikiData__without_link defined?

Comment on lines -17 to 20
assert len(response.json()) == 1
assert len(response.json()) >= 1
assert sorted(response.json()[0]['fields'], key=lambda x: x['name']) == sorted(conceptnet_kg.dict()['fields'], key=lambda x: x['name'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is different from the master branch

def test_get_all_kgs(self, client: TestClient, conceptnet_kg: Datastore):
# Given:
url = "/datastores/kg"
expected_code = 200
# When:
response = client.get(url)
# Then:
assert response.status_code == expected_code
kgs_preset = list(filter(lambda kg: kg["name"] == conceptnet_kg.name, response.json()))
assert len(kgs_preset) == 1
kg_preset = kgs_preset[0]
conceptnet_kg_json = json.loads(conceptnet_kg.json())
for field in kg_preset["fields"]:
assert field in conceptnet_kg_json["fields"]
for field in conceptnet_kg_json["fields"]:
assert field in kg_preset["fields"]
Please pull it first

Comment on lines +46 to +89
def test_get_node_by_name(self, client: TestClient, conceptnet_kg: Datastore, test_node_lower: Document, token: str):
# Given:
url_post = f"/datastores/kg/{conceptnet_kg.name}/nodes/query_by_name"
expected_code_post = 200

nodes_upper = ["Barack_Obama"]
nodes_lower = ["barack_obama"]

# When:
response_post_upper = client.post(
url_post,
json=nodes_upper,
headers={"Authorization": f"Bearer {token}"}
)

response_post_lower = client.post(
url_post,
json=nodes_lower,
headers={"Authorization": f"Bearer {token}"}
)

# Then:
assert response_post_lower.status_code == expected_code_post
assert response_post_upper.status_code == expected_code_post
assert response_post_lower.json()[test_node_lower.id] == test_node_lower
assert response_post_upper.json()[test_node_lower.id] == test_node_lower

def test_get_nodes_by_id(self, client: TestClient, conceptnet_kg: Datastore, test_node_lower: Document, token: str):
# Given:
url_post = f"/datastores/kg/{conceptnet_kg.name}/nodes/query_by_ids"
expected_code_post = 200

nodes_upper = ["n1010", "n1234"]

# When:
response_post = client.post(
url_post,
json=nodes_upper,
headers={"Authorization": f"Bearer {token}"}
)

# Then:
assert response_post.status_code == expected_code_post
assert response_post.json()[test_node_lower.id] == test_node_lower
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR

def test_wikidata_get_node_by_id(self, client: TestClient, wikidata_kg_name: str):
# Given:
node_id = "Q42"
url = f"/datastores/kg/{wikidata_kg_name}/{node_id}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert any("Turkey" == name["name"] for name in response.json().values())

# Check wheter description is not null
assert all(len(name["description"]) > 2 for name in response.json().values())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert all(len(name["description"]) > 2 for name in response.json().values())
assert all(len(name["description"]) for name in response.json().values())

Comment on lines 309 to 325
assert response_get.json() == nodes No newline at end of file
assert response_get.json() == nodes

def test_get_subgraph(self, client: TestClient, kg_name: str, test_node_lower: Document, token: str):
# Given:
url_post = f"/datastores/kg/{kg_name}/subgraph/query_by_node_name"

# When:
response_post = client.post(
url_post,
json= {"nids": [test_node_lower["name"]], "hops": 2 },
headers={"Authorization": f"Bearer {token}"}
)

# Then:
assert response_post.status_code == 200
assert len(response_post.json()) > 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR. Please creat another PR for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants