-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DS-9] wikidata online end point #409
base: master
Are you sure you want to change the base?
Conversation
…to DS_9-Wikidata_OnlineEndPoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes:
- There are a lot of unrelated changes. Please remove them from this PR and create a new PR for them.
- Subgraph methods should be also supported for WikiData, which have not been added yet in this PR.
- The API callings should be placed inside each router method, i.e. move these
square-core/datastore-api/app/core/wikidata.py
Lines 185 to 205 in 1bbe9a5
try: if path == f'/datastores/kg/{WikiData.kg_name}/nodes/query_by_name': response = await wikidata.get_entity_by_names(names = await request.json()) return JSONResponse(status_code=200, content=response) elif path == f'/datastores/kg/{WikiData.kg_name}/nodes/query_by_ids': return JSONResponse(status_code=200, content=await wikidata.get_entity_by_ids(ids = await request.json())) elif path == f'/datastores/kg/{WikiData.kg_name}/edges/query_by_ids': response = await wikidata.get_edges_for_id_pairs(entity_pair_list = await request.json()) return JSONResponse(status_code=200, content=response) # This should only be used to search certain entities, saved as : # /datastores/kg/{WikiData.kg_name}/{entity_id} elif path.startswith(f'/datastores/kg/{WikiData.kg_name}/'): response = await wikidata.get_entity_by_ids([path.split("/")[-1]]) return JSONResponse(status_code=200, content=response) elif path == f'/datastores/kg/{WikiData.kg_name}' : response = "Wikidata API is alive" return JSONResponse(status_code=200, content=response)
into the correspondingrouters/kgs.py
methods. One can refer to the Bing search PR for this fea: add bing search datastore and tests #369 - This branch is out of date. Please pull the master branch
@@ -427,14 +427,14 @@ async def get_node_by_name_msearch(self, kg_name, names): | |||
index = f'{kg_name}{self.datastore_suffix}' | |||
body = [] | |||
for name in names: | |||
body.append({'index': index}) | |||
body.append({'index': index}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to this file connector.py
are unrelated to this PR indent. Please remove them and create a new PR if needed
async def new_binding(self, request: Request, item_value: str, item_type: str): | ||
user_id = await self._decode_user_id(request) | ||
async def new_binding(self, request: Request, item_value: str, item_type: str, default_user_id=None): | ||
if default_user_id is not None: | ||
user_id = default_user_id | ||
else: | ||
user_id = await self._decode_user_id(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? It seems that default_user_id
is never used
raise HTTPException(status_code=500, detail=str(e)) | ||
|
||
# This should not be reached | ||
return await call_next(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No newline at the end of file
return await call_next(request) | |
return await call_next(request) | |
return BingSearch() No newline at end of file | ||
return BingSearch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change not needed
"weight": None, | ||
"in_id": None, | ||
"out_id": None, | ||
"id": self._WikiData__without_link(res['item']['value']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is _WikiData__without_link
defined?
assert len(response.json()) == 1 | ||
assert len(response.json()) >= 1 | ||
assert sorted(response.json()[0]['fields'], key=lambda x: x['name']) == sorted(conceptnet_kg.dict()['fields'], key=lambda x: x['name']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is different from the master branch
square-core/datastore-api/tests/test_kgs.py
Lines 24 to 41 in d659549
def test_get_all_kgs(self, client: TestClient, conceptnet_kg: Datastore): | |
# Given: | |
url = "/datastores/kg" | |
expected_code = 200 | |
# When: | |
response = client.get(url) | |
# Then: | |
assert response.status_code == expected_code | |
kgs_preset = list(filter(lambda kg: kg["name"] == conceptnet_kg.name, response.json())) | |
assert len(kgs_preset) == 1 | |
kg_preset = kgs_preset[0] | |
conceptnet_kg_json = json.loads(conceptnet_kg.json()) | |
for field in kg_preset["fields"]: | |
assert field in conceptnet_kg_json["fields"] | |
for field in conceptnet_kg_json["fields"]: | |
assert field in kg_preset["fields"] |
def test_get_node_by_name(self, client: TestClient, conceptnet_kg: Datastore, test_node_lower: Document, token: str): | ||
# Given: | ||
url_post = f"/datastores/kg/{conceptnet_kg.name}/nodes/query_by_name" | ||
expected_code_post = 200 | ||
|
||
nodes_upper = ["Barack_Obama"] | ||
nodes_lower = ["barack_obama"] | ||
|
||
# When: | ||
response_post_upper = client.post( | ||
url_post, | ||
json=nodes_upper, | ||
headers={"Authorization": f"Bearer {token}"} | ||
) | ||
|
||
response_post_lower = client.post( | ||
url_post, | ||
json=nodes_lower, | ||
headers={"Authorization": f"Bearer {token}"} | ||
) | ||
|
||
# Then: | ||
assert response_post_lower.status_code == expected_code_post | ||
assert response_post_upper.status_code == expected_code_post | ||
assert response_post_lower.json()[test_node_lower.id] == test_node_lower | ||
assert response_post_upper.json()[test_node_lower.id] == test_node_lower | ||
|
||
def test_get_nodes_by_id(self, client: TestClient, conceptnet_kg: Datastore, test_node_lower: Document, token: str): | ||
# Given: | ||
url_post = f"/datastores/kg/{conceptnet_kg.name}/nodes/query_by_ids" | ||
expected_code_post = 200 | ||
|
||
nodes_upper = ["n1010", "n1234"] | ||
|
||
# When: | ||
response_post = client.post( | ||
url_post, | ||
json=nodes_upper, | ||
headers={"Authorization": f"Bearer {token}"} | ||
) | ||
|
||
# Then: | ||
assert response_post.status_code == expected_code_post | ||
assert response_post.json()[test_node_lower.id] == test_node_lower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR
def test_wikidata_get_node_by_id(self, client: TestClient, wikidata_kg_name: str): | ||
# Given: | ||
node_id = "Q42" | ||
url = f"/datastores/kg/{wikidata_kg_name}/{node_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have such API in routers/kgs.py https://github.com/UKP-SQuARE/square-core/blob/master/datastore-api/app/routers/kgs.py?
assert any("Turkey" == name["name"] for name in response.json().values()) | ||
|
||
# Check wheter description is not null | ||
assert all(len(name["description"]) > 2 for name in response.json().values()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert all(len(name["description"]) > 2 for name in response.json().values()) | |
assert all(len(name["description"]) for name in response.json().values()) |
datastore-api/tests/test_kgs.py
Outdated
assert response_get.json() == nodes No newline at end of file | ||
assert response_get.json() == nodes | ||
|
||
def test_get_subgraph(self, client: TestClient, kg_name: str, test_node_lower: Document, token: str): | ||
# Given: | ||
url_post = f"/datastores/kg/{kg_name}/subgraph/query_by_node_name" | ||
|
||
# When: | ||
response_post = client.post( | ||
url_post, | ||
json= {"nids": [test_node_lower["name"]], "hops": 2 }, | ||
headers={"Authorization": f"Bearer {token}"} | ||
) | ||
|
||
# Then: | ||
assert response_post.status_code == 200 | ||
assert len(response_post.json()) > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR. Please creat another PR for this
What does this PR do?
This PR is adding w WikiData-KG-Endpoint to the already existing KG-API. The main difference to a conventional KG, like "Conceptnet", is that we are using a middleware to connect to the Wikidata online Query service, which uses SPARQL code to retrieve the necessary information from Wikidata. Loading an offline version of WD into our Elasticsearch-cluster is not feasible.
This PR contains the main request functionalities, which are necessary for the working with KGs, except the Get-Subgraph request. This is part of a different PR.
Changes:
*
datastore-api/app/core/wikidata.py
- new class for connecting to WikiData Query-service. The top class is handling the queries and has the necessary methods for searching edges/nodes. At the bottom you will find the routing for the middleware*
datastore-api/tests/conftest.py
- Modifications on conftest.py were necessary to test the endpoint. Presaved answers and connections to the elasticsearch-cluster are handled here.*
datastore-api/tests/test_kgs.py
- All new requests have accordingly tests. All requests are added to the KG-tests.Before submitting / marking as 'ready to review'