SPARQL #11

GOLASOOO · 2024-02-04T10:19:40Z

Discuss SPARQL implementations.

GOLASOOO · 2024-02-04T11:15:32Z

[Moved here from issue #6, as it is a rather big topic]
This is a bit long for an answer, but I have tried to keep it as simple and clear as possible.
I have been researching the different approaches we could take to query Wikidata.

Spring Boot:

This is some info about most of the libraries:

There are plenty of them, and all of them seem to use REST to query the data from Wikidata's endpoint. Also, as far as I know, all of them use JDBC for querying. However, given how many there are, I have not checked every one.

✅ Pros (for most of the libraries):

Many result formats available (JSON, CSV, XML, ...)
Well documented
Allow parameterized queries
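
To make the "parameterized queries" and "formats" points concrete, here is a minimal sketch using only the Python standard library. The endpoint URL, the `format` query parameter, the query itself and the helper name `build_query_url` are my assumptions for illustration; they are not taken from any of the libraries above.

```python
from string import Template
from urllib.parse import urlencode

# Assumed public Wikidata SPARQL endpoint (not stated in this thread).
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical parameterized query: $item is filled in by the caller.
LABEL_QUERY = Template(
    'SELECT ?label WHERE { wd:$item rdfs:label ?label . '
    'FILTER(LANG(?label) = "en") }'
)


def build_query_url(item_id: str, result_format: str = "json") -> str:
    """Return a GET URL for the query, asking for the given result format."""
    # Accept only plain item identifiers such as "Q29", so that user input
    # cannot inject arbitrary SPARQL into the query text.
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item id: {item_id!r}")
    query = LABEL_QUERY.substitute(item=item_id)
    # Assumption: the endpoint honours a "format" parameter; an Accept
    # header is the other common way to choose the result format.
    params = urlencode({"query": query, "format": result_format})
    return WIKIDATA_ENDPOINT + "?" + params
```

Validating the parameter before substituting it is the part that matters here; the libraries presumably do something equivalent for us.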

❌ Cons:

Usage limits on the endpoint: each IP gets 60 seconds of processing time per minute and at most 30 failed queries per minute.
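
If we want to stay under that limit ourselves, something like the following could track how much processing time we have used. This is only a sketch: `ProcessingBudget` is a hypothetical helper, the 60-second figures are the defaults I assume from the limit above, and query durations would have to be measured by us.

```python
from collections import deque


class ProcessingBudget:
    """Tracks query time spent inside a sliding window (hypothetical helper)."""

    def __init__(self, budget_seconds: float = 60.0, window_seconds: float = 60.0):
        self.budget_seconds = budget_seconds
        self.window_seconds = window_seconds
        self._spent = deque()  # (finished_at, duration) pairs, oldest first

    def _expire(self, now: float) -> None:
        # Forget queries that finished a full window (or more) ago.
        while self._spent and now - self._spent[0][0] >= self.window_seconds:
            self._spent.popleft()

    def record(self, finished_at: float, duration: float) -> None:
        """Register a finished query and how long it took."""
        self._spent.append((finished_at, duration))

    def remaining(self, now: float) -> float:
        """Seconds of processing time still available at time `now`."""
        self._expire(now)
        return self.budget_seconds - sum(d for _, d in self._spent)

    def can_run(self, now: float, estimated_seconds: float) -> bool:
        """True if a query of the estimated cost fits in the remaining budget."""
        return self.remaining(now) >= estimated_seconds
```

Timestamps are passed in explicitly (for example from `time.monotonic()`) so the class stays easy to test.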

If using Jena, which seems to be the most important library out there regarding SPARQL:

First, it also uses REST, so the previous limits are likely to apply here too. It provides a library to connect to the Jena SPARQL API, which has lots of different capabilities.

✅ Pros:

It has an RDF mapper (RDF being the format that SPARQL queries) which is based on JPA, so the two look alike.
+ It provides an endpoint called sparqlify, which is a rewriter from SPARQL to SQL. Its main features are:
- Supports JPA
- It is available on PostgreSQL
- It can be set so that it automatically simplifies queries for better performance
Adds another possible syntax for querying, which seems optional and well documented. However, I have not researched it enough (I will if you think it is an interesting option). It is based on algebra.
Allows query transformations
Provides code for implementing our own SPARQL endpoint

Node.js

Regarding Node.js, there are a few libraries available, but they have not been updated in some years. It is much simpler, as it is basically an HTTP request in JS.

UO283615 · 2024-02-05T15:23:48Z

Regarding SPARQL, I think it is important to mention this question asked in the Arquisoft FAQ, which includes a webpage that allows us to test our queries.

jjgancfer · 2024-02-07T09:35:31Z

Regarding whether we should query from the frontend or the backend: if we consider performance, I think we should do it from the frontend, mainly because performance would improve that way. Once a question has been answered, the result can be sent to the server if we need it.

GOLASOOO · 2024-02-07T11:31:12Z

Sorry for such a long answer, but for now we have two possible approaches regarding queries.
The first one would be using the user agent as a client of the SPARQL Query Service, as you suggested.

Pros ✅:

Less bandwidth used on our server.
Simpler and faster to implement.
Probably faster than, or at least as fast as, the next approach on high-spec devices.

Cons ❌:

More computational power and bandwidth required from users' devices; we should discuss what this means for users with older devices or weaker connections.
It is easier to exhaust or abuse the processing time granted by the SPARQL Query Service API, especially if the user answers many questions in a short period of time or if some queries are expensive.
We would need some way of sorting questions by the predicted time the API will take to compute them, so that cheaper ones are shown first. (This may lead to a game mode where questions increase in difficulty until the user misses one, as these queries will probably be much more detailed, but this is out of scope for now.)
Connection through Node.js libraries which have not been updated recently.
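
The "cheaper questions first" idea from the list above could be as simple as the sketch below. The names `QuestionTemplate` and `cheapest_first` and the cost figures are hypothetical; how we would actually predict a query's cost is still an open question.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QuestionTemplate:
    name: str
    estimated_seconds: float  # guessed cost of the SPARQL query, not measured


def cheapest_first(templates: Iterable[QuestionTemplate]) -> List[QuestionTemplate]:
    """Order question templates so that the cheapest queries are asked first."""
    return sorted(templates, key=lambda t: t.estimated_seconds)


# Made-up example costs, for illustration only.
templates = [
    QuestionTemplate("population of a city", 4.0),
    QuestionTemplate("capital of a country", 0.5),
    QuestionTemplate("author of a book", 2.0),
]
ordered_names = [t.name for t in cheapest_first(templates)]
```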

On the other hand, we could have a service that automatically stores questions and their answers in a DB, which the backend then serves to the user (this is more secure than querying from the user's side, in my opinion). This service could run 24/7, storing answers to questions until a certain amount is reached, for example.
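
As a rough sketch of that service, assuming SQLite as the store and a `fetch_batch` callable that stands in for whatever actually queries Wikidata (the table layout, function names and `max_rounds` guard are all hypothetical):

```python
import sqlite3
from typing import Callable, Iterable, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the question store, creating the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions ("
        "text TEXT PRIMARY KEY, answer TEXT NOT NULL)"
    )
    return conn


def stored_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]


def fill_store(
    conn: sqlite3.Connection,
    fetch_batch: Callable[[], Iterable[Tuple[str, str]]],
    target: int,
    max_rounds: int = 100,
) -> int:
    """Fetch (question, answer) batches until `target` rows are stored."""
    for _ in range(max_rounds):
        if stored_count(conn) >= target:
            break
        rows = list(fetch_batch())
        if not rows:
            break  # source is exhausted or unavailable; try again later
        # Duplicate questions are skipped thanks to the primary key.
        conn.executemany(
            "INSERT OR IGNORE INTO questions (text, answer) VALUES (?, ?)", rows
        )
        conn.commit()
    return stored_count(conn)
```

The backend would then read from the `questions` table instead of calling the SPARQL endpoint while a user is playing.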

Pros ✅:

We would get rid of the first two problems of the previous approach. The third could be reduced to a simpler version that does not need to be as precise.
Persisting questions would make us less dependent on the API's availability.
More control over errors occurring in intermediate answers for the same query.

Cons ❌:

More complex architecture and more resources are required even if they end up not being used.
We may have bandwidth problems if many users are playing simultaneously.
More complex code and service interactions.

Thanks for your opinion, it is helpful as we have not decided on which to use yet.

jjgancfer · 2024-02-14T17:42:55Z

On the other hand, we could have a service that automatically stores questions and their answers in a DB, which the backend then serves to the user (this is more secure than querying from the user's side, in my opinion). This service could run 24/7, storing answers to questions until a certain amount is reached, for example.

I agree with this approach. I think it would be the best one, mainly because with the new focus for questions/answers we would have to use multi-threading until the database is filled up to a useful degree. We would also be able to write it in a simpler language (for instance, we could use Django with a Python module to query). We can also find different tools here.

Cons ❌:

More complex architecture and more resources are required even if they end up not being used.

We may have bandwidth problems if many users are playing simultaneously.

More complex code and service interactions.

I think these disadvantages are worth mentioning, but I think the second one is negated by the new model of querying for questions early and storing them. Likewise, the first might be slightly mitigated if we simplify the service (making it something like running a script, for instance), but I think that kind of solution would be sub-optimal.

UO283615 · 2024-02-28T20:57:26Z

I've been working on the queries and I managed to write two that return all the countries and capitals in the world: one in XML format with a POST request and one in JSON with a GET request. I've attached a JSON file with them to this message; it can be imported into Postman and the queries will be loaded there.
ASW-Queries.json
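
For anyone without Postman, the two request shapes could look roughly like this in Python. This is a sketch and not the content of the attached file: the endpoint URL, the SPARQL query (country as `wd:Q6256`, capital as `wdt:P36`), the placeholder User-Agent and the helper names are my assumptions.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint
USER_AGENT = "example-quiz-bot/0.1"  # placeholder value, replace with our own

# Assumed query; the one in ASW-Queries.json may differ.
COUNTRIES_AND_CAPITALS = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def json_get_request(query: str) -> Request:
    """GET request asking for SPARQL results in JSON."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json", "User-Agent": USER_AGENT}
    return Request(url, headers=headers)


def xml_post_request(query: str) -> Request:
    """POST request (form-encoded body) asking for SPARQL results in XML."""
    body = urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+xml",
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": USER_AGENT,
    }
    return Request(ENDPOINT, data=body, headers=headers, method="POST")


def parse_pairs(payload: str) -> List[Tuple[str, str]]:
    """Extract (country, capital) pairs from a SPARQL JSON results document."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [(b["countryLabel"]["value"], b["capitalLabel"]["value"]) for b in bindings]
```

Either request can be sent with `urllib.request.urlopen(...)`; the functions only build the requests, so nothing touches the network until then.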

GOLASOOO added the backend ⚙️ Backend issue label Feb 4, 2024

jjgancfer mentioned this issue Feb 4, 2024

Decide technologies #2

Closed

Toto-hitori mentioned this issue Feb 14, 2024

docs/jjgancfer #35

Merged

jjgancfer mentioned this issue Feb 19, 2024

Self Report: First delivery #39

Closed

Toto-hitori assigned UO283615 and GOLASOOO Feb 28, 2024

Toto-hitori closed this as completed Mar 10, 2024

UO283615 mentioned this issue Mar 11, 2024

Self reports: Second delivery #107

Closed
