SPARQL #11

Closed
GOLASOOO opened this issue Feb 4, 2024 · 6 comments
Assignees
Labels
backend ⚙️ Backend issue

Comments

@GOLASOOO
Contributor

GOLASOOO commented Feb 4, 2024

Discuss SPARQL implementations.

@GOLASOOO GOLASOOO added the backend ⚙️ Backend issue label Feb 4, 2024
@GOLASOOO
Contributor Author

GOLASOOO commented Feb 4, 2024

[Moved here from issue #6, as it is a rather big topic.]
This answer is a bit long, but I have tried to keep it as simple and clear as possible.
I have been researching the different approaches we could take to query Wikidata.

Spring Boot:

This is some info about most of the libraries:

There are plenty of them, and all of them seem to query Wikidata's endpoint over REST. As far as I can tell, they all use JDBC for querying as well, although there are too many for me to have checked every one.

✅ Pros (for most of the libraries):

  • Many file formats available (json, csv, xml,...)
  • Well documented
  • Allow parameterized queries

❌ Cons:

  • Rate limits: each IP is granted one minute (60 seconds) of query processing time per minute, plus at most 30 failed queries per minute.
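
As a rough illustration of how a client could stay inside those limits, here is a minimal sketch of a sliding-window budget. The limits themselves are enforced by the server; the class name `QueryBudget`, its methods, and the default numbers (taken from the point above) are assumptions made for this example, not part of any library.

```python
import time
from collections import deque


class QueryBudget:
    """Client-side guard for a per-minute processing-time and error budget."""

    def __init__(self, max_seconds=60.0, max_errors=30, window=60.0,
                 clock=time.monotonic):
        self.max_seconds = max_seconds  # assumed processing-time allowance
        self.max_errors = max_errors    # assumed failed-query allowance
        self.window = window            # length of the sliding window, in seconds
        self.clock = clock              # injectable so the class can be tested
        self._events = deque()          # (timestamp, processing_seconds, failed)

    def _prune(self):
        # Drop events that have fallen out of the sliding window.
        cutoff = self.clock() - self.window
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()

    def record(self, seconds, failed=False):
        """Register a finished query and how long it took."""
        self._events.append((self.clock(), seconds, failed))

    def can_send(self):
        """True while both budgets still have room in the current window."""
        self._prune()
        used = sum(seconds for _, seconds, _ in self._events)
        errors = sum(1 for _, _, failed in self._events if failed)
        return used < self.max_seconds and errors < self.max_errors
```

A caller would check `can_send()` before each query and call `record(elapsed, failed=...)` afterwards; when the budget is used up, it can wait or serve a question that is already available.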

If using Jena, which seems to be the most prominent SPARQL library out there:

First, it also uses REST, so the limits above will most likely apply here too. It provides a library to connect to the Jena SPARQL API, which has many different capabilities.

✅ Pros:

  • It has an RDF mapper (RDF is the format that SPARQL queries) which is based on JPA, so the two look alike.
    + It provides an endpoint called sparqlify, which is a SPARQL-to-SQL rewriter. Its main features are:
    • Supports JPA
    • It is available on PostgreSQL
    • It can be configured to automatically simplify queries for better performance
  • Adds another possible syntax for querying, based on algebra, which seems optional and well documented. However, I have not researched it enough yet (I will if you think it is an interesting option).
  • Allows query transformations
  • Provides code for implementing our own SPARQL endpoint

Node.js

Regarding Node.js, there are only a few libraries available, and they have not been updated in some years. The approach is much simpler, though, as it is basically an HTTP request in JS.
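
To show how little is involved in the plain HTTP approach, here is a minimal sketch. It is written in Python rather than JS purely for illustration, and the same request can be made from any language. The endpoint URL is the public Wikidata Query Service; the function names and the `User-Agent` value are placeholders invented for this example.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query, user_agent="example-quiz-bot/0.1 (hypothetical contact)"):
    """Build a GET request that asks the endpoint for JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,  # placeholder value, replace with a real one
    }
    return urllib.request.Request(url, headers=headers)


def parse_bindings(payload):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query, opener=urllib.request.urlopen):
    """Send the query and return the parsed rows. Performs a network call."""
    with opener(build_request(query), timeout=60) as response:
        return parse_bindings(response.read())
```

Keeping `build_request` and `parse_bindings` separate from `run_query` means the first two can be tested without touching the network.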

@UO283615
Contributor

UO283615 commented Feb 5, 2024

Regarding SPARQL, I think it is important to mention this question from the Arquisoft FAQ, which includes a webpage that lets us test our queries.

@jjgancfer
Contributor

Regarding whether we should query from the frontend or the backend: if we consider performance, I think we should do it from the frontend, mainly because performance will be better that way. Once a question has been answered, the result can be sent to the server if we need it.

@GOLASOOO
Contributor Author

GOLASOOO commented Feb 7, 2024

Sorry for such a long answer. For now, we have two possible approaches regarding queries.
The first one would be using the user agent as the client of the SPARQL Query Service, as you suggested.

Pros ✅:

  • Less bandwidth used on our server.
  • Simpler and faster to implement.
  • Probably the same or better performance than the next approach on high-spec devices.

Cons ❌:

  • More computational power and bandwidth required from users' devices; we should discuss what this means for users with older devices or weaker connections.
  • It is easier to exhaust the processing time granted by the SPARQL Query Service API, especially if the user answers many questions in a short period of time or if some queries are resource-intensive.
  • We would need some way of sorting questions by the predicted time the API will take to compute them, so that cheaper ones are displayed first. (This may lead to a game mode where questions increase in difficulty until the user misses one, as the later queries will probably be much more detailed, but this is out of scope for now.)
  • The connection would go through Node.js libraries that have not been updated recently.

On the other hand, we could have a service that automatically stores questions and their answers in a DB, from which the backend serves them to the user (in my opinion, this is more secure than querying from the user's side). This service could run 24/7, storing answers to questions until, for example, a certain amount is reached.

Pros ✅:

  • We would get rid of the first two problems of the previous approach. The third could be reduced to a simpler version that does not need to be as precise.
  • We would have persistence of questions, which would make us less dependent on the API's availability.
  • More control over errors occurring in intermediate answers for the same query.

Cons ❌:

  • More complex architecture, and more resources are required even if they end up not being used.
  • We may have bandwidth problems if many users are playing simultaneously.
  • More complex code and service interactions.
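
A minimal sketch of the second approach, assuming an SQLite table as the store. The table layout, the function names, and the `generate` callable (which would run a SPARQL query and turn the result into a question) are all hypothetical, chosen only to illustrate "store questions until a certain amount is reached".

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) the table that holds pre-generated questions."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions ("
        "id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL, answer TEXT NOT NULL)"
    )
    return conn


def count_questions(conn):
    """Number of questions currently stored."""
    return conn.execute("SELECT COUNT(*) FROM questions").fetchone()[0]


def fill_store(conn, generate, target, max_attempts=1000):
    """Call generate() until the store holds `target` questions.

    `generate` is a hypothetical callable returning a (text, answer) pair,
    for example by running a SPARQL query against Wikidata.
    """
    attempts = 0
    while count_questions(conn) < target and attempts < max_attempts:
        attempts += 1
        text, answer = generate()
        # INSERT OR IGNORE skips questions that are already stored.
        conn.execute(
            "INSERT OR IGNORE INTO questions (text, answer) VALUES (?, ?)",
            (text, answer),
        )
    conn.commit()
    return count_questions(conn)


def next_question(conn):
    """Return one stored question at random, or None if the store is empty."""
    return conn.execute(
        "SELECT text, answer FROM questions ORDER BY RANDOM() LIMIT 1"
    ).fetchone()
```

With this shape, the backend only ever calls `next_question`, so players are not affected if the Wikidata endpoint is slow or unavailable while the store already holds questions.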

Thanks for your opinion, it is helpful as we have not decided which one to use yet.

@jjgancfer
Contributor

On the other hand, we could have a service that automatically stores questions and their answers in a DB, from which the backend serves them to the user (in my opinion, this is more secure than querying from the user's side). This service could run 24/7, storing answers to questions until, for example, a certain amount is reached.

I agree with this approach. I think it would be the best one, mainly because with the new focus on questions and answers we would have to use multi-threading until the database is filled to a useful degree. We would also be able to write it in a simpler language (we could, for instance, use Django with a Python module to do the querying). We can also find different tools here.

Cons ❌:

  • More complex architecture and more resources are required even if they end up not being used.
  • We will possibly have problems regarding bandwidth if many users playing simultaneously.
  • More complex code and service interactions.

I think these disadvantages are worth mentioning, but I think the second one is negated by the new model of querying for questions early and storing them. Likewise, the first might be partly negated if we simplify the service (making it something like running a script, for instance), but I think that kind of solution would be sub-optimal.
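
The multi-threaded filling mentioned above could look roughly like the following sketch, which uses Python's standard thread pool. The function name `generate_batch` and the idea of passing in "fetcher" callables are assumptions made for this example; each fetcher stands in for whatever code would actually query Wikidata.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_batch(fetchers, max_workers=4):
    """Run question fetchers concurrently and collect what succeeds.

    Each fetcher is a hypothetical zero-argument callable that queries
    Wikidata and returns a list of (question, answer) pairs.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch): fetch for fetch in fetchers}
        for future in as_completed(futures):
            try:
                results.extend(future.result())
            except Exception as exc:  # one failing query should not stop the rest
                errors.append(exc)
    return results, errors
```

Results are collected in the calling thread, so any database writes can stay in one place, and a small `max_workers` value helps avoid sending too many queries to the endpoint at once.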

@UO283615
Contributor

I've been working on the queries and managed to write two that return all the countries and capitals in the world: one in XML format with a POST request and one in JSON with a GET request. I've attached a JSON file containing them to this message; it can be imported into Postman, which will load the queries.
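
The contents of the attached file are not reproduced in this thread. Purely as an illustration of the two request styles, a sketch might look like the following. The SPARQL text is an assumption (instances of "country", Q6256, together with their capital, P36) and may differ from the attached queries; the function names are invented for this example.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query: items that are instances of "country" (Q6256) with a capital (P36).
COUNTRIES_AND_CAPITALS = """
SELECT ?countryLabel ?capitalLabel WHERE {
  ?country wdt:P31 wd:Q6256 ;
           wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def json_get_request(query):
    """GET request; the result format is chosen with the `format` URL parameter."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, method="GET")


def xml_post_request(query):
    """POST request; the query travels form-encoded in the body, XML via Accept."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {
        "Accept": "application/sparql-results+xml",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")
```

Either request object can be passed to `urllib.request.urlopen` to actually run the query; POST is mainly useful when the query text is too long to fit comfortably in a URL.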
ASW-Queries.json

4 participants