AG: class tree visualization issues with AG backend #264

Open
alexskr opened this issue Nov 29, 2022 · 12 comments

@alexskr
Member

alexskr commented Nov 29, 2022

A number of ontologies have class tree visualization problems when BioPortal runs with the AllegroGraph backend. The preferred name is missing, so the class tree has blank entries.

image

The API shows prefLabel: null
image

@alexskr
Member Author

alexskr commented Nov 29, 2022

image

@alexskr
Member Author

alexskr commented Nov 30, 2022

image

@alexskr
Member Author

alexskr commented Nov 30, 2022

Updated ncbo_cron to the latest codebase in the staging environment and reprocessed the ontologies. The missing prefLabel for purl.obolibrary.org/obo/GO_0008150 in the GO ontology is now fixed.

@graybeal
Contributor

labels still missing in Mondo, e.g., https://stage.bioontology.org/ontologies/MONDO/?p=classes&conceptid=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMONDO_0005200 ('viral dilated cardiomyopathy')

@graybeal
Contributor

see also ncbo/ontologies_linked_data#137 for example from production (fixed by re-parsing in that case)

@graybeal
Contributor

graybeal commented Nov 30, 2022

Note that the MONDO ontology does not define prefLabels in its source; it only has regular labels throughout the XML/RDF file. I believe the rdfs:label is displayed as the Preferred Label annotation when there is no prefLabel.

@graybeal
Contributor

graybeal commented Dec 5, 2022

In case this gives any clue: If I click on the missing label (highlighted under herpes zoster in right side of screen shot), the page may not resolve (big WAITING box); but if I hit reload in the browser I get this display.

image

After further testing: If I click on the empty label location, I get LOADING CLASS spinner, nothing happens, and the network trace reads per the first gif below (a 500 error on that class).

If I then hit Reload, the page refreshes to display the item, and the network trace shows a normal 200 response. The only visible difference between the two calls is that the second one has no callback=load parameter.

image

@graybeal
Contributor

graybeal commented Dec 7, 2022

I think we found the responsible code for this problem. Well, we have a good theory, anyway. It's all in Slack for now, I'll let Misha decide what is worth summarizing in this thread.

@mdorf
Member

mdorf commented Dec 7, 2022

I was able to identify the cause of this issue. AllegroGraph does not impose a default ordering of records for paginated results, which causes duplicate values to be included when iterating over the entire record set:

SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/VTO/submissions/14> WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 0 LIMIT 2500

While no single run of this query produces duplicates, the combined result of all paginated runs over the entire graph does. Every duplicate takes a slot that should have gone to a different class, so many legitimate classes are never returned and are left without a label.

The attached file contains a good illustration of the issue. It includes both the queries run as well as the results of each run right below it: vto_id_queries_with_results_run1.txt

If you grep for the term VTO_0009953, you will see that it’s returned by two of the queries from the set:

SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/VTO/submissions/14> WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 2500 LIMIT 2500

and

SELECT DISTINCT ?id FROM <http://data.bioontology.org/ontologies/VTO/submissions/14> WHERE { ?id a <http://www.w3.org/2002/07/owl#Class> . } OFFSET 102500 LIMIT 2500
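
The same check that grep performs on the attached file can be sketched in Python. This is only an illustration: `ids_on_multiple_pages` and `example_pages` are hypothetical names, and every ID other than VTO_0009953 is a made-up placeholder rather than data from the attachment.

```python
def ids_on_multiple_pages(pages):
    """Return {class_id: [offsets]} for IDs returned by more than one page.

    `pages` maps each OFFSET value to the list of IDs that query returned.
    """
    offsets_by_id = {}
    for offset in sorted(pages):
        for class_id in pages[offset]:
            offsets_by_id.setdefault(class_id, []).append(offset)
    return {cid: offs for cid, offs in offsets_by_id.items() if len(offs) > 1}


# Placeholder data: only VTO_0009953 and its two offsets come from the thread.
example_pages = {
    0: ["VTO_0000001", "VTO_0000002"],
    2500: ["VTO_0009953", "VTO_0000003"],
    102500: ["VTO_0000004", "VTO_0009953"],
}
repeated = ids_on_multiple_pages(example_pages)
print(repeated)
```

Running it over the real per-page results would list every class ID that shows up under more than one OFFSET.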

4store does the internal ordering correctly, so we’ve never encountered this issue until AG. Because the internal ordering of records in AG is not deterministic, you end up getting random labels missing from one run to the next.

Here is another run to compare to the first one with different duplicates and different missing terms:
vto_id_queries_with_results_run2.txt

Per the selected answer in this StackOverflow thread: https://stackoverflow.com/questions/55146844/offset-in-sparql,

[In a triple store] Rows may be delivered in any order, and this ordering may change from query-to-query, if you don't include an ORDER BY. This can mean that multiple queries with different OFFSET may not get you all rows, and may deliver duplicate rows, when all the partial result sets are combined. So -- anytime you're using OFFSET and/or LIMIT, it's best practice to also use an ORDER BY.
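
The effect described in the quote can be reproduced with a small toy model. The sketch below does not talk to AllegroGraph; `fetch_page` is a hypothetical stand-in that reshuffles the rows on every call, which is a deliberately pessimistic assumption about how an unordered store might behave. The class IDs are generated placeholders.

```python
import random


def fetch_page(rows, offset, limit, rng, ordered=False):
    """Hypothetical stand-in for one paginated SELECT against a store."""
    # Without ORDER BY the row order may differ per query; model that as a shuffle.
    view = sorted(rows) if ordered else rng.sample(rows, len(rows))
    return view[offset:offset + limit]


def collect_ids(rows, limit, rng, ordered=False):
    """Walk the whole result set page by page, as a paginating client would."""
    collected = []
    for offset in range(0, len(rows), limit):
        collected.extend(fetch_page(rows, offset, limit, rng, ordered))
    return collected


rng = random.Random(42)
classes = [f"VTO_{i:07d}" for i in range(10_000)]  # placeholder IDs

unordered = collect_ids(classes, 2500, rng)
ordered = collect_ids(classes, 2500, rng, ordered=True)

duplicates = len(unordered) - len(set(unordered))
missing = set(classes) - set(unordered)
print(f"unordered: {duplicates} duplicate rows, {len(missing)} classes never returned")
print(f"ordered: {len(set(classes) - set(ordered))} classes never returned")
```

In this model the number of duplicate rows always equals the number of classes that are never returned, which matches the observation that duplicates push legitimate classes out of the combined result.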

Based on this, a possible solution is to add an ORDER BY clause to the query:

ORDER BY ?id LIMIT 10000 OFFSET 120000

This, however, may come at a performance cost.
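
As a rough illustration of the proposed fix, the paginated queries could be generated with the ORDER BY clause in place. This is a sketch, not the code committed to goo or ontologies_linked_data; `paged_class_queries` is a hypothetical helper, and the total row count and page size are assumed inputs.

```python
def paged_class_queries(graph, total, page_size):
    """Build one SPARQL query string per page, each with a stable ORDER BY."""
    template = (
        "SELECT DISTINCT ?id FROM <{graph}> "
        "WHERE {{ ?id a <http://www.w3.org/2002/07/owl#Class> . }} "
        "ORDER BY ?id LIMIT {limit} OFFSET {offset}"
    )
    return [
        template.format(graph=graph, limit=page_size, offset=offset)
        for offset in range(0, total, page_size)
    ]


queries = paged_class_queries(
    "http://data.bioontology.org/ontologies/VTO/submissions/14",
    total=5000,      # assumed row count, for illustration only
    page_size=2500,
)
for query in queries:
    print(query)
```

Sorting on ?id gives every page the same total order, so consecutive OFFSET windows cannot overlap as long as the graph does not change between queries.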

mdorf added a commit to ncbo/ontologies_linked_data that referenced this issue Mar 1, 2023
mdorf added a commit to ncbo/goo that referenced this issue Mar 1, 2023
@mdorf mdorf reopened this Apr 19, 2023
@mdorf
Member

mdorf commented Jun 1, 2023

I am working with the Franz developers on improving the performance of the ORDER BY clause in AllegroGraph. As of now, the performance degradation experienced as a result of adding ORDER BY is unacceptable.

@mdorf
Member

mdorf commented Jun 1, 2023

  1. It appears the duplicates are ALL coming from the first query with OFFSET 0
  2. ALL results from the first query with OFFSET 0 are duplicated in the subsequent queries (2500 duplicates)
  3. There exist NO other duplicates (consequence of 1 & 2)
  4. There are no “triplicates” or “fourplicates” or any other multiples; just duplicates (consequence of 1 & 2)
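
The four observations above could be checked mechanically against the per-page results. The sketch below assumes the results are available as a mapping from OFFSET to the list of returned IDs; `duplicate_profile` is a hypothetical helper and `toy_pages` is invented data shaped to match the pattern described, not actual query output.

```python
from collections import Counter


def duplicate_profile(pages):
    """Summarize where duplicates come from across paginated results."""
    first = min(pages)
    offsets_by_id = {}
    for offset, ids in pages.items():
        for class_id in ids:
            offsets_by_id.setdefault(class_id, []).append(offset)
    dupes = {
        cid: sorted(offs) for cid, offs in offsets_by_id.items() if len(offs) > 1
    }
    return {
        # Observation 1: every duplicate also appears on the first page.
        "all_dupes_involve_first_page": all(
            offs[0] == first for offs in dupes.values()
        ),
        # Observation 2: every ID on the first page reappears later.
        "first_page_fully_duplicated": set(pages[first]) <= set(dupes),
        # Observation 4: how many times each repeated ID occurs.
        "multiplicities": Counter(len(offs) for offs in dupes.values()),
    }


# Invented toy data: page 0 is fully repeated across the later pages.
toy_pages = {
    0: ["A", "B"],
    2: ["C", "A"],
    4: ["B", "D"],
}
profile = duplicate_profile(toy_pages)
print(profile)
```

If the multiplicities counter contains only the key 2, there are plain duplicates and no higher multiples.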

@alexskr
Member Author

alexskr commented Nov 30, 2023

resolved but needs to be confirmed with AllegroGraph v7.4 when it comes up
