Handling of failed requests #128

Closed
3 of 4 tasks
nleanba opened this issue Feb 5, 2024 · 6 comments

nleanba commented Feb 5, 2024

Currently, if a SPARQL request fails, synospecies just ignores it, leading to inconsistent or missing results. This happens frequently for species that require many requests (due to lots of synonyms), e.g. T. rex.

Synospecies should handle this better.

  • Indicate failures to the user (if no retry, or if the retry failed)
  • Retry requests (after a short delay), if the failure is probably due to too many requests briefly overloading the server
  • (Maybe) rate-limit the requests sent, in a way that does not noticeably slow down the loading process (a possible approach is sketched after this list)
  • It might also be worth exploring how to send multiple queries over the same connection and/or how to combine queries so that fewer need to be sent.
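The rate-limiting idea could look roughly like the sketch below: a small concurrency cap on outgoing SPARQL requests. This is only an illustration; the class name, the cap of 6, and the fetch usage are assumptions, not existing synospecies/synogroup code.

// Hypothetical concurrency limiter: caps how many SPARQL requests are in flight at once.
// The name RequestLimiter and the default cap of 6 are assumptions for illustration.
class RequestLimiter {
  private active = 0;
  private queue: (() => void)[] = [];

  constructor(private maxConcurrent = 6) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // wait until a running request finishes and frees a slot
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.queue.shift()?.(); // wake the next waiting task, if any
    }
  }
}

// Usage sketch:
// const limiter = new RequestLimiter();
// const response = await limiter.run(() =>
//   fetch(endpoint + "?query=" + encodeURIComponent(sparql)));

As long as the cap stays a bit below what the server tolerates, this should not noticeably slow down loading, since only bursts of requests are queued.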

nleanba commented Feb 5, 2024

Indicate failures to the user (if no retry, or if the retry failed)

Not easily possible, due to the separation of synospecies and synogroup.

Retry requests (after a short delay)

Experimental findings in synogroup:

  • Adding a check for a 502 HTTP response code and retrying after 200 ms has usually resolved all failures, occasionally needing two retries.
  • A delay of 100 ms or 50 ms also works, but at 50 ms at least one request usually needs 2 retries, going up to 4 retries on rare occasions.

I will add the relevant code to synogroup, so that it retries up to 4 times after a 502, starting with a 50 ms gap and doubling the wait for each retry.

The downside of this approach is that it increases the number of requests made. Given that the core issue is the server struggling with too many requests in too short a time frame, this is not ideal.

@retog, ideas on the last point, i.e. reducing the number of queries sent, are very welcome.
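A rough sketch of that retry loop, assuming a plain fetch-based request (the function name and exact shape are mine, not the actual synogroup code):

// Sketch: retry on HTTP 502 up to 4 times, starting at 50 ms and doubling the delay
// each time. fetchSparql is a hypothetical name used only for this illustration.
async function fetchSparql(url: string, maxRetries = 4): Promise<Response> {
  let delay = 50;
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, {
      headers: { Accept: "application/sparql-results+json" },
    });
    if (response.status !== 502 || attempt >= maxRetries) {
      return response; // success, a non-502 error, or retries exhausted
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2; // 50 ms, 100 ms, 200 ms, 400 ms
  }
}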

nleanba commented Feb 5, 2024

Hmm. The delays don't work for synospecies because the 502 errors have no CORS headers, and thus the JS code only gets an opaque error.

This should be fixable by removing the check for the status code. I don't think this check was very necessary anyway.
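Since the browser rejects the fetch for such an opaque error instead of exposing the 502, the retry has to trigger on the rejection (or on any non-ok response) rather than on the status code. A minimal adaptation of the sketch above, again hypothetical rather than the shipped code:

// Sketch: retry on any failed attempt, i.e. a rejected fetch (such as the opaque
// CORS error) or a non-ok response, since the 502 status is not visible to the JS code.
async function fetchSparqlLenient(url: string, maxRetries = 4): Promise<Response> {
  let delay = 50;
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await fetch(url, {
        headers: { Accept: "application/sparql-results+json" },
      });
      if (response.ok || attempt >= maxRetries) return response;
    } catch (error) {
      if (attempt >= maxRetries) throw error; // network/CORS failure after the last retry
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2;
  }
}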

nleanba removed their assignment Feb 5, 2024

nleanba commented Mar 22, 2024

Potential idea to reduce the number of requests made:

Currently, for each taxon concept, there is one request gathering all relevant treatments.
I think it should be possible to reduce this:

  1. We could combine them into one big query that gets all treatments of all synonyms (and which taxon concepts and taxon names they define/augment/deprecate/treat/cite), and then assign the treatments to the synonyms in JS (see the sketch after this list).

    • number of requests is reduced by (Number of Synonyms) - 1
    • For any given treatment, either all or none of the info is present
    • need to check if this is significantly slower (might depend on the endpoint)
  2. Alternatively, we might combine the “get Treatments” step into the “get next round of synonyms” step, gathering all the treatments of a synonym together with the synonym itself.

    • maybe just get three collated properties containing a list of treatment URIs, and gather the treatment metadata (authors, date, title, images) together with the material citations
    • number of requests is reduced by (Number of Synonyms)
    • at least the number of treatments per synonym is known sooner, which would allow a skeleton UI to appear faster, reducing how much content moves around as new data is loaded (each synonym can already reserve the vertical space needed for its treatments)
    • need to check if this is significantly slower (might depend on the endpoint)
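For option 1, the JS assignment step could look roughly like this. The column names (?tc, ?augs, ?defs, ?dprs, ?cites) follow the group_concat query sketched in the next comment; the types and function names here are illustrative assumptions, not existing code.

// Sketch: fold the rows of one combined query back onto the individual synonyms.
// Assumes the standard SPARQL JSON results format ({ head, results: { bindings } }).
type SparqlBinding = { [variable: string]: { value: string } };
type TreatmentSets = { aug: string[]; def: string[]; dpr: string[]; cite: string[] };

// Split a group_concat column (separator "|") back into a list of treatment URIs.
function splitConcat(row: SparqlBinding, variable: string): string[] {
  const value = row[variable]?.value ?? "";
  return value === "" ? [] : value.split("|");
}

// Map each taxon concept (synonym) URI to the treatments that augment/define/deprecate/cite it.
function treatmentsBySynonym(bindings: SparqlBinding[]): Map<string, TreatmentSets> {
  const result = new Map<string, TreatmentSets>();
  for (const row of bindings) {
    result.set(row["tc"].value, {
      aug: splitConcat(row, "augs"),
      def: splitConcat(row, "defs"),
      dpr: splitConcat(row, "dprs"),
      cite: splitConcat(row, "cites"),
    });
  }
  return result;
}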

nleanba commented Mar 22, 2024

e.g. SPARQL for option 1:

PREFIX cito: <http://purl.org/spar/cito/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?tc (group_concat(DISTINCT ?aug;separator="|") as ?augs) (group_concat(DISTINCT ?def;separator="|") as ?defs) (group_concat(DISTINCT ?dpr;separator="|") as ?dprs) (group_concat(DISTINCT ?cite;separator="|") as ?cites) WHERE {
  <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_miyakei_Baba_1969> ((^treat:deprecates/(treat:augmentsTaxonConcept|treat:definesTaxonConcept))|((^treat:augmentsTaxonConcept|^treat:definesTaxonConcept)/treat:deprecates))* ?tc .
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
}
GROUP BY ?tc

(A variant without the group_concats seems significantly faster; needs investigation...)

or

PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?treat ?how ?tc ?date (group_concat(DISTINCT ?creator;separator="; ") as ?creators) WHERE {
  <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_edwardsii_Miers_1884> ((^treat:deprecates/(treat:augmentsTaxonConcept|treat:definesTaxonConcept))|((^treat:augmentsTaxonConcept|^treat:definesTaxonConcept)/treat:deprecates))* ?tc .
  ?treat (treat:augmentsTaxonConcept|treat:definesTaxonConcept|treat:deprecates|cito:cites) ?tc ;
          dc:creator ?creator ;
          ?how ?tc .
  OPTIONAL {
    ?treat treat:publishedIn/dc:date ?date .
  }
}
GROUP BY ?treat ?how ?tc ?date

nleanba commented Mar 22, 2024

e.g. SPARQL for option 2, here for the deprecating synonyms:

# Get synonyms deprecating taxon and all relevant treatments
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT
  ?tn ?tc ?justification (group_concat(DISTINCT ?auth; separator=" / ") as ?authority) (group_concat(DISTINCT ?treat; separator="|") as ?treats)
WHERE {
  ?justification treat:deprecates <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_edwardsii_Miers_1884> ;
         (treat:augmentsTaxonConcept|treat:definesTaxonConcept) ?tc .
  ?tc <http://plazi.org/vocab/treatment#hasTaxonName> ?tn .
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth }
  OPTIONAL {
    ?treat (treat:augmentsTaxonConcept|treat:definesTaxonConcept|treat:deprecates|cito:cites) ?tc .
  }
  OPTIONAL {
    ?treat (treat:citesTaxonName|treat:treatsTaxonName) ?tn .
  }
}
GROUP BY ?tn ?tc ?justification

or, distinguishing the types of treatments:

# Get synonyms deprecating taxon and all relevant treatments
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT
  ?tn ?tc (group_concat(DISTINCT ?auth; separator=" / ") as ?authority) (group_concat(DISTINCT ?justification; separator="|") as ?justs) (group_concat(DISTINCT ?aug;separator="|") as ?augs) (group_concat(DISTINCT ?def;separator="|") as ?defs) (group_concat(DISTINCT ?dpr;separator="|") as ?dprs) (group_concat(DISTINCT ?cite;separator="|") as ?cites) (group_concat(DISTINCT ?trtn;separator="|") as ?trtns) (group_concat(DISTINCT ?citetn;separator="|") as ?citetns)
WHERE {
  ?justification treat:deprecates <http://taxon-concept.plazi.org/id/Animalia/Munida_edwardsii_Miers_1884> ;
         (treat:augmentsTaxonConcept|treat:definesTaxonConcept) ?tc .
  ?tc <http://plazi.org/vocab/treatment#hasTaxonName> ?tn .
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth }
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
  OPTIONAL { ?trtn treat:treatsTaxonName ?tn . }
  OPTIONAL { ?citetn treat:citesTaxonName ?tn . }
}
GROUP BY ?tn ?tc

The latter is quite fast on either endpoint.

nleanba commented Mar 23, 2024

Also, it turns out that synogroup sends some requests multiple times due to mismanaging which synonyms are already being handled. This is fixed in plazi/synolib#9, but for some context:

https://synospecies.plazi.org/#Doryphoribius+zyxiglobus makes 966 SPARQL requests to the selected backend (for a total of 14 taxon concepts), many of which are duplicates (I found one that was sent 45 times!).

The consolidation of queries will reduce the number of queries sent, but being smarter about not sending duplicates will probably have a much bigger impact.

(plazi/synolib#9 reduces the total number of queries sent for Doryphoribius zyxiglobus to 60.)
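The usual way to avoid such duplicates is to key requests by their query text and reuse the pending promise; a sketch of that idea (an illustration only, not the actual change in plazi/synolib#9):

// Sketch: deduplicate identical SPARQL queries by caching the promise per query string.
// A repeated query reuses the in-flight (or already resolved) request instead of
// hitting the endpoint again. Names are illustrative, not taken from synolib.
const pendingQueries = new Map<string, Promise<unknown>>();

function queryOnce(endpoint: string, sparql: string): Promise<unknown> {
  const cached = pendingQueries.get(sparql);
  if (cached) return cached;
  const request = fetch(endpoint + "?query=" + encodeURIComponent(sparql), {
    headers: { Accept: "application/sparql-results+json" },
  }).then((response) => response.json());
  pendingQueries.set(sparql, request);
  return request;
}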

nleanba closed this as completed Mar 26, 2024