Handling of failed requests #128

Closed
3 of 4 tasks
nleanba opened this issue Feb 5, 2024 · 6 comments

nleanba commented Feb 5, 2024

Currently, if a SPARQL request fails, synospecies just ignores it, leading to inconsistent or missing results. This happens frequently for species that require many requests (due to lots of synonyms), e.g. T. rex.

Synospecies should handle this better.

  • Indicate failures to the user (if no retry, or if the retry failed)
  • Retry requests (after a short delay), if the failure is probably due to too many requests briefly overloading the server
  • (Maybe) rate-limit the requests sent, in a way that does not noticeably slow down the loading process (a possible approach is sketched after this list)
  • It might also be worth exploring how to send multiple queries over the same connection and/or how to combine queries so that fewer need to be sent.
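The rate-limiting idea could look roughly like the sketch below: a small concurrency cap on outgoing SPARQL requests. This is only an illustration; the class name, the cap of 6, and the fetch usage are assumptions, not existing synospecies/synogroup code.

// Hypothetical concurrency limiter: caps how many SPARQL requests are in flight at once.
// The name RequestLimiter and the default cap of 6 are assumptions for illustration.
class RequestLimiter {
  private active = 0;
  private queue: (() => void)[] = [];

  constructor(private maxConcurrent = 6) {}

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // wait until a running request finishes and frees a slot
      await new Promise<void>((resolve) => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.queue.shift()?.(); // wake the next waiting task, if any
    }
  }
}

// Usage sketch:
// const limiter = new RequestLimiter();
// const response = await limiter.run(() =>
//   fetch(endpoint + "?query=" + encodeURIComponent(sparql)));

As long as the cap stays a bit below what the server tolerates, this should not noticeably slow down loading, since only bursts of requests are queued.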

nleanba commented Feb 5, 2024

Indicate failures to the user (if no retry, or if the retry failed)

Not easily possible, due to the separation of synospecies and synogroup.

Retry requests (after a short delay)

Experimental findings in synogroup:

  • Adding a check for a 502 HTTP response code and retrying after 200 ms has usually resolved all failures, occasionally needing two retries.
  • A delay of 100 ms or 50 ms also works, but at 50 ms at least one request usually needs 2 retries, going up to 4 retries on rare occasions.

I will add the relevant code to synogroup, so that it retries up to 4 times after a 502, starting with a 50 ms gap and doubling the wait for each retry.

The downside of this approach is that it increases the number of requests made. Given that the core issue is the server struggling with too many requests in too short a time frame, this is not ideal.

@retog, ideas on the last point, i.e. reducing the number of queries sent, are very welcome.
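A rough sketch of that retry loop, assuming a plain fetch-based request (the function name and exact shape are mine, not the actual synogroup code):

// Sketch: retry on HTTP 502 up to 4 times, starting at 50 ms and doubling the delay
// each time. fetchSparql is a hypothetical name used only for this illustration.
async function fetchSparql(url: string, maxRetries = 4): Promise<Response> {
  let delay = 50;
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, {
      headers: { Accept: "application/sparql-results+json" },
    });
    if (response.status !== 502 || attempt >= maxRetries) {
      return response; // success, a non-502 error, or retries exhausted
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2; // 50 ms, 100 ms, 200 ms, 400 ms
  }
}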

nleanba commented Feb 5, 2024

Hmm. The delays don't work for synospecies because the 502 errors have no CORS headers, and thus the JS code only gets an opaque error.

This should be fixable by removing the check for the status code. I don't think this check was very necessary anyway.
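Since the browser rejects the fetch for such an opaque error instead of exposing the 502, the retry has to trigger on the rejection (or on any non-ok response) rather than on the status code. A minimal adaptation of the sketch above, again hypothetical rather than the shipped code:

// Sketch: retry on any failed attempt, i.e. a rejected fetch (such as the opaque
// CORS error) or a non-ok response, since the 502 status is not visible to the JS code.
async function fetchSparqlLenient(url: string, maxRetries = 4): Promise<Response> {
  let delay = 50;
  for (let attempt = 0; ; attempt++) {
    try {
      const response = await fetch(url, {
        headers: { Accept: "application/sparql-results+json" },
      });
      if (response.ok || attempt >= maxRetries) return response;
    } catch (error) {
      if (attempt >= maxRetries) throw error; // network/CORS failure after the last retry
    }
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay *= 2;
  }
}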

nleanba removed their assignment Feb 5, 2024

nleanba commented Mar 22, 2024

Potential idea to reduce the number of requests made:

Currently, for each taxon concept, there is one request gathering all relevant treatments.
I think it should be possible to reduce this:

  1. We could combine them into one big query that gets all treatments of all synonyms (and which taxon concepts and taxon names they define/augment/deprecate/treat/cite), and then assign the treatments to the synonyms in JS (see the sketch after this list).

    • number of requests is reduced by (Number of Synonyms) - 1
    • For any given treatment, either all or none of the info is present
    • need to check if this is significantly slower (might depend on the endpoint)
  2. Alternatively, we might combine the “get Treatments” step into the “get next round of synonyms” step, gathering all the treatments of a synonym together with the synonym itself.

    • maybe just get three collated properties containing a list of treatment URIs, and gather the treatment metadata (authors, date, title, images) together with the material citations
    • number of requests is reduced by (Number of Synonyms)
    • at least the number of treatments per synonym is known sooner, which would allow a skeleton UI to appear faster, reducing how much content moves around as new data is loaded (each synonym can already reserve the vertical space needed for its treatments)
    • need to check if this is significantly slower (might depend on the endpoint)
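For option 1, the JS assignment step could look roughly like this. The column names (?tc, ?augs, ?defs, ?dprs, ?cites) follow the group_concat query sketched in the next comment; the types and function names here are illustrative assumptions, not existing code.

// Sketch: fold the rows of one combined query back onto the individual synonyms.
// Assumes the standard SPARQL JSON results format ({ head, results: { bindings } }).
type SparqlBinding = { [variable: string]: { value: string } };
type TreatmentSets = { aug: string[]; def: string[]; dpr: string[]; cite: string[] };

// Split a group_concat column (separator "|") back into a list of treatment URIs.
function splitConcat(row: SparqlBinding, variable: string): string[] {
  const value = row[variable]?.value ?? "";
  return value === "" ? [] : value.split("|");
}

// Map each taxon concept (synonym) URI to the treatments that augment/define/deprecate/cite it.
function treatmentsBySynonym(bindings: SparqlBinding[]): Map<string, TreatmentSets> {
  const result = new Map<string, TreatmentSets>();
  for (const row of bindings) {
    result.set(row["tc"].value, {
      aug: splitConcat(row, "augs"),
      def: splitConcat(row, "defs"),
      dpr: splitConcat(row, "dprs"),
      cite: splitConcat(row, "cites"),
    });
  }
  return result;
}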

nleanba commented Mar 22, 2024

e.g. SPARQL for option 1:

PREFIX cito: <http://purl.org/spar/cito/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?tc (group_concat(DISTINCT ?aug;separator="|") as ?augs) (group_concat(DISTINCT ?def;separator="|") as ?defs) (group_concat(DISTINCT ?dpr;separator="|") as ?dprs) (group_concat(DISTINCT ?cite;separator="|") as ?cites) WHERE {
  <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_miyakei_Baba_1969> ((^treat:deprecates/(treat:augmentsTaxonConcept|treat:definesTaxonConcept))|((^treat:augmentsTaxonConcept|^treat:definesTaxonConcept)/treat:deprecates))* ?tc .
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
}
GROUP BY ?tc

(A variant without the group_concats seems significantly faster; needs investigation...)

or

PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?treat ?how ?tc ?date (group_concat(DISTINCT ?creator;separator="; ") as ?creators) WHERE {
  <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_edwardsii_Miers_1884> ((^treat:deprecates/(treat:augmentsTaxonConcept|treat:definesTaxonConcept))|((^treat:augmentsTaxonConcept|^treat:definesTaxonConcept)/treat:deprecates))* ?tc .
  ?treat (treat:augmentsTaxonConcept|treat:definesTaxonConcept|treat:deprecates|cito:cites) ?tc ;
          dc:creator ?creator ;
          ?how ?tc .
  OPTIONAL {
    ?treat treat:publishedIn/dc:date ?date .
  }
}
GROUP BY ?treat ?how ?tc ?date

nleanba commented Mar 22, 2024

e.g. SPARQL for option 2, here for the deprecating synonyms:

# Get synonyms deprecating taxon and all relevant treatments
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT
  ?tn ?tc ?justification (group_concat(DISTINCT ?auth; separator=" / ") as ?authority) (group_concat(DISTINCT ?treat; separator="|") as ?treats)
WHERE {
  ?justification treat:deprecates <http://taxon-concept.plazi.org/id/Animalia/Sadayoshia_edwardsii_Miers_1884> ;
         (treat:augmentsTaxonConcept|treat:definesTaxonConcept) ?tc .
  ?tc <http://plazi.org/vocab/treatment#hasTaxonName> ?tn .
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth }
  OPTIONAL {
    ?treat (treat:augmentsTaxonConcept|treat:definesTaxonConcept|treat:deprecates|cito:cites) ?tc .
  }
  OPTIONAL {
    ?treat (treat:citesTaxonName|treat:treatsTaxonName) ?tn .
  }
}
GROUP BY ?tn ?tc ?justification

or, distinguishing the types of treatments:

# Get synonyms deprecating taxon and all relevant treatments
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT
  ?tn ?tc (group_concat(DISTINCT ?auth; separator=" / ") as ?authority) (group_concat(DISTINCT ?justification; separator="|") as ?justs) (group_concat(DISTINCT ?aug;separator="|") as ?augs) (group_concat(DISTINCT ?def;separator="|") as ?defs) (group_concat(DISTINCT ?dpr;separator="|") as ?dprs) (group_concat(DISTINCT ?cite;separator="|") as ?cites) (group_concat(DISTINCT ?trtn;separator="|") as ?trtns) (group_concat(DISTINCT ?citetn;separator="|") as ?citetns)
WHERE {
  ?justification treat:deprecates <http://taxon-concept.plazi.org/id/Animalia/Munida_edwardsii_Miers_1884> ;
         (treat:augmentsTaxonConcept|treat:definesTaxonConcept) ?tc .
  ?tc <http://plazi.org/vocab/treatment#hasTaxonName> ?tn .
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth }
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
  OPTIONAL { ?trtn treat:treatsTaxonName ?tn . }
  OPTIONAL { ?citetn treat:citesTaxonName ?tn . }
}
GROUP BY ?tn ?tc

The latter is quite fast on either endpoint.

nleanba commented Mar 23, 2024

Also, it turns out that synogroup sends some requests multiple times due to mismanaging which synonyms are already being handled. This is fixed in plazi/synolib#9, but for some context:

https://synospecies.plazi.org/#Doryphoribius+zyxiglobus makes 966 SPARQL requests to the selected backend (for a total of 14 taxon concepts), many of which are duplicates (I found one that was sent 45 times!).

The consolidation of queries will reduce the number of queries sent, but being smarter about not sending duplicates will probably have a much bigger impact.

(plazi/synolib#9 reduces the total number of queries sent for Doryphoribius zyxiglobus to 60.)
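The usual way to avoid such duplicates is to key requests by their query text and reuse the pending promise; a sketch of that idea (an illustration only, not the actual change in plazi/synolib#9):

// Sketch: deduplicate identical SPARQL queries by caching the promise per query string.
// A repeated query reuses the in-flight (or already resolved) request instead of
// hitting the endpoint again. Names are illustrative, not taken from synolib.
const pendingQueries = new Map<string, Promise<unknown>>();

function queryOnce(endpoint: string, sparql: string): Promise<unknown> {
  const cached = pendingQueries.get(sparql);
  if (cached) return cached;
  const request = fetch(endpoint + "?query=" + encodeURIComponent(sparql), {
    headers: { Accept: "application/sparql-results+json" },
  }).then((response) => response.json());
  pendingQueries.set(sparql, request);
  return request;
}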

nleanba closed this as completed Mar 26, 2024