My gut feeling is that most of the checks will need a combination of SPARQL and Python logic to implement (efficiently at least, or at all).
I'd originally imagined the EDAM verification would be a series of QC steps - i.e. a set of invoked scripts or queries - which would return 0 (no error) or 1, 2 or 3 (INFO, WARN or ERROR), with ERROR causing a build fail. But now I'm not sure ...
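For concreteness, the return-code contract above could be as simple as this (a sketch only - the names `NOERR`/`INFO`/`WARN`/`ERROR` and the `build_fails` helper are illustrative, not an agreed API):

```python
# Sketch of the per-check return-code contract described above:
# 0 = no error, 1 = INFO, 2 = WARN, 3 = ERROR; only ERROR fails the build.
NOERR, INFO, WARN, ERROR = 0, 1, 2, 3

def build_fails(check_results):
    """The build fails iff any check reported ERROR (3)."""
    return max(check_results, default=NOERR) == ERROR

print(build_fails([NOERR, WARN]))   # a WARN alone does not fail the build
print(build_fails([INFO, ERROR]))   # a single ERROR fails it
```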
Just loading EDAM.owl into a graph (without any subsequent processing) takes under a minute on my (very fast) workstation. But if that is multiplied by 30 (now, and more in future) then this doesn't scale so well. But is this OK?
If it's not OK, we might instead need a single overarching script (denoted as `src/edamverify.py` here) which returns 0-3 (as above) and which invokes the individual QC checks - collating their return values & validation outputs into a single error file. That would allow a single load function - but implies either (or perhaps a combination of):
- a monolithic Jupyter notebook
- a modular Python structure / library
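To make the "single load function" idea concrete, here is a rough sketch of what `src/edamverify.py` could look like. Everything below is hypothetical: the two checks are placeholders rather than real EDAM checks, and `load_graph` is stubbed so the sketch is self-contained (in practice it would parse EDAM.owl once, e.g. into an rdflib graph):

```python
# Sketch of src/edamverify.py: load the ontology once, run every QC
# check against the shared graph, collate severities into one report,
# and return the worst severity (0-3) as the overall exit status.

def load_graph():
    # Stub. In practice this is the slow step (parsing EDAM.owl into a
    # graph), done exactly once instead of once per check.
    return {"concepts": ["topic_0003", "operation_0004"]}

def check_nonempty(graph):
    # Placeholder check: ERROR (3) if the ontology has no concepts.
    return (3, "no concepts found") if not graph["concepts"] else (0, "ok")

def check_id_format(graph):
    # Placeholder check: WARN (2) on odd-looking identifiers.
    bad = [c for c in graph["concepts"] if "_" not in c]
    return (2, f"odd ids: {bad}") if bad else (0, "ok")

def main():
    graph = load_graph()                       # single load
    checks = [check_nonempty, check_id_format]
    results = [(c.__name__, *c(graph)) for c in checks]
    for name, code, msg in results:            # collated report
        print(f"{name}: {code} ({msg})")
    return max(code for _, code, _ in results)

exit_code = main()  # 0-3; in CI, ERROR (3) would fail the build
```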
My other gut feeling is that (especially for queries that aren't amenable to SPARQL) we'll need convenience functions which can be reused by multiple checks. Which then leads us into the territory of writing an EDAM Python library - which is something I've been mulling for a while and could be extremely useful in its own right.
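One shape such a library could take (again purely illustrative - the `EdamOntology` class, its methods and the triple format are made up for the sake of the sketch): expensive indexes over the graph get built once, lazily, and are then shared by every check that needs them.

```python
from functools import cached_property

class EdamOntology:
    """Hypothetical wrapper an EDAM Python library might provide:
    expensive indexes are built once and reused by multiple checks."""

    def __init__(self, triples):
        # triples: (subject, predicate, object) tuples, e.g. from a parse.
        self.triples = triples

    @cached_property
    def labels(self):
        # Built on first access, then cached and shared by all checks.
        return {s: o for s, p, o in self.triples if p == "rdfs:label"}

    def label_of(self, concept):
        return self.labels.get(concept)

onto = EdamOntology([
    ("topic_0003", "rdfs:label", "Topic"),
    ("operation_0004", "rdfs:label", "Operation"),
])
print(onto.label_of("topic_0003"))  # → Topic
```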
What do you think? I will continue experimenting to see what can be SPARQLed, but before investing very heavily in time, I want us to discuss and agree on a sensible architecture that is efficient in the long term.
opinions please @hmenager @albangaignard @hansioan @matuskalas @veitveit
Cheers!