Input needed on edamverify architecture #22

Open
joncison opened this issue Mar 19, 2020 · 0 comments

@joncison
Contributor

@hmenager @albangaignard

My gut feeling is that most of the checks will need a combination of SPARQL and Python logic to implement efficiently, or even at all.

I'd originally imagined that EDAM verification would be a series of QC steps (i.e. a set of invoked scripts or queries), each returning 0 (no error) or 1, 2 or 3 (INFO, WARN or ERROR), with ERROR causing the build to fail. But now I'm not sure ...
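A minimal sketch of one such QC step under the 0-3 convention, assuming each check produces a list of (severity, message) findings. The `Severity` enum and the function names are hypothetical, not existing EDAM tooling:

```python
import enum
import sys


class Severity(enum.IntEnum):
    """Hypothetical return codes for one QC step (0 = no error)."""
    OK = 0
    INFO = 1
    WARN = 2
    ERROR = 3


def run_check(findings):
    """Return the exit code for one check, given (severity, message) findings.

    Each finding is reported on stderr; the most severe one determines
    the return value, and an empty list means no error (0).
    """
    worst = Severity.OK
    for severity, message in findings:
        print(f"{severity.name}: {message}", file=sys.stderr)
        worst = max(worst, severity)
    return int(worst)


def build_should_fail(exit_code):
    """Only ERROR (3) causes the build to fail; INFO and WARN do not."""
    return exit_code == Severity.ERROR
```

A standalone check script could then end with `sys.exit(run_check(findings))`, so that CI sees the most severe finding as the process exit code.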

Just loading EDAM.owl into a graph (without any subsequent processing) takes under a minute on my (very fast) workstation. But if that is multiplied by 30 (the number of checks now, and more in future), this doesn't scale so well. But is this OK?
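A back-of-the-envelope version of this estimate, assuming every check reloads EDAM.owl separately. $N$ (number of checks), $t_{\text{load}}$ (load time) and $t_i$ (processing time of check $i$) are notation introduced here, not measured values:

```math
\begin{aligned}
T_{\text{separate}} &= N\,t_{\text{load}} + \sum_{i=1}^{N} t_i, \qquad
T_{\text{single}} = t_{\text{load}} + \sum_{i=1}^{N} t_i, \\
T_{\text{separate}} - T_{\text{single}} &= (N-1)\,t_{\text{load}} < 29\ \text{min} \quad (N = 30,\ t_{\text{load}} < 1\ \text{min}).
\end{aligned}
```

So loading once would save at most just under half an hour of parsing per build, and that saving grows linearly with the number of checks.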

If it's not OK, we might instead need a single overarching script (called src/edamverify.py here) which returns 0-3 (as above), invokes the individual QC checks, and collates their return values and validation outputs into a single error file. That would allow EDAM.owl to be loaded only once, but implies either (or perhaps a combination of):

  1. a monolithic Jupyter notebook
  2. a modular python structure / library
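A minimal sketch of the modular option (2), assuming each check is a plain Python function that takes the already-loaded ontology and returns (severity, message) pairs. `CHECKS`, `check`, `load_ontology`, `run_all` and `main` are hypothetical names, and `load_ontology` is only a placeholder for whatever parser is chosen:

```python
from typing import Callable, Dict, List, Tuple

# Severity codes from the 0-3 convention (0 = no error).
INFO, WARN, ERROR = 1, 2, 3

Finding = Tuple[int, str]                    # (severity, message)
Check = Callable[[object], List[Finding]]    # takes the loaded ontology

# Hypothetical registry of all QC checks, keyed by function name.
CHECKS: Dict[str, Check] = {}


def check(func: Check) -> Check:
    """Decorator that registers a QC check in CHECKS."""
    CHECKS[func.__name__] = func
    return func


def load_ontology(path: str) -> object:
    """Placeholder for the single, expensive load of EDAM.owl.

    Assumption: a real version would parse the file into a graph;
    this sketch just returns the path unchanged.
    """
    return path


def run_all(ontology: object) -> Tuple[int, List[str]]:
    """Run every registered check against one shared ontology object.

    Returns the highest severity seen (0 if nothing was reported) and
    one tab-separated report line per finding.
    """
    worst = 0
    lines: List[str] = []
    for name, func in CHECKS.items():
        for severity, message in func(ontology):
            lines.append(f"{name}\t{severity}\t{message}")
            worst = max(worst, severity)
    return worst, lines


def main(owl_path: str, report_path: str) -> int:
    """Load once, run all checks, write a single error file, return 0-3."""
    ontology = load_ontology(owl_path)
    worst, lines = run_all(ontology)
    with open(report_path, "w", encoding="utf-8") as report:
        report.writelines(line + "\n" for line in lines)
    return worst
```

With this layout, src/edamverify.py could finish with `sys.exit(main(sys.argv[1], sys.argv[2]))`. Adding a check then means writing one decorated function, and the expensive load happens exactly once per run.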

My other gut feeling is that (especially for queries that aren't amenable to SPARQL) we'll need convenience functions that can be reused by multiple checks. That leads us into the territory of writing an EDAM Python library, which I've been mulling over for a while and which could be extremely useful in its own right.
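One shape such convenience functions might take, assuming the ontology has first been converted into a simple list of term records. The `Term` fields (including modelling obsolete concepts as a boolean flag) and every helper name here are hypothetical; the point is that several checks could share the same small, tested helpers:

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Term:
    """Hypothetical minimal view of one EDAM concept."""
    uri: str
    label: str
    definition: str = ""
    obsolete: bool = False
    parents: List[str] = field(default_factory=list)


def current_terms(terms: Iterable[Term]) -> List[Term]:
    """Terms that are not obsolete; most checks would start from this set."""
    return [t for t in terms if not t.obsolete]


def duplicate_labels(terms: Iterable[Term]) -> Dict[str, List[str]]:
    """Map each label (compared case-insensitively) used by more than one term to those URIs."""
    by_label: Dict[str, List[str]] = defaultdict(list)
    for t in terms:
        by_label[t.label.casefold()].append(t.uri)
    return {label: uris for label, uris in by_label.items() if len(uris) > 1}


def missing_definitions(terms: Iterable[Term]) -> List[str]:
    """URIs of terms whose definition is empty or whitespace only."""
    return [t.uri for t in terms if not t.definition.strip()]


def dangling_parents(terms: Iterable[Term]) -> List[Tuple[str, str]]:
    """(child, parent) URI pairs where the parent is not among the given terms."""
    term_list = list(terms)
    known = {t.uri for t in term_list}
    return [(t.uri, p) for t in term_list for p in t.parents if p not in known]
```

A duplicate-label check and a missing-definition check would then each be a few lines on top of `current_terms`, rather than separate queries that each re-derive the same term set.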

What do you think? I will continue experimenting to see what can be done in SPARQL, but before investing a lot of time, I want us to discuss and agree on a sensible architecture that will be efficient in the long term.

opinions please @hmenager @albangaignard @hansioan @matuskalas @veitveit

Cheers!

@joncison joncison added help wanted Extra attention is needed question Further information is requested labels Mar 19, 2020