Implement a script to load SSSOM mappings to AgroPortal #265

Open
jonquet opened this issue Apr 13, 2022 · 14 comments
Comments

@jonquet
Contributor

jonquet commented Apr 13, 2022

Before doing #255 we will have to create a script to load SSSOM mappings into AgroPortal.
This is related to D2KAB WP2.
The correspondences between the two formats have been discussed and captured (Clement's note) with @saubin78.

CCing @graybeal @matentzn and @cmungall for information and followup.

@jonquet
Contributor Author

jonquet commented Apr 13, 2022

Example of mappings in AgroPortal:
http://data.agroportal.lirmm.fr/mappings?ontologies=ANAEETHES,AGROVOC&display_links=false&display_context=false&include=all

SSSOM specification: https://mapping-commons.github.io/sssom/Mapping/

| AgroPortal | SSSOM |
|---|---|
| classes | subject_id |
| classes | object_id |
| source (mapping) | match_type |
| comment | comment |
| source_name | creator_label or author_label |
| source (process) | mapping_provider |
| relation | predicate_id |
| source_contact_info | NA |
| creator | mapping_tool or author_id or creator_id |
| name | |
| date | mapping_date |

@saubin78

saubin78 commented Apr 27, 2022

Hi.
Thanks @matentzn for informing on the change from match_type to mapping_justification.

I have looked at @jonquet's proposal for AgroPortal and I'd like to suggest some modifications and additional elements:

| AgroPortal | SSSOM | AP example from here | comments |
|---|---|---|---|
| classes/id | subject_id | http://opendata.inra.fr/anaeeThes/c2_2787 | criterion needed for choosing which one is subject/object |
| classes/id | object_id | http://aims.fao.org/aos/agrovoc/c_36549 | criterion needed for choosing which one is subject/object |
| source (mapping) | match_type --> mapping_justification | REST | possible values: REST; LOOM; SameURI; CUI? |
| process/comment | comment | Generated with the Ontology Mapping Harvest Tool - v.1.3 - Agroportal Project - LIRMM - 12/10/2018 15:08 - FR | |
| source_name | | ANAEETHES | this would correspond to subject_source if it were a URI instead of a string |
| process/source | mapping_provider | http://data.agroportal.lirmm.fr/ontologies/ANAEETHES | Not bad, this is a URL... |
| process/relation | predicate_id | http://www.w3.org/2004/02/skos/core#exactmatch | Great, this is a URI! |
| source_contact_info | NA | null | |
| process/creator | creator_id | http://data.agroportal.lirmm.fr/users/jonquet | Great, this is a URI! |
| process/name | | REST Mapping | what is the difference with source (mapping)? |
| process/date | mapping_date | 2018-10-15T12:12:53+02:00 | clarification needed: is the date in AP when the mapping was created OR loaded in AP? |
| collection/id | mapping_set_id | http://data.agroportal.lirmm.fr/rest_backup_mappings/3b4ea420-b292-0136-8446-525400026749 | |

@jonquet you can have a look at the whole mapping in Sonia's spreadsheet used to test SSSOM on D2KAB's mapping use cases (3rd tab).

@matentzn

I remember now, you made that issue here:
mapping-commons/sssom#139

From a cursory look I think most elements look good. These are questionable:

  • source (mapping) --> this does not map to justification. Are there more examples?
  • source_name

source_name should be:

  • mapping_provider if the source indicates "where the mapping was pulled from"
  • subject_source if the source indicates which terminology the source id lives in.
  • etc.

process/name --> mapping_set_title?

@saubin78

Could you please clarify the distinction between

  • mapping_provider if the source indicates "where the mapping was pulled from"
  • subject_source if the source indicates which terminology the source id lives in.

@matentzn

Moved your question here: mapping-commons/sssom#202

@syphax-bouazzouni
Contributor

syphax-bouazzouni commented Jun 28, 2022

Todo

  • Have testing data (@saubin78 Could you please give the TSV files done by URGI)
  • Validate the Agroportal/SSSOM correspondence table
  • How to keep the orientation information of the relationship (subject/object)? (using the source_name ???)
  • Evolve AP mapping types (REST, LOOM...) to integrate SSSOM mapping_justification (Lexical, HumanCreated...) (defined in the code here )

Resources

@saubin78

@syphax-bouazzouni please use this TEST file : https://docs.google.com/spreadsheets/d/1EpttUuJNWmp2up4SXDcJrc8mtlvZhyjs/edit?usp=sharing&ouid=101729640835482598083&rtpof=true&sd=true

Note that

  • these are not the final data validated by Michael & Claire (I am having trouble with OpenRefine so I could not produce the latest version)
  • I use the "Embedded mode (default)" described here : https://mapping-commons.github.io/sssom/spec/#embedded-mode-default
  • I am not 100% sure about the order of the columns as the specs are not completely clear. One way to check may be to use the python toolkit to produce the rdf serialisation. I have not been able to test it on my laptop yet.
  • I did not include all the available metadata elements from the specification. Tell me if you want me to do so.
  • I put some metadata elements at a global level, e.g. creator_id, while leaving others as columns, though I could have done it differently. For example, as subject_source always has the same value, I could have moved it to the global metadata. This is to illustrate that it can change from one case to another. You have to be able to parse metadata elements either as global metadata or as columns (local to a mapping).
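
The last point above (a metadata element may appear either globally or as a column) could be handled along these lines. This is only a minimal sketch, assuming the embedded mode where the metadata block is a set of `#`-prefixed lines above the TSV table. The function name `parse_sssom_embedded` and the sample values are hypothetical, and the header parsing is deliberately simplified to flat `key: value` pairs: a real loader would read the header as YAML (for instance through sssom-py) and would check which elements the spec actually allows to move between the two levels.

```python
import csv
import io


def parse_sssom_embedded(text):
    """Split an embedded-mode SSSOM TSV into (global metadata, mapping rows).

    Hypothetical helper: it only understands flat "key: value" header lines.
    """
    metadata = {}
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            body = line[1:]
            if body.startswith("  "):
                # Indented entry of a nested block (e.g. curie_map): ignored here.
                continue
            key, sep, value = body.strip().partition(":")
            if sep and value.strip():
                metadata[key.strip()] = value.strip().strip('"')
        elif line.strip():
            table_lines.append(line)

    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    rows = []
    for row in reader:
        # Start from the global values, then let non-empty column values win.
        merged = dict(metadata)
        merged.update({k: v for k, v in row.items() if v})
        rows.append(merged)
    return metadata, rows


# Illustrative data only: the example.org identifiers are made up.
SAMPLE = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "# creator_id: https://example.org/users/demo\n"
    "# curie_map:\n"
    "#   skos: http://www.w3.org/2004/02/skos/core#\n"
    "subject_id\tpredicate_id\tobject_id\tsubject_source\n"
    "http://opendata.inra.fr/anaeeThes/c2_2787\tskos:exactMatch\t"
    "http://aims.fao.org/aos/agrovoc/c_36549\tANAEETHES\n"
)

metadata, rows = parse_sssom_embedded(SAMPLE)
```

With this approach the rest of the loader can read `creator_id` or `subject_source` from each row without caring whether the file declared it globally or per mapping.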

I think it's a good start.

@matentzn

> I am not 100% sure about the order of the columns as the specs are not completely clear. One way to check may be to use the python toolkit to produce the rdf serialisation. I have not been able to test it on my laptop yet.

Columns in sssom toolkit are sorted by whatever the spec prescribes: https://github.com/mapping-commons/sssom/blob/master/src/sssom_schema/schema/sssom_schema.yaml#L473 - but only when using the sssom sort command (something happened to the CLI docs (they should be automatically deployed)).

> You have to be able to parse metadata element either as global metadata or column (local to mapping)

If you see any case that is not permitted by the spec but you think is useful (where a mappings_set element goes into a column or a mapping element goes in the mappings_set), let us know.

There is a new feature in sssom-py (not released on PyPI yet, but merged on master): sssom validate. If you want to try it, it will give you a much more rigorous validation of your SSSOM tables than was previously possible.

@jonquet
Contributor Author

jonquet commented Jul 1, 2022

After discussion and based on the examples we are suggesting to implement the following correspondences:

| AgroPortal | SSSOM |
|---|---|
| classes | Mapping : subject_id |
| classes | Mapping : object_id |
| comment | Mapping : comment |
| relation | Mapping : predicate_id |
| source (process) | Mapping : mapping_justification |
| creator | NA (fixed value: http://data.agroportal.lirmm.fr/users/mappingadmin) |
| source (mapping) | NA (this is a fixed property in OntoPortal. Will always be "REST") |
| source_name | MappingSet : mapping_set_id |
| source_contact_info | MappingSet : creator_id |
| name | MappingSet : mapping_set_description |
| date (we need to double-check that we can override this value and that it is not the date when the mappings were uploaded to AgroPortal) | Mapping : mapping_date OR MappingSet : mapping_date |
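
To make the correspondences above concrete, here is a rough sketch of how one SSSOM mapping plus its mapping-set metadata could be turned into an OntoPortal-style record. It is not the actual converter: the function name, the flat layout of the output, and the keys `source_process` and `source_mapping` (used only to tell the two "source" attributes apart) are assumptions made for illustration, and the sample identifiers under example.org are invented.

```python
import json

# Fixed values taken from the correspondence table.
FIXED_CREATOR = "http://data.agroportal.lirmm.fr/users/mappingadmin"
FIXED_MAPPING_SOURCE = "REST"


def sssom_to_ontoportal(mapping, mapping_set):
    """Build an OntoPortal-style dict from one SSSOM mapping (hypothetical layout)."""
    return {
        # Subject first, object second, so the direction is still readable.
        "classes": [mapping["subject_id"], mapping["object_id"]],
        "comment": mapping.get("comment"),
        "relation": mapping["predicate_id"],
        "source_process": mapping.get("mapping_justification"),
        "creator": FIXED_CREATOR,
        "source_mapping": FIXED_MAPPING_SOURCE,
        "source_name": mapping_set.get("mapping_set_id"),
        "source_contact_info": mapping_set.get("creator_id"),
        "name": mapping_set.get("mapping_set_description"),
        # Prefer the date of the mapping itself, else fall back to the set.
        "date": mapping.get("mapping_date") or mapping_set.get("mapping_date"),
    }


# Illustrative input: class URIs and date come from the example earlier in
# this thread, the rest is made up.
mapping = {
    "subject_id": "http://opendata.inra.fr/anaeeThes/c2_2787",
    "object_id": "http://aims.fao.org/aos/agrovoc/c_36549",
    "predicate_id": "http://www.w3.org/2004/02/skos/core#exactMatch",
    "mapping_justification": "semapv:LexicalMatching",
}
mapping_set = {
    "mapping_set_id": "https://example.org/mappings/demo",
    "creator_id": "https://example.org/users/demo",
    "mapping_set_description": "Demo mapping set",
    "mapping_date": "2018-10-15T12:12:53+02:00",
}

payload = sssom_to_ontoportal(mapping, mapping_set)
print(json.dumps(payload, indent=2))
```

Keeping the two fixed values as constants makes it easy to adapt the sketch to another OntoPortal instance with a different admin user.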

To address the loss of the mapping direction, we will work on adding new attributes to the Mapping model in OntoPortal to encode subject_source and object_source: subject_source_id and object_source_id, respectively.

Then the loading script will have to resolve the ontology URIs (stored in the 2 new fields) to the OntoPortal IDs.
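
The resolution step could look roughly like the sketch below. It assumes the loading script has a lookup table from ontology URI to local acronym; how that table is obtained (for example from the portal's list of ontologies) is left open. The function name `resolve_sources` and the two ontology URIs in the table are illustrative assumptions, not values confirmed in this thread.

```python
def resolve_sources(record, acronym_by_uri):
    """Return a copy of record with ontology URIs replaced by local acronyms.

    Hypothetical helper: fails loudly when an ontology is unknown locally,
    so that a mapping is never attached to the wrong ontology.
    """
    resolved = dict(record)
    for field in ("subject_source_id", "object_source_id"):
        uri = record.get(field)
        if uri is None:
            continue  # nothing to resolve for this side
        if uri not in acronym_by_uri:
            raise KeyError(f"No local ontology registered for {uri!r} ({field})")
        resolved[field] = acronym_by_uri[uri]
    return resolved


# Illustrative lookup table for one portal (the URIs are assumptions).
ACRONYM_BY_URI = {
    "http://opendata.inra.fr/anaeeThes": "ANAEETHES",
    "http://aims.fao.org/aos/agrovoc": "AGROVOC",
}

record = {
    "subject_source_id": "http://opendata.inra.fr/anaeeThes",
    "object_source_id": "http://aims.fao.org/aos/agrovoc",
}
resolved = resolve_sources(record, ACRONYM_BY_URI)
```

Raising an error on an unknown URI is one possible policy. A portal that hosts "external" mappings may instead prefer to keep the URI unchanged when no local ontology matches.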

@jonquet
Contributor Author

jonquet commented Jul 1, 2022

@matentzn @jgraybeal We are thinking of implementing the "SSSOM2OntoPortal converter" in the sssom-py tool here:
https://mapping-commons.github.io/sssom-py/examples.html#convert-command

Note that the converter will be generic and produce a JSON output compliant with any OntoPortal instance. However, at loading time we will have a separate loading script that resolves the ontology IDs to the local ontology IDs (acronyms) of the portal concerned.

What do you think?

@matentzn

matentzn commented Jul 1, 2022

That would be amazing. Both ways would probably be even more amazing :)

@graybeal

graybeal commented Jul 1, 2022

I'm having a little trouble fully grokking the directionality of the correspondences, I think I may need a walkthrough at some point. And of course your mapping model is not the same as BioPortal's any more, or the other OntoPortals. So maybe that's not an issue, but I think it deserves a bit of thought. As does the 'figure out which ontology to use' step.

@syphax-bouazzouni
Contributor

syphax-bouazzouni commented Jul 4, 2022

Hi @graybeal,

To give you more details about the directionality issue: the SSSOM mapping model is directional, from the start node (subject_id) to the end node (object_id), whereas the OntoPortal model is not. There, a mapping is just a link between two classes, so we cannot figure out which class is the origin. The solution that we propose is:

As for our model being different from the base one, I think it will work because we are still backward compatible with the base model, and the import module that we will develop (it already exists but needs some changes) will be generic enough to work with the base model.

Hi @matentzn you can find a first (working) version of the "SSSOM2OntoPortal converter" here :

Finally, we will

@jonquet
Contributor Author

jonquet commented Jul 5, 2022

> That would be amazing. Both ways would probably be even more amazing :)

Hi @matentzn We will be working on this too a bit later. Not in the sssom-py tool but directly in our API (using a proxy functionality that we have already used to produce specific formats). We need to do the conversion from the OntoPortal format to SSSOM on our side because some information (e.g., class names) is not in the mapping itself but can be populated by the portal.

> And of course your mapping model is not the same as BioPortal's any more, or the other OntoPortals.

Hi @graybeal This is actually not the case. AgroPortal's mapping representation is the same as OntoPortal's. We have just added a couple of features in the past to host "external" mappings in AgroPortal too (i.e., mappings for which only one of the two mapped classes is in AgroPortal).

@syphax-bouazzouni Amazing reactivity to develop the converter, great job!

5 participants