Link to ODIS via JSON-LD/schema.org #115

Open
pbuttigieg opened this issue Mar 15, 2024 · 14 comments

Comments

@pbuttigieg

pbuttigieg commented Mar 15, 2024

Following on from discussions at BICIKL and subsequent technical calls, we're exploring how to expose ocean-relevant content (including coastal zones) to the IOC-UNESCO Ocean Data and Information System (ODIS).

Once the initial material has been reviewed by the Traitbank team, we can have another technical call to go through a few examples and create some reference JSON-LD documents. Following that, and the creation of a sitemap registered in ODISCat, we can then begin testing the harvest and dissemination to the ODIS Federation.

This issue will be cross-linked to a counterpart on the odis-arch tracker

@pbuttigieg
Author

Pinging this issue - May is upon us

@myrmoteras
Contributor

@pbuttigieg let's plan in the week of May 22 to discuss. Can you send @gsautter some example input you would like to have so we can study and have an informed discussion?

@pbuttigieg
Author

@myrmoteras seems the project schedules didn't align, but ODIS is now sustained under UNESCO, so we have more flexibility

We've created a getting started guide that may be enough for your developers to create the initial link

https://book.oceaninfohub.org/gettingStarted.html

@myrmoteras
Contributor

@pbuttigieg @gsautter can we meet sometime tomorrow after 4pm or Friday afternoon after 3pm and discuss next steps?

@gsautter

gsautter commented Jun 12, 2024 via email

myrmoteras changed the title from "Link to ODIS via JSON-LD/schema.org" to "Link to OBIS via JSON-LD/schema.org" on Jul 5, 2024
@myrmoteras
Contributor

myrmoteras commented Jul 9, 2024

Hi Donat,
Let’s formulate a standard answer to these sorts of requests.

Questions about duplication are increasing. The answer is not that it doesn’t matter and that tools can de-duplicate, but that our GBIF record is different in that it has a link to a treatment and to publications. In other words, it shows how the data is being used. At the same time, there is a development to create a digital specimen that includes all the links to the various representations of the physical specimen or original observation.

In a sense, let me handle these sorts of requests; that is, forward them to me and I will take care of it.
OK, will do. I do understand the difference, but kind of failed to explain it that way ... sorry. I'll forward the next such request to you.

However, what's specifically strange about this one is that the specimen with the georeference is not the one the treatment is about ... the latter specimen is only mentioned in passing, of sorts. How should we mark something like this, in general? See http://tb.plazi.org/GgServer/html/03F0943BFFCF4D070BF8FDEBFA9C1675

All the best,
Guido

From: Guido Sautter [email protected]
Sent: Tuesday, May 19, 2020 3:33 PM
To: Horton, Tammy [email protected]
Cc: Donat Agosti [email protected]
Subject: Re: Plazi georef incorrect?

EXTERNAL SENDER

Hi Tammy,
I’m writing to ask about the following entry in GBIF:

https://www.gbif.org/dataset/49a11228-6c4d-478f-b958-52610eaab951

Which is one of my papers. The entry includes only one georeferenced specimen – which is not correct. The paper details the locations of all the samples examined and provides a full station list and map. The georeference provided is for a mention of a specimen not covered in our paper! How do we correct this?
Well, the treatment does kind of cite that specimen, with the location given as a range of coordinates, and so we marked it as such. In an automated process, there is very little we can do to tell whether a specimen cited complete with a pair of coordinates is the one the treatment actually refers to, all the more so if the treatment subject specimen comes without coordinates and other numeric detail data.

We can remove this materials citation if you want, but, treatment subject or not, it is a georeferenced specimen, and making it available as data is surely worthwhile.

I would also like to ask about duplication that may be occurring through these uploads. I am currently working to prepare datasets for OBIS/GBIF of specimens held in the Discovery Collections – what will happen if I upload the data on these specimens to OBIS? We will be creating duplicates. I’m sure you will have come across this for other museums, sharing specimen data to GBIF.
Actually, you are the first author voicing concerns about duplication. I am not sure if GBIF does any duplicate removal or even reconciliation, especially in the absence of specimen codes, but the reality is that very few authors make their specimens available as machine-processable data, which is exactly why we extract said data from publications.

Also, I would not worry about avoiding duplicates all too much, considering that there are catalogs like WoRMS and ITIS that already have overlapping occurrence data and still are both individual datasets in GBIF. And ultimately, if the detail data of any two occurrence records match up exactly, consumer applications can eliminate the duplicates. Better to have occurrence data available to the public, if at the risk of duplication, than to have it locked in publications altogether.

Kind regards,
Guido Sautter

@myrmoteras
Contributor

discussion from 20240709 Notes

@gsautter

gsautter commented Jul 9, 2024

pbuttigieg added a commit to iodepo/odis-in that referenced this issue Jul 10, 2024
pbuttigieg added a commit to iodepo/odis-in that referenced this issue Jul 10, 2024
pbuttigieg changed the title from "Link to OBIS via JSON-LD/schema.org" to "Link to ODIS via JSON-LD/schema.org" on Jul 10, 2024
@pbuttigieg
Author

pbuttigieg commented Jul 18, 2024

@gsautter @myrmoteras

Here's an initial template - based on this PR iodepo/odis-in#25 - for Treatments as Datasets, based on one of the examples you provided.

Note that the schema:citation property is used heavily. This property is used to reference other CreativeWorks that are related to the one being described. The material, treatment, and article citations (including figures, tables, etc) can all go in there as typed nodes, which should allow you to add the metadata you need.

Note I left some additional properties in there just in case those are useful for other treatments. You can of course delete these if not relevant.

I think this is enough to begin sharing TreatmentBank's content via ODIS and related systems. If you have any questions please ping me in iodepo/odis-in#25.

@pbuttigieg
Author

pbuttigieg commented Jul 18, 2024

@myrmoteras @gsautter

If you wish to be more literal about taxonomic concepts as claims, you can use this pattern to associate the Claim that a taxon is present with Treatments wherein the claim appears.

If each treatment's JSON-LD has its own @id (a URL pointing to the JSON-LD file describing the treatment), then you can reference the treatments each claim refers to by just pointing to the @id:

       "appearance": [
         {
           "@id": "https://treatment.plazi.org/json-ld/url-to-a-json-ld-representation-of-some-treatments-metadata/"
         },
         ....
       ]

If the JSON-LD files for Treatments don't have their own URLs/IRIs, then you could include metadata about them verbatim:

       "appearance": [
         {
           "@type": "Dataset",
           "@id": "URL: Optional. A URL that resolves to *this* JSON-LD document, NOT the URL of the Dataset that this JSON-LD document describes. (To link to the Dataset itself, please use 'url' and/or 'identifier'.)",
           "name": "Maera gujaratensis, Thacker & Myers & Trivedi, 2024",
           "description": "A dataset representing a TreatmentBank Treatment record of Maera gujaratensis, see https://plazi.org/treatmentbank/what-treatment/",
           "url": "https://treatment.plazi.org/id/423BD146-4079-EE78-44C5-FE28FE592E10",
           "identifier": "423BD146-4079-EE78-44C5-FE28FE592E10"
         },
         {
           "@type": "Dataset",
           "@id": "URL: Optional. A URL that resolves to *this* JSON-LD document, NOT the URL of the Dataset that this JSON-LD document describes. (To link to the Dataset itself, please use 'url' and/or 'identifier'.)",
           "name": "Maera gujaratensis, Smith & Jones & Li, 2024",
           "description": "A dummy dataset representing another TreatmentBank Treatment record of Maera gujaratensis, see https://plazi.org/treatmentbank/what-treatment/",
           "url": "https://treatment.plazi.org/id/MADE-UP-423BD146-4079-EE78-44C5-FE28FE592E10",
           "identifier": "MADE-UP-423BD146-4079-EE78-44C5-FE28FE592E10"
         }
         ....
       ]
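
For anyone generating these entries programmatically, the choice between the two patterns could be sketched roughly as follows. This is only a minimal Python illustration, not part of the ODIS template: the helper name `appearance_entry`, its `jsonld_url` parameter, and the `example.org` URL are hypothetical, while the treatment name, URL, and identifier are copied from the example above.

```python
import json
from typing import Optional


def appearance_entry(name: str, url: str, identifier: str,
                     jsonld_url: Optional[str] = None) -> dict:
    """Return one item for the "appearance" list (hypothetical helper)."""
    if jsonld_url:
        # Pattern 1: the treatment's JSON-LD document has its own @id,
        # so a bare reference is enough.
        return {"@id": jsonld_url}
    # Pattern 2: no resolvable JSON-LD document, so embed the metadata verbatim.
    return {"@type": "Dataset", "name": name, "url": url, "identifier": identifier}


treatment = {
    "name": "Maera gujaratensis, Thacker & Myers & Trivedi, 2024",
    "url": "https://treatment.plazi.org/id/423BD146-4079-EE78-44C5-FE28FE592E10",
    "identifier": "423BD146-4079-EE78-44C5-FE28FE592E10",
}

# Embedded form: used when no JSON-LD URL is available.
embedded = appearance_entry(**treatment)

# Referenced form: the JSON-LD URL below is a made-up placeholder.
referenced = appearance_entry(
    **treatment,
    jsonld_url="https://example.org/json-ld/423BD146-4079-EE78-44C5-FE28FE592E10",
)

print(json.dumps({"appearance": [referenced, embedded]}, indent=2))
```

In the embedded form the sketch leaves out "@id" entirely, since that property is described as optional above.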

@gsautter

Here's an initial template - based on this PR iodepo/odis-in#25 - for Treatments as Datasets, based on one of the examples you provided.

This is great for modeling treatments as a whole, thanks.

Question remains how to specifically embed the materials citations within this framework, though, and yet more specifically any given coordinates, especially since my understanding was for the main goal to be sharing occurrence data.

Note that the schema:citation property is used heavily. This property is used to reference other CreativeWorks that are related to the one being described. The material, treatment, and article citations (including figures, tables, etc) can all go in there as typed nodes, which should allow you to add the metadata you need.

I see ... good to have an example for a figure citation.

Not sure CreativeWork is the appropriate @type for a table, though, as the properties are very similar to those of a figure, and they usually also have their own URLs ... or could I simply take the figure citation, replace the value of @type, and then put in the respective data for a table?

Also, how to include the DOI of a cited publication? And which @type to use if the cited publication is not an Article? Is there a more generic term, maybe even CreativeWork?

@gsautter

Working on the template further, I'm getting the impression the basis for the example was the HTML page proper ... in case it helps, there is the underlying XML document, which holds a good few more details than we can conveniently show in the HTML: https://tb.plazi.org/GgServer/xml/423BD1464079EE7844C5FE28FE592E10 (the HTML is produced from this XML via XSLT).

@pbuttigieg
Author

@gsautter

Question remains how to specifically embed the materials citations within this framework, though, and yet more specifically any given coordinates, especially since my understanding was for the main goal to be sharing occurrence data.

Looking at an example of a materialsCitation:

<materialsCitation id="7AFA6A0D4079EE7D451DFD8DFE982B22" ID-GBIF-Occurrence="4903570308" collectionCode="LFSC, R" latitude="22.1985" longLatPrecision="7" longitude="72.1082" pageId="11" pageNumber="574" specimenCount="1" typeStatus="holotype">

<typeStatus id="1529DEF24079EE7D451DFD8DFE1E2B46" box="[335,439,527,553]" pageId="11" pageNumber="574" type="holotype">Holotype</typeStatus>

male,

<quantity id="0D6ACDB54079EE7D4654FD8DFDE32B45" box="[518,586,527,554]" metricMagnitude="-3" metricUnit="m" metricValue="8.0" pageId="11" pageNumber="574" unit="mm" value="8.0">8 mm</quantity>

, (

<collectionCode id="AC83F8954079EE7D460DFD8DFD0C2B46" box="[607,677,527,553]" pageId="11" pageNumber="574">LFSC</collectionCode>

.ZRC-209) Gopnath (

<geoCoordinate id="AFA606974079EE7D47C5FD8DFBB92B45" box="[919,1040,527,554]" degrees="22.1985" direction="north" orientation="latitude" pageId="11" pageNumber="574" precision="5" value="22.1985">22.1985 N</geoCoordinate>

<geoCoordinate id="AFA606974079EE7D404BFD8DFB392B46" box="[1049,1168,527,553]" degrees="72.1082" direction="east" orientation="longitude" pageId="11" pageNumber="574" precision="5" value="72.1082">72.1082 E</geoCoordinate>

), 25 March, 2023, coll. D.

<collectionCode id="AC83F8954079EE7D44E0FDB1FF612B22" box="[178,200,563,589]" country="Chile" name="Departamento de Geologia, Universidad de Chile" pageId="11" pageNumber="574">R</collectionCode>

. Thacker

</materialsCitation>

It looks like you can either treat this as a Dataset about a physical sample (as a value of isBasedOn) or you can try to link the material thing to the Treatment itself.

Perhaps we need a call for this, as it's a little metaphysically fraught.
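
To make the first option concrete, here is a rough Python sketch that reads the attributes of the materialsCitation element and wraps them as a Dataset node placed under isBasedOn. It is an illustration under assumptions, not an agreed mapping: the function name `materials_citation_to_dataset` is hypothetical, the XML is a trimmed copy of the example above (child elements omitted), and the use of spatialCoverage with a Place/GeoCoordinates node for the coordinates is one possible schema.org choice that has not been confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the materialsCitation shown above (child elements omitted).
XML = (
    '<materialsCitation id="7AFA6A0D4079EE7D451DFD8DFE982B22" '
    'ID-GBIF-Occurrence="4903570308" collectionCode="LFSC, R" '
    'latitude="22.1985" longitude="72.1082" specimenCount="1" '
    'typeStatus="holotype">Holotype male, 8 mm</materialsCitation>'
)


def materials_citation_to_dataset(xml_text: str) -> dict:
    """Map a materialsCitation element to a Dataset node (hypothetical mapping)."""
    mc = ET.fromstring(xml_text)
    node = {
        "@type": "Dataset",
        "identifier": mc.get("id"),
        "description": "Materials citation, type status: "
                       + mc.get("typeStatus", "not stated"),
    }
    lat, lon = mc.get("latitude"), mc.get("longitude")
    # Only emit coordinates when both are present in the source XML.
    if lat is not None and lon is not None:
        node["spatialCoverage"] = {
            "@type": "Place",
            "geo": {
                "@type": "GeoCoordinates",
                "latitude": float(lat),
                "longitude": float(lon),
            },
        }
    return node


citation = materials_citation_to_dataset(XML)

# Attach the materials citation to a (placeholder) treatment Dataset.
treatment = {"@type": "Dataset", "isBasedOn": [citation]}
print(treatment)
```

Citations without coordinates simply come out without a spatialCoverage entry, which matches the case mentioned earlier in the thread where the treatment subject specimen has no georeference.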

Note that the schema:citation property is used heavily. This property is used to reference other CreativeWorks that are related to the one being described. The material, treatment, and article citations (including figures, tables, etc) can all go in there as typed nodes, which should allow you to add the metadata you need.

I see ... good to have an example for a figure citation.

Not sure CreativeWork is the appropriate @type for a table, though, as the properties are very similar to those of a figure, and they usually also have their own URLs ... or could I simply take the figure citation, replace the value of @type, and then put in the respective data for a table?

A table is certainly a CreativeWork. For figures, you could use the ImageObject subtype.

Yes, for tables (which don't have a good subtype in schema.org), you can use the CreativeWork type and use the same properties (or any other property from CreativeWork).

Also, how to include the DOI of a cited publication? And which @type to use if the cited publication is not an Article? Is there a more generic term, maybe even CreativeWork?

You can use the identifier property for DOIs (some examples already provided) at the right level. Yes, you can use CreativeWork for uncertain cases. Document or DigitalDocument may be useful too.
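
The type choices in these answers could be applied along the following lines. This is a minimal Python sketch under assumptions: the helper `citation_node` and the `CITATION_TYPES` table are hypothetical, the DOI and names in the usage lines are made-up placeholders, and the identifier is written as a plain string here, so its exact shape should follow the examples already provided in the template.

```python
from typing import Optional

# Types suggested in the replies above; anything not listed falls back to
# CreativeWork, the generic option for uncertain cases.
CITATION_TYPES = {
    "figure": "ImageObject",
    "table": "CreativeWork",
    "article": "Article",
}


def citation_node(kind: str, name: str, url: Optional[str] = None,
                  doi: Optional[str] = None) -> dict:
    """Build one typed node for the 'citation' list (hypothetical helper)."""
    node = {"@type": CITATION_TYPES.get(kind, "CreativeWork"), "name": name}
    if url:
        node["url"] = url
    if doi:
        # The DOI goes into 'identifier' at the level of the cited work.
        node["identifier"] = doi
    return node


# Placeholder values for illustration only.
figure = citation_node("figure", "Figure 1", url="https://example.org/figure-1")
table = citation_node("table", "Table 1", url="https://example.org/table-1")
book = citation_node("book", "Some cited monograph", doi="10.1234/placeholder")

citations = {"citation": [figure, table, book]}
print(citations)
```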

@gsautter

gsautter commented Aug 8, 2024

It looks like you can either treat this as a Dataset about a physical sample (as a value of isBasedOn) or you can try to link the material thing to the Treatment itself.

Perhaps we need a call for this, as it's a little metaphysically fraught.

Frankly, I'm not philosophical at all as to what exactly we model these things as, just want a representation that will allow data consumers to make the most of what we send them ... and you know the consumers way better than I do, so what would you recommend?

Ultimately, such an occurrence states the presence of a physical object (specimen) identified to be an instance of some taxon, in a certain place at a certain time, and then there is additional information, e.g. who made the observation, and where the specimen was taken (if it's collected, which rhinos and elephants and the like usually are not these days) or where the record of the observation (e.g. a picture from a camera trap) can be found ... it really takes various forms, and we could get very philosophical about it, but that sure isn't my goal ...
It's the presence of the specimen at a given time in a given place that counts the most, so we should focus on modeling that, and keep in mind that we also want to accommodate other attributes (some call them properties; to me it's a value in a field with a certain label, I'm not particular about terminology at all).
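
As a purely illustrative way of pinning down that priority, the core of an occurrence (taxon, place, time) could be kept separate from the open-ended extra attributes. The Python sketch below is an assumption-laden illustration and not a proposed schema.org mapping: the class name `Occurrence` and all field names are hypothetical, the coordinates, date, and collector are read from the materialsCitation example earlier in the thread, and the taxon name is assumed to be the one from the treatment example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Occurrence:
    """Hypothetical record: core presence data plus free-form attributes."""
    taxon: str
    latitude: float
    longitude: float
    date: str                          # ISO 8601 date of the observation
    recorded_by: Optional[str] = None  # who made the observation
    held_at: Optional[str] = None      # where the specimen or record is kept
    extra: Dict[str, str] = field(default_factory=dict)  # label -> value

    def core(self) -> dict:
        """The part that counts the most: what was where, and when."""
        return {
            "taxon": self.taxon,
            "latitude": self.latitude,
            "longitude": self.longitude,
            "date": self.date,
        }


holotype = Occurrence(
    taxon="Maera gujaratensis",  # assumed from the treatment example
    latitude=22.1985,
    longitude=72.1082,
    date="2023-03-25",
    recorded_by="D. Thacker",
    extra={"typeStatus": "holotype", "specimenCount": "1"},
)
print(holotype.core())
```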
