How to create an ODIS node for (Harvard) Dataverse and search for all ocean data sets in it via ODIS? #481
Comments
ping @pdurbin , @atrisovic |
the JSON {
"@context": {
"@language": "en",
"@vocab": "https://schema.org/",
"citeAs": "cr:citeAs",
"column": "cr:column",
"conformsTo": "dct:conformsTo",
"cr": "http://mlcommons.org/croissant/",
"rai": "http://mlcommons.org/croissant/RAI/",
"data": {
"@id": "cr:data",
"@type": "@json"
},
"dataType": {
"@id": "cr:dataType",
"@type": "@vocab"
},
"dct": "http://purl.org/dc/terms/",
"examples": {
"@id": "cr:examples",
"@type": "@json"
},
"extract": "cr:extract",
"field": "cr:field",
"fileProperty": "cr:fileProperty",
"fileObject": "cr:fileObject",
"fileSet": "cr:fileSet",
"format": "cr:format",
"includes": "cr:includes",
"isLiveDataset": "cr:isLiveDataset",
"jsonPath": "cr:jsonPath",
"key": "cr:key",
"md5": "cr:md5",
"parentField": "cr:parentField",
"path": "cr:path",
"recordSet": "cr:recordSet",
"references": "cr:references",
"regex": "cr:regex",
"repeated": "cr:repeated",
"replace": "cr:replace",
"sc": "https://schema.org/",
"separator": "cr:separator",
"source": "cr:source",
"subField": "cr:subField",
"transform": "cr:transform",
"wd": "https://www.wikidata.org/wiki/"
},
"@type": "sc:Dataset",
"conformsTo": "http://mlcommons.org/croissant/1.0",
"name": "Ocean Heat Content",
"url": "https://doi.org/10.7910/DVN/CAGYQL",
"creator": [
{
"@type": "Person",
"givenName": "Gael",
"familyName": "Forget",
"affiliation": {
"@type": "Organization",
"name": "Massachusetts Institute of Technology"
},
"name": "Forget, Gael"
}
],
"description": "Estimates (OCCA2, ECCO4) of global ocean heat content (OHC) anomaly from 2004-2006 climatology. ECCO4 is a closed heat budget estimate. ECCO4 release 5 is used here that covers 1992-2019. OCCA2 was derived by 1. extending ECCO4 (r2) to 1980-2022 and 2. adding a gridded adjustment to Argo over 2004-2022. The 2004-2006 climatologies were subtracted separately before combining anomalies over 1992-2019.",
"keywords": [
"Earth and Environmental Sciences",
"ocean",
"climate",
"warming"
],
"license": "http://creativecommons.org/publicdomain/zero/1.0",
"datePublished": "2024-03-07",
"dateModified": "2024-03-08",
"includedInDataCatalog": {
"@type": "DataCatalog",
"name": "Harvard Dataverse",
"url": "https://dataverse.harvard.edu"
},
"publisher": {
"@type": "Organization",
"name": "Harvard Dataverse"
},
"version": "1.1",
"citeAs": "@data{DVN/CAGYQL_2024,author = {Forget, Gael},publisher = {Harvard Dataverse},title = {Ocean Heat Content},year = {2024},url = {https://doi.org/10.7910/DVN/CAGYQL}}",
"citation": [
{
"@type": "CreativeWork",
"name": "Forget, G.: Energy Imbalance in the Sunlit Ocean Layer (submitted)"
}
],
"distribution": [
{
"@type": "cr:FileObject",
"@id": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"encodingFormat": "application/x-netcdf",
"md5": "6578a2fa4f30bdb277b8b4581de9bb6b",
"contentSize": "14705",
"description": "Global ocean heat anomalies, in ZJoule, computed from 2004-2006 climatology for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362"
},
{
"@type": "cr:FileObject",
"@id": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
"encodingFormat": "image/png",
"md5": "81dbe65ed124c315ab7db4b0bf680186",
"contentSize": "39385",
"description": "Visualization of global OHC anomaly, computed from 2004-2006 climatology, for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954363"
}
]
}
|
The Croissant semantics break interoperability at the moment, without much gain. But most of the metadata is immediately useful. |
@gaelforget I'll generate some suggestions for improved metadata based on the example above. In the meantime, you can follow https://book.odis.org/gettingStarted.html to begin setting up the Node (even with the current form of the metadata). I'd set up a dedicated sitemap for ocean-related content (of any kind: socio-economic, physical, biological, ...) and use that as the value of your ODIS-Arch URL in the ODISCat entry. |
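A minimal sketch of generating such a dedicated sitemap, assuming you already have a list of ocean-related dataset URLs and their modification dates. The `build_sitemap` helper and the `ocean_datasets` list are hypothetical names for illustration, and whether each entry should point at the landing page or at a JSON-LD document is an assumption to check against the ODIS book.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string for an iterable of (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional in the sitemap protocol.
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical selection: only the example dataset from this thread.
ocean_datasets = [
    ("https://doi.org/10.7910/DVN/CAGYQL", "2024-03-08"),
]

sitemap_xml = build_sitemap(ocean_datasets)
print(sitemap_xml)
```

The resulting file would be served at a stable URL, and that URL is what goes into the ODISCat entry.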
@fils this is an opportunity to figure out how to handle Croissant semantics and types in a smart way. I'm thinking of using additionalType for non-schema.org (non-sdo) types. That would also allow Croissant properties in the stanzas. |
@gaelforget hi! @atrisovic and I are at a conference but my first recommendation is to
Also, you're welcome to kick off a thread in our Zulip! https://dataverse.zulipchat.com |
I'll post a comment for each component that is currently preventing compatibility with existing schema.org systems. We'll start with the Distribution.
Status quo
"distribution": [
{
"@type": "cr:FileObject",
"@id": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"encodingFormat": "application/x-netcdf",
"md5": "6578a2fa4f30bdb277b8b4581de9bb6b",
"contentSize": "14705",
"description": "Global ocean heat anomalies, in ZJoule, computed from 2004-2006 climatology for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362"
},
{
"@type": "cr:FileObject",
"@id": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
"encodingFormat": "image/png",
"md5": "81dbe65ed124c315ab7db4b0bf680186",
"contentSize": "39385",
"description": "Visualization of global OHC anomaly, computed from 2004-2006 climatology, for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954363"
}
]
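As a rough way to spot this pattern across many records, the sketch below flags distribution entries whose `@type` does not include the schema.org `DataDownload` type. The function name `non_schema_org_distributions` and the trimmed `record` are hypothetical; the second entry is a made-up variant added only to show an entry that passes.

```python
def non_schema_org_distributions(dataset):
    """Return names of distribution entries that lack the DataDownload type."""
    flagged = []
    for entry in dataset.get("distribution", []):
        types = entry.get("@type", [])
        if isinstance(types, str):
            # JSON-LD allows @type to be a single string or a list.
            types = [types]
        if "DataDownload" not in types:
            flagged.append(entry.get("name"))
    return flagged


# Trimmed record: first entry as in the status quo, second is hypothetical.
record = {
    "distribution": [
        {
            "@type": "cr:FileObject",
            "name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
        },
        {
            "@type": ["DataDownload", "cr:FileObject"],
            "name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
        },
    ]
}

flagged = non_schema_org_distributions(record)
print(flagged)
```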
Proposed change
Additional changes that may be useful:
"distribution": [
{
"@type": "DataDownload",
"additionalType": "cr:FileObject",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"encodingFormat": "application/x-netcdf",
"md5": "6578a2fa4f30bdb277b8b4581de9bb6b",
"contentSize": "14705",
"description": "Global ocean heat anomalies, in ZJoule, computed from 2004-2006 climatology for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362"
},
{
"@type": "DataDownload",
"additionalType": "cr:FileObject",
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.png",
"encodingFormat": "image/png",
"md5": "81dbe65ed124c315ab7db4b0bf680186",
"contentSize": "39385",
"description": "Visualization of global OHC anomaly, computed from 2004-2006 climatology, for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954363"
}
]
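The same rewrite can be sketched as a small Python transform, assuming the records are already loaded as dictionaries. `to_data_download` is a hypothetical helper, not part of Dataverse or Croissant tooling. It drops `@id` only because the proposed block above omits it, and the `use_type_array` flag produces the array-of-types form instead of `additionalType`.

```python
def to_data_download(entry, use_type_array=False):
    """Return a copy of a distribution entry typed as schema.org DataDownload."""
    original_type = entry.get("@type")
    # Keep every property except @type (rewritten) and @id (omitted in the proposal).
    rest = {k: v for k, v in entry.items() if k not in ("@type", "@id")}
    if use_type_array:
        return {"@type": ["DataDownload", original_type], **rest}
    return {"@type": "DataDownload", "additionalType": original_type, **rest}


status_quo = {
    "@type": "cr:FileObject",
    "@id": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
    "name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
    "encodingFormat": "application/x-netcdf",
    "contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362",
}

proposed = to_data_download(status_quo)
with_array = to_data_download(status_quo, use_type_array=True)
print(proposed)
print(with_array)
```

Either output keeps the Croissant type visible to Croissant-aware consumers while letting plain schema.org tooling recognise the entry as a download.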
The alternative using an array for types:
"distribution": [
{
"@type": ["DataDownload", "cr:FileObject"],
"name": "OCCA2_ECCO4_global_OHC_anomaly_1992_2019.nc",
"encodingFormat": "application/x-netcdf",
"md5": "6578a2fa4f30bdb277b8b4581de9bb6b",
"contentSize": "14705",
"description": "Global ocean heat anomalies, in ZJoule, computed from 2004-2006 climatology for OCCA2 (release 1) and ECCO4 (release 5)",
"contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362"
}
Verify validation |
The change to That being said, it seems Croissant semantics are introducing some "noise" in addition to their very useful extensions of the base schema.org context. As mentioned, we'll likely write some guidance on how to best merge the two, without duplication / reinvention of things that vanilla schema.org already does. |
This is a known issue, seeing |
A prototypical application would be: search Dataverse through ODIS to find sizable, regularly formatted data sets for a given ocean region (e.g. the coastal ocean off New England, US).
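To make the idea concrete, here is a minimal sketch of that kind of filter run over metadata records that have already been harvested into Python dictionaries. It does not call any ODIS or Dataverse API. `find_files` is a hypothetical helper, and because the example record carries no spatial metadata, a keyword match stands in for the region criterion here.

```python
def find_files(record, keyword, encoding_format, min_bytes):
    """Return contentUrls of files matching a keyword, format, and minimum size."""
    keywords = [k.lower() for k in record.get("keywords", [])]
    if keyword.lower() not in keywords:
        return []
    hits = []
    for item in record.get("distribution", []):
        # contentSize is a string in the example metadata, so convert it.
        size = int(item.get("contentSize", 0))
        if item.get("encodingFormat") == encoding_format and size >= min_bytes:
            hits.append(item["contentUrl"])
    return hits


# Trimmed copy of the "Ocean Heat Content" record from this thread.
record = {
    "name": "Ocean Heat Content",
    "keywords": ["Earth and Environmental Sciences", "ocean", "climate", "warming"],
    "distribution": [
        {
            "encodingFormat": "application/x-netcdf",
            "contentSize": "14705",
            "contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954362",
        },
        {
            "encodingFormat": "image/png",
            "contentSize": "39385",
            "contentUrl": "https://dataverse.harvard.edu/api/access/datafile/8954363",
        },
    ],
}

hits = find_files(record, "ocean", "application/x-netcdf", min_bytes=10_000)
print(hits)
```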
Below I just document the bits and pieces we looked at today in discussing this idea with @pbuttigieg