
Commit

Merge branch 'develop' of github.com:IQSS/dataverse into 9714-files-api-extension-filters
GPortas committed Sep 27, 2023
2 parents 4f3d27e + 5fc7b30 commit 0cd37a4
Showing 20 changed files with 160 additions and 40 deletions.
2 changes: 1 addition & 1 deletion conf/solr/9.3.0/schema.xml
@@ -246,7 +246,7 @@
<!-- SCHEMA-FIELDS::BEGIN -->
<field name="accessToSources" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="actionsToMinimizeLoss" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="alternativeTitle" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="alternativeTitle" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="alternativeURL" type="text_en" multiValued="false" stored="true" indexed="true"/>
<field name="astroFacility" type="text_en" multiValued="true" stored="true" indexed="true"/>
<field name="astroInstrument" type="text_en" multiValued="true" stored="true" indexed="true"/>
9 changes: 9 additions & 0 deletions doc/release-notes/9428-alternative-title.md
@@ -0,0 +1,9 @@
Alternative Title is now repeatable.
- One will need to update the database with the updated citation block:
`curl http://localhost:8080/api/admin/datasetfield/load -H "Content-type: text/tab-separated-values" -X POST --upload-file scripts/api/data/metadatablocks/citation.tsv`
- One will also need to update the Solr schema:
Set `multiValued="true"` on the "alternativeTitle" field in `/usr/local/solr/solr-8.11.1/server/solr/collection1/conf/schema.xml`.
Reload the Solr schema: `curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"`
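For reference, the updated field definition (matching the change to `conf/solr/9.3.0/schema.xml` in this commit) is:

```xml
<field name="alternativeTitle" type="text_en" multiValued="true" stored="true" indexed="true"/>
```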

Since Alternative Title is now repeatable, the old JSON APIs are not compatible with the new version: the value of Alternative Title has changed from a simple string to an array.
For example, instead of `"value": "Alternative Title"`, the value can be `"value": ["Alternative Title1", "Alternative Title2"]`.
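As a concrete illustration, the corresponding field in a native JSON payload (following the pattern used in `scripts/api/data/dataset-create-new-all-default-fields.json`) now looks like this:

```json
{
  "typeName": "alternativeTitle",
  "multiple": true,
  "typeClass": "primitive",
  "value": ["Alternative Title1", "Alternative Title2"]
}
```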
14 changes: 14 additions & 0 deletions doc/release-notes/9859-ORE and Bag updates.md
@@ -0,0 +1,14 @@
Dataverse's OAI-ORE metadata export format and archival BagIt exports
(which include the OAI-ORE metadata export file) have been updated to include
information about the dataset version state, e.g. RELEASED or DEACCESSIONED,
and to indicate which version of Dataverse was used to create the archival Bag.
As part of the latter, the current OAI-ORE metadata format has been given a 1.0.0
version designation. Any future changes to the OAI-ORE export format are expected
to result in a version change, and tools such as DVUploader that can
recreate datasets from archival Bags will start indicating which version(s) of the
OAI-ORE format they can read.

Dataverse installations that have been using archival Bags may wish to update any
existing archival Bags they have, e.g. by deleting existing Bags and using the Dataverse
[archival Bag export API](https://guides.dataverse.org/en/latest/installation/config.html#bagit-export-api-calls)
to generate updated versions.
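For example, a single dataset version could be re-submitted to the configured archiver with a superuser API call along these lines (a sketch assuming the `submitDatasetVersionToArchive` admin endpoint; check the linked guide for the exact path and parameters in your Dataverse version; the dataset id and version below are placeholders):

`curl -X POST -H "X-Dataverse-key: $API_TOKEN" "http://localhost:8080/api/admin/submitDatasetVersionToArchive/$DATASET_ID/$VERSION"`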
9 changes: 8 additions & 1 deletion doc/sphinx-guides/source/admin/integrations.rst
@@ -217,7 +217,14 @@ Sponsored by the `Ontario Council of University Libraries (OCUL) <https://ocul.o
RDA BagIt (BagPack) Archiving
+++++++++++++++++++++++++++++

A Dataverse installation can be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags to the `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_, to a local file system, or to `Google Cloud Storage <https://cloud.google.com/storage>`_.
A Dataverse installation can be configured to submit a copy of published Dataset versions, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ bags to `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_, a local file system, any S3 store, or to `Google Cloud Storage <https://cloud.google.com/storage>`_.
Submission can be automated to occur upon publication, or can be done periodically (via external scripting).
The archival status of each Dataset version can be seen in the Dataset page version table and queried via API.

The archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or potentially in another RDA-conformant repository.
Specifically, the archival Bags include an OAI-ORE Map serialized as JSON-LD that describes the dataset and its files, as well as information about the version of Dataverse used to export the archival Bag.

The `DVUploader <https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader>`_ includes functionality to recreate a Dataset from an archival Bag produced by Dataverse (using the Dataverse API to do so).

For details on how to configure this integration, see :ref:`BagIt Export` in the :doc:`/installation/config` section of the Installation Guide.

2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -52,6 +52,8 @@ There are multiple Python modules for interacting with Dataverse APIs.

`EasyDataverse <https://github.com/gdcc/easyDataverse>`_ is a Python library designed to simplify the management of Dataverse datasets in an object-oriented way, giving users the ability to upload, download, and update datasets with ease. By utilizing metadata block configurations, EasyDataverse automatically generates Python objects that contain all the necessary details required to create the native Dataverse JSON format used to create or edit datasets. Adding files and directories is also possible with EasyDataverse and requires no additional API calls. This library is particularly well-suited for client applications such as workflows and scripts as it minimizes technical complexities and facilitates swift development.

`python-dvuploader <https://github.com/gdcc/python-dvuploader>`_ implements Jim Myers' excellent `dv-uploader <https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader>`_ as a Python module. It offers parallel direct uploads to Dataverse backend storage, streams files directly instead of buffering them in memory, and supports multi-part uploads, chunking data accordingly.

`pyDataverse <https://github.com/gdcc/pyDataverse>`_ primarily allows developers to manage Dataverse collections, datasets and datafiles. Its intention is to help with data migrations and DevOps activities such as testing and configuration management. The module is developed by `Stefan Kasberger <http://stefankasberger.at>`_ from `AUSSDA - The Austrian Social Science Data Archive <https://aussda.at>`_.

`UBC's Dataverse Utilities <https://ubc-library-rc.github.io/dataverse_utils/>`_ are a set of Python console utilities which allow one to upload datasets from a tab-separated-value spreadsheet, bulk release multiple datasets, bulk delete unpublished datasets, quickly duplicate records, replace licenses, and more. For additional information see their `PyPi page <https://pypi.org/project/dataverse-utils/>`_.
4 changes: 3 additions & 1 deletion doc/sphinx-guides/source/api/native-api.rst
@@ -2144,10 +2144,12 @@ The API call requires a Json body that includes the list of the fileIds that the
curl -H "X-Dataverse-key: $API_TOKEN" -H "Content-Type:application/json" "$SERVER_URL/api/datasets/:persistentId/files/actions/:unset-embargo?persistentId=$PERSISTENT_IDENTIFIER" -d "$JSON"
.. _Archival Status API:

Get the Archival Status of a Dataset By Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Archiving is an optional feature that may be configured for a Dataverse installation. When that is enabled, this API call be used to retrieve the status. Note that this requires "superuser" credentials.
Archival :ref:`BagIt Export` is an optional feature that may be configured for a Dataverse installation. When it is enabled, this API call can be used to retrieve the status. Note that this requires "superuser" credentials.

``GET /api/datasets/$dataset-id/$version/archivalStatus`` returns the archival status of the specified dataset version.
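For example, a superuser could retrieve the status of version 1.0 of the dataset with database id 24 (placeholder values) as follows:

curl -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/24/1.0/archivalStatus"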

14 changes: 7 additions & 7 deletions doc/sphinx-guides/source/installation/config.rst
@@ -1,4 +1,3 @@
=============
Configuration
=============

@@ -1427,24 +1426,25 @@ BagIt file handler configuration settings:
BagIt Export
------------

Your Dataverse installation may be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ archival Bags (sometimes called BagPacks) to `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_ or alternately to any folder on the local filesystem.
Your Dataverse installation may be configured to submit a copy of published Datasets, packaged as `Research Data Alliance conformant <https://www.rd-alliance.org/system/files/Research%20Data%20Repository%20Interoperability%20WG%20-%20Final%20Recommendations_reviewed_0.pdf>`_ zipped `BagIt <https://tools.ietf.org/html/draft-kunze-bagit-17>`_ archival Bags (sometimes called BagPacks) to one of several supported storage services.
Supported services include `Chronopolis <https://libraries.ucsd.edu/chronopolis/>`_ via `DuraCloud <https://duraspace.org/duracloud/>`_, Google's Cloud, and any service that can provide an S3 interface or handle files transferred to a folder on the local filesystem.

These archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or postentially in another RDA-conformant repository.
These archival Bags include all of the files and metadata in a given dataset version and are sufficient to recreate the dataset, e.g. in a new Dataverse instance, or potentially in another RDA-conformant repository. The `DVUploader <https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader>`_ includes functionality to recreate a Dataset from an archival Bag produced by Dataverse. (Note that this functionality is distinct from the :ref:`BagIt File Handler`, which is used to upload files to an existing Dataset via the Dataverse user interface.)

The Dataverse Software offers an internal archive workflow which may be configured as a PostPublication workflow via an admin API call to manually submit previously published Datasets and prior versions to a configured archive such as Chronopolis. The workflow creates a `JSON-LD <http://www.openarchives.org/ore/0.9/jsonld>`_ serialized `OAI-ORE <https://www.openarchives.org/ore/>`_ map file, which is also available as a metadata export format in the Dataverse Software web interface.

At present, archiving classes include the DuraCloudSubmitToArchiveCommand, LocalSubmitToArchiveCommand, GoogleCloudSubmitToArchive, and S3SubmitToArchiveCommand, which all extend the AbstractSubmitToArchiveCommand and use the configurable mechanisms discussed below. (A DRSSubmitToArchiveCommand, which works with Harvard's DRS, also exists and, while specific to DRS, is a useful example of how Archivers can support single-version-only semantics and archiving only from specified collections, with collection-specific parameters.)

All current options support the archival status APIs and the same status is available in the dataset page version table (for contributors/those who could view the unpublished dataset, with more detail available to superusers).
All current options support the :ref:`Archival Status API` calls and the same status is available in the dataset page version table (for contributors/those who could view the unpublished dataset, with more detail available to superusers).

.. _Duracloud Configuration:

Duracloud Configuration
+++++++++++++++++++++++

Also note that while the current Chronopolis implementation generates the archival Bag and submits it to the archive's DuraCloud interface, the step to make a 'snapshot' of the space containing the archival Bag (and verify it's successful submission) are actions a curator must take in the DuraCloud interface.
The current Chronopolis implementation generates the archival Bag and submits it to the archive's DuraCloud interface. The steps to make a 'snapshot' of the space containing the archival Bag (and to verify its successful submission) are actions a curator must take in the DuraCloud interface.

The minimal configuration to support an archiver integration involves adding a minimum of two Dataverse Software Keys and any required Payara jvm options. The example instructions here are specific to the DuraCloud Archiver\:
The minimal configuration to support archiver integration involves adding a minimum of two Dataverse Software settings. Individual archivers may require additional settings and/or Payara jvm options and micro-profile settings. The example instructions here are specific to the DuraCloud Archiver\:

\:ArchiverClassName - the fully qualified class to be used for archiving. For example:

@@ -1454,7 +1454,7 @@ The minimal configuration to support an archiver integration involves adding a m

``curl http://localhost:8080/api/admin/settings/:ArchiverSettings -X PUT -d ":DuraCloudHost, :DuraCloudPort, :DuraCloudContext, :BagGeneratorThreads"``

The DPN archiver defines three custom settings, one of which is required (the others have defaults):
The DuraCloud archiver defines three custom settings, one of which is required (the others have defaults):

\:DuraCloudHost - the URL for your organization's Duracloud site. For example:

4 changes: 2 additions & 2 deletions scripts/api/data/dataset-create-new-all-default-fields.json
@@ -22,9 +22,9 @@
},
{
"typeName": "alternativeTitle",
"multiple": false,
"multiple": true,
"typeClass": "primitive",
"value": "Alternative Title"
"value": ["Alternative Title"]
},
{
"typeName": "alternativeURL",
2 changes: 1 addition & 1 deletion scripts/api/data/metadatablocks/citation.tsv
@@ -3,7 +3,7 @@
#datasetField name title description watermark fieldType displayOrder displayFormat advancedSearchField allowControlledVocabulary allowmultiples facetable displayoncreate required parent metadatablock_id termURI
title Title The main title of the Dataset text 0 TRUE FALSE FALSE FALSE TRUE TRUE citation http://purl.org/dc/terms/title
subtitle Subtitle A secondary title that amplifies or states certain limitations on the main title text 1 FALSE FALSE FALSE FALSE FALSE FALSE citation
alternativeTitle Alternative Title Either 1) a title commonly used to refer to the Dataset or 2) an abbreviation of the main title text 2 FALSE FALSE FALSE FALSE FALSE FALSE citation http://purl.org/dc/terms/alternative
alternativeTitle Alternative Title Either 1) a title commonly used to refer to the Dataset or 2) an abbreviation of the main title text 2 FALSE FALSE TRUE FALSE FALSE FALSE citation http://purl.org/dc/terms/alternative
alternativeURL Alternative URL Another URL where one can view or access the data in the Dataset, e.g. a project or personal webpage https:// url 3 <a href="#VALUE" target="_blank">#VALUE</a> FALSE FALSE FALSE FALSE FALSE FALSE citation https://schema.org/distribution
otherId Other Identifier Another unique identifier for the Dataset (e.g. producer's or another repository's identifier) none 4 : FALSE FALSE TRUE FALSE FALSE FALSE citation
otherIdAgency Agency The name of the agency that generated the other identifier text 5 #VALUE FALSE FALSE FALSE FALSE FALSE FALSE otherId citation
@@ -1338,6 +1338,7 @@ private void processProdStmt(XMLStreamReader xmlr, MetadataBlockDTO citation) th
List<HashSet<FieldDTO>> producers = new ArrayList<>();
List<HashSet<FieldDTO>> grants = new ArrayList<>();
List<HashSet<FieldDTO>> software = new ArrayList<>();
List<String> prodPlac = new ArrayList<>();

for (int event = xmlr.next(); event != XMLStreamConstants.END_DOCUMENT; event = xmlr.next()) {
if (event == XMLStreamConstants.START_ELEMENT) {
@@ -1353,9 +1354,7 @@ private void processProdStmt(XMLStreamReader xmlr, MetadataBlockDTO citation) th
} else if (xmlr.getLocalName().equals("prodDate")) {
citation.getFields().add(FieldDTO.createPrimitiveFieldDTO("productionDate", parseDate(xmlr, "prodDate")));
} else if (xmlr.getLocalName().equals("prodPlac")) {
List<String> prodPlac = new ArrayList<>();
prodPlac.add(parseText(xmlr, "prodPlac"));
citation.getFields().add(FieldDTO.createMultiplePrimitiveFieldDTO(DatasetFieldConstant.productionPlace, prodPlac));
prodPlac.add(parseText(xmlr));
} else if (xmlr.getLocalName().equals("software")) {
HashSet<FieldDTO> set = new HashSet<>();
addToSet(set,"softwareVersion", xmlr.getAttributeValue(null, "version"));
@@ -1388,6 +1387,9 @@ private void processProdStmt(XMLStreamReader xmlr, MetadataBlockDTO citation) th
if (producers.size()>0) {
citation.getFields().add(FieldDTO.createMultipleCompoundFieldDTO("producer", producers));
}
if (prodPlac.size() > 0) {
citation.getFields().add(FieldDTO.createMultiplePrimitiveFieldDTO(DatasetFieldConstant.productionPlace, prodPlac));
}
return;
}
}
@@ -1397,6 +1399,7 @@ private void processProdStmt(XMLStreamReader xmlr, MetadataBlockDTO citation) th
private void processTitlStmt(XMLStreamReader xmlr, DatasetDTO datasetDTO) throws XMLStreamException, ImportException {
MetadataBlockDTO citation = datasetDTO.getDatasetVersion().getMetadataBlocks().get("citation");
List<HashSet<FieldDTO>> otherIds = new ArrayList<>();
List<String> altTitles = new ArrayList<>();

for (int event = xmlr.next(); event != XMLStreamConstants.END_DOCUMENT; event = xmlr.next()) {
if (event == XMLStreamConstants.START_ELEMENT) {
@@ -1407,8 +1410,7 @@ private void processTitlStmt(XMLStreamReader xmlr, DatasetDTO datasetDTO) throws
FieldDTO field = FieldDTO.createPrimitiveFieldDTO("subtitle", parseText(xmlr));
citation.getFields().add(field);
} else if (xmlr.getLocalName().equals("altTitl")) {
FieldDTO field = FieldDTO.createPrimitiveFieldDTO("alternativeTitle", parseText(xmlr));
citation.getFields().add(field);
altTitles.add(parseText(xmlr));
} else if (xmlr.getLocalName().equals("IDNo")) {
if ( AGENCY_HANDLE.equals( xmlr.getAttributeValue(null, "agency") ) || AGENCY_DOI.equals( xmlr.getAttributeValue(null, "agency") ) ) {
importGenericService.reassignIdentifierAsGlobalId(parseText(xmlr), datasetDTO);
@@ -1436,6 +1438,10 @@ private void processTitlStmt(XMLStreamReader xmlr, DatasetDTO datasetDTO) throws
if (otherIds.size()>0) {
citation.addField(FieldDTO.createMultipleCompoundFieldDTO("otherId", otherIds));
}
if (!altTitles.isEmpty()) {
FieldDTO field = FieldDTO.createMultiplePrimitiveFieldDTO(DatasetFieldConstant.alternativeTitle, altTitles);
citation.getFields().add(field);
}
return;
}
}