Merge branch 'develop' into 8236-required-subfields IQSS#8236
pdurbin committed Nov 29, 2021
2 parents 291dbbe + 1c08b81 commit 9cf1d53
Showing 33 changed files with 670 additions and 76 deletions.
7 changes: 7 additions & 0 deletions doc/release-notes/8155-external-metadata-validation.md
@@ -0,0 +1,7 @@
### Support for optional external metadata validation scripts

This enables an installation administrator to provide custom scripts for additional metadata validation when datasets are being published and/or when Dataverse collections are being published or modified. Harvard Dataverse Repository has been using this mechanism to combat content that violates its Terms of Use. All of the validation or verification logic is defined in these external scripts, making it possible for an installation to add checks custom-tailored to its needs.

Please note that only the metadata are subject to these validation checks (not the content of any uploaded files!).

For more information, see the [Database Settings](https://guides.dataverse.org/en/5.9/installation/config.html) section of the Guide.
3 changes: 3 additions & 0 deletions doc/release-notes/8174-new-managefilepermissions.md
@@ -0,0 +1,3 @@
## New ManageFilePermissions Permission

Dataverse can now support a use case in which an Admin or Curator would like to delegate the ability to grant access to restricted files to other users. This can be implemented by creating a custom role (e.g. DownloadApprover) that has the new ManageFilePermissions permission. This release introduces the new permission (and adjusts the existing standard Admin and Curator roles so they continue to have the ability to grant file download requests).
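
A minimal sketch of setting this up (the `downloadApprover` alias, name, and the target collection `root` are illustrative; the JSON shape follows the bundled role files such as `scripts/api/data/role-curator.json`):

```bash
# Define a custom role that carries only the new permission.
cat > downloadApprover.json <<'EOF'
{
  "alias": "downloadApprover",
  "name": "Download Approver",
  "description": "Can grant or deny access requests for restricted files.",
  "permissions": [
    "ManageFilePermissions"
  ]
}
EOF

# Create the role in a collection (here: root), then assign it to users as
# usual; assumes an admin API token in $API_TOKEN and the server in $SERVER_URL.
curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
  -H "Content-Type: application/json" \
  "$SERVER_URL/api/roles?dvo=root" --upload-file downloadApprover.json
```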
14 changes: 14 additions & 0 deletions doc/release-notes/8235-auxiliaryfileAPIenhancements.md
@@ -0,0 +1,14 @@
### Auxiliary File API Enhancements

This release includes updates to the Auxiliary File API:
- Auxiliary files can now also be associated with non-tabular files
- Improved error reporting
- The API will block attempts to create a duplicate auxiliary file
- Delete and list-by-original calls have been added
- Bug fix: correct checksum recorded for aux file

Please note that the auxiliary files feature is experimental and is designed to support integration with tools from the [OpenDP Project](https://opendp.org). If the API endpoints are not needed, they can be blocked.

### Major Use Cases

(note for release time - expand on the items above, as use cases)
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -33,6 +33,13 @@ Removes a link between a Dataverse collection and another Dataverse collection.

curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/dataverses/$linked-dataverse-alias/deleteLink/$linking-dataverse-alias

List Dataverse Collection Links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Provides information about whether a certain Dataverse collection ($dataverse-alias) is linked to or links to another collection. Only accessible to superusers. ::

curl -H "X-Dataverse-key:$API_TOKEN" http://$SERVER/api/dataverses/$dataverse-alias/links

Add Dataverse Collection RoleAssignments to Dataverse Subcollections
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

40 changes: 36 additions & 4 deletions doc/sphinx-guides/source/developers/aux-file-support.rst
@@ -1,7 +1,7 @@
Auxiliary File Support
======================

Auxiliary file support is experimental and as such, related APIs may be added, changed or removed without standard backward compatibility. Auxiliary files in the Dataverse Software are being added to support depositing and downloading differentially private metadata, as part of the `OpenDP project <https://opendp.org>`_. In future versions, this approach will likely become more broadly used and supported.

Adding an Auxiliary File to a Datafile
--------------------------------------
@@ -16,12 +16,12 @@ To add an auxiliary file, specify the primary key of the datafile (FILE_ID), and
export FORMAT_VERSION='v1'
export TYPE='DP'
export SERVER_URL=https://demo.dataverse.org
curl -H X-Dataverse-key:$API_TOKEN -X POST -F "file=@$FILENAME" -F 'origin=myApp' -F 'isPublic=true' -F "type=$TYPE" "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

You should expect a 200 ("OK") response and JSON with information about your newly uploaded auxiliary file.

Downloading an Auxiliary File that Belongs to a Datafile
--------------------------------------------------------
To download an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:
@@ -33,5 +33,37 @@ formatTag and formatVersion (if applicable) associated with the auxiliary file:
export FILE_ID='12345'
export FORMAT_TAG='dpJson'
export FORMAT_VERSION='v1'
curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"

Listing Auxiliary Files for a Datafile by Origin
------------------------------------------------
To list auxiliary files, specify the primary key of the datafile (FILE_ID), and the origin associated with the auxiliary files to list (the application/entity that created them).

.. code-block:: bash

    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export FILE_ID='12345'
    export SERVER_URL=https://demo.dataverse.org
    export ORIGIN='app1'

    curl "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$ORIGIN"

You should expect a 200 ("OK") response and a JSON array of objects representing the auxiliary files found, or a 404 (Not Found) response if no auxiliary files exist with that origin.

Deleting an Auxiliary File that Belongs to a Datafile
-----------------------------------------------------
To delete an auxiliary file, use the primary key of the datafile, and the
formatTag and formatVersion (if applicable) associated with the auxiliary file:

.. code-block:: bash

    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export SERVER_URL=https://demo.dataverse.org
    export FILE_ID='12345'
    export FORMAT_TAG='dpJson'
    export FORMAT_VERSION='v1'

    curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE "$SERVER_URL/api/access/datafile/$FILE_ID/auxiliary/$FORMAT_TAG/$FORMAT_VERSION"
51 changes: 51 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -2409,3 +2409,54 @@ setting indicates embargoes are not supported. A value of -1 allows embargoes of
can enter for an embargo end date. This limit will be enforced in the popup dialog in which users enter the embargo date. For example, to set a two year maximum:

``curl -X PUT -d 24 http://localhost:8080/api/admin/settings/:MaxEmbargoDurationInMonths``

:DataverseMetadataValidatorScript
+++++++++++++++++++++++++++++++++

An optional external script that validates Dataverse collection metadata as it is being updated or published. The script should be an executable that takes a single command line argument: the name of a file containing the metadata exported in the native JSON format. That is, the Dataverse application exports the collection metadata as JSON, saves it in a temporary file, and passes the name of that file to the validation script as the command line argument. The script should exit with a non-zero error code if validation fails. If that happens, a failure message (customizable in the next two settings below, `:DataverseMetadataPublishValidationFailureMsg` and `:DataverseMetadataUpdateValidationFailureMsg`) will be shown to the user.

For example:

``curl -X PUT -d /usr/local/bin/dv_validator.sh http://localhost:8080/api/admin/settings/:DataverseMetadataValidatorScript``
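
As a sketch of what such a script could look like (the policy shown, requiring a non-empty ``description`` in the exported JSON, is purely hypothetical; a real script would implement whatever checks the installation needs):

```bash
#!/bin/bash
# dv_validator.sh -- hypothetical example of an external validator.
# Dataverse passes one command line argument: the path of a temp file
# containing the collection metadata exported in the native JSON format.
# Exit code 0 means validation passed; non-zero means it failed.

validate_metadata() {
    local metadata_file="$1"
    # an unreadable input file counts as a failure
    [ -r "$metadata_file" ] || return 2
    # hypothetical policy: require a non-empty "description" value
    grep -q '"description"[[:space:]]*:[[:space:]]*"[^"]' "$metadata_file"
}

# In the deployed script, the function is invoked on the file Dataverse
# passes in, and its result becomes the script's exit code:
#   validate_metadata "$1"
#   exit $?
```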

:DataverseMetadataPublishValidationFailureMsg
+++++++++++++++++++++++++++++++++++++++++++++

Specifies a custom error message shown to the user when a Dataverse collection fails an external metadata validation (as specified in the setting above) during an attempt to publish. If not specified, the default message "This dataverse collection cannot be published because it has failed an external metadata validation test" will be used.

For example:

``curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DataverseMetadataPublishValidationFailureMsg``


:DataverseMetadataUpdateValidationFailureMsg
++++++++++++++++++++++++++++++++++++++++++++

Same as above, but specifies a custom error message shown to the user when an external metadata validation check fails during an attempt to modify a Dataverse collection. If not specified, the default message "This dataverse collection cannot be updated because it has failed an external metadata validation test" will be used.


:DatasetMetadataValidatorScript
+++++++++++++++++++++++++++++++

An optional external script that validates dataset metadata during publishing. The script should be an executable that takes a single command line argument: the name of a file containing the metadata exported in the native JSON format. That is, the Dataverse application exports the dataset metadata as JSON, saves it in a temporary file, and passes the name of that file to the validation script as the command line argument. The script should exit with a non-zero error code if validation fails. If that happens, the dataset is left unpublished, and a failure message (customizable in the next setting below, `:DatasetMetadataValidationFailureMsg`) will be shown to the user.

For example:

``curl -X PUT -d /usr/local/bin/ds_validator.sh http://localhost:8080/api/admin/settings/:DatasetMetadataValidatorScript``

In some ways this duplicates the workflow mechanism, since it is possible to define a workflow with additional validation steps. The important difference is that this external validation happens *synchronously*, while the user waits, whereas a workflow is performed asynchronously, with a lock placed on the dataset. This can be useful to some installations in some situations, but it also means the script should be reasonably fast, ideally completing in seconds rather than minutes.
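
To make the synchronous constraint concrete, here is a minimal sketch of a fast check along the lines of the Terms-of-Use screening mentioned in the release notes (the blocklisted terms are purely illustrative):

```bash
#!/bin/bash
# ds_validator.sh -- hypothetical example: a fast, synchronous check that
# rejects datasets whose exported native JSON contains a blocklisted term.
# Dataverse passes the path of the exported JSON file as "$1".

validate_dataset() {
    local metadata_file="$1"
    [ -r "$metadata_file" ] || return 2
    # a single grep keeps the check well under a second even for large
    # exports; the terms below are purely illustrative
    if grep -qiE 'casino|cheap followers' "$metadata_file"; then
        return 1    # validation failed; Dataverse shows the failure message
    fi
    return 0
}

# Deployed entry point:
#   validate_dataset "$1"
#   exit $?
```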

:DatasetMetadataValidationFailureMsg
++++++++++++++++++++++++++++++++++++

Specifies a custom error message shown to the user when a dataset fails an external metadata validation (as specified in the setting above) during an attempt to publish. If not specified, the default message "This dataset cannot be published because it has failed an external metadata validation test" will be used.

For example:

``curl -X PUT -d "This content needs to go through an additional review by the Curation Team before it can be published." http://localhost:8080/api/admin/settings/:DatasetMetadataValidationFailureMsg``


:ExternalValidationAdminOverride
++++++++++++++++++++++++++++++++

When set to ``true``, this setting allows a superuser to publish and/or update Dataverse collections and datasets, bypassing the external validation checks specified by the settings above. In the event that an external script reports validation failures that appear to be in error, this option gives an admin with superuser privileges a quick way to publish the dataset or update the collection for the user.
1 change: 1 addition & 0 deletions scripts/api/data/role-curator.json
Expand Up @@ -9,6 +9,7 @@
"DeleteDatasetDraft",
"PublishDataset",
"ManageDatasetPermissions",
"ManageFilePermissions",
"AddDataverse",
"AddDataset",
"ViewUnpublishedDataverse"
5 changes: 4 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/AuxiliaryFile.java
@@ -29,7 +29,10 @@
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type = :type"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesWithoutType",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.type is null"),
@NamedQuery(name = "AuxiliaryFile.findAuxiliaryFilesByOrigin",
query = "select object(o) from AuxiliaryFile as o where o.dataFile.id = :dataFileId and o.origin = :origin"),
})
@NamedNativeQueries({
@NamedNativeQuery(name = "AuxiliaryFile.findAuxiliaryFileTypes",
query = "select distinct type from auxiliaryfile where datafile_id = ?1")
