IQSS/3623 - Multiple PID Provider support (#10234)
* Switch to per-pid-provider settings

* partial refactor towards non-bean providers

* ~auto refactor name/package, remove bean status

* remove Util class

* rename

* add factories for all, fix providers, etc.

* unmanaged providers

* add getters

* add name to cnstr, add cnstr for unmanaged, add auth/shoulder checks

* update permalinks, add separator setting

* no arg constructor

* add unmanaged providers

* check canManagePid instead

* replace getBean(), compiles except for tests

* update tests, comment out ones that are TBD

* add clear method for testing

* bugs - remove dup authority, fix name, add auth/sep/shoulder tests

* make managed/excluded lists optional

* fix name in generated pids

* move setup to beforeAll, add test of second permaprovider

* provider name->id

* adding label, more name->id

* providerName->providerId

* add factory map, lookups, add factory, perma parsing tests

* first datacite parsing test/fix id in pid

* rename class

* move auth/shoulder check to lower level method

* fix ids, fix managed list optional in fake

* add effective pid generator logic

* add effective pid generator tests

* fix param order

* fix perma handling of managed/excluded entries

* add managed/excluded tests, cleanup

* update pidprovider discovery to get effective one when necessary

* replace all refs to global protocol/auth/shoulder settings except one

keeping the new PidProviderFactoryBean.getDefaultPidGenerator() for now
as a possible way to stay ~backward compatible

All the rest - tried to find the appropriate PidProvider to supply the
values

* first UI for setting Pid generator

* typo

* flyway script to add pid spec column

* @autoservice and public class for loader discovery

* minor cleanup/refactor

* verify protocol/auth are set/match the provider plus cleanup

* only call getGlobalId() when one should exist

* force all calls to create identifier to set protocol/auth as well

* move template to match refactor

* require superuser to change PidProvider

* cleanup

* check can create method

* make fake provider create file pids

* typo - fix UI

* return default instead of null for UI

* unrelated - logic fix

* partial support for legacy config - FAKE and DataCite - for testing

* cleanup

* style fail

* fix test - don't reset list of providers

* allow old aliases

* reverse logic in datacite legacy creator, add null check

* fix lookups, update test, test DataCite legacy

* missing if!

* disable obsolete test

* updated docs

* add test urls as default

* cleanup -remove unused imports

* unrelated link fix

* fix for #10251 - sync terms popup required code

* API calls for getting provider info and changing PID Generators

* api docs

* change level for entries to fix build error

* typo in refs

* fix indents

* more bad refs

* support for legacy hdl, perma, ezid

* new packages for everyone! (refactor)

* unused imports

* fix cut/paste issues

* add deprecation info

* Apply suggestions from code review

Co-authored-by: Oliver Bertuch <[email protected]>

* reorg/update imports

* revert 2e41b9e

* deprecate old settings

* Change error handling and warnings per review

* Add testing for a valid PID generator as a config test

* formatting, switch if /else logic per review

* add deprecation

* move pid provider's dir setting to spi scope

* change flyway name, tweak release note, delete unused test class

* temporary flyway change

* use new settings in install

* Revert "temporary flyway change"

This reverts commit 7106ef6.

* fix rest api setting

* handle spaces in the pidproviders setting

* add note in Harvard setup

* refactoring/cleaning DataCite provider, drop cache

* moving XmlMetadataTemplate to doi package

* missing import

* move xml file to match package

* minor fixes, make getPidStatus visible in test

* disabled test of DPI lifecycle

* update installer/docs to not talk about a partial DataCite test setup

* remove legacy setting

* indent issue

* missing )

* fix setting name

* remove obsolete settings

* add defaults

* add valid fake pid setup for docker

* also adding pid config to the -dev yml

* Update docker-compose-dev.yml

Co-authored-by: Steven Winship <[email protected]>

* Update docker/compose/demo/compose.yml

Co-authored-by: Steven Winship <[email protected]>

* Update docker-compose-dev.yml

Co-authored-by: Steven Winship <[email protected]>

* Update docker/compose/demo/compose.yml

Co-authored-by: Steven Winship <[email protected]>

---------

Co-authored-by: Oliver Bertuch <[email protected]>
Co-authored-by: qqmye <qqmye@BOOK-2CB3G91HHU>
Co-authored-by: Steven Winship <[email protected]>
4 people authored Mar 6, 2024
1 parent 0390b38 commit 44ce6a1
Showing 113 changed files with 4,605 additions and 3,157 deletions.
37 changes: 37 additions & 0 deletions doc/release-notes/3623-multipid.md
@@ -0,0 +1,37 @@
This release adds support for using multiple PID (DOI, Handle, PermaLink) providers, multiple PID provider accounts
(each managing a given protocol, authority, separator, shoulder combination), assigning PID provider accounts to specific collections,
and supporting transferred PIDs (where a PID is managed by an account even though its authority, separator, and/or shoulder don't match
the combination for which the account can mint new PIDs). It also adds the ability for additional provider services beyond the existing
DataCite, EZId, Handle, and PermaLink providers to be dynamically added as separate jar files.
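
As a rough illustration of the matching idea described above (a simplified sketch, not Dataverse's actual implementation), the check below treats a PID as one an account could have minted when it starts with that account's protocol, authority, separator, and shoulder. The function name `pid_matches_provider` and the sample values are assumptions chosen for illustration only.

```shell
# Hypothetical helper: succeeds (exit status 0) when the PID begins with
# protocol:authority<separator><shoulder>, and fails (exit status 1) otherwise.
pid_matches_provider() {
  local pid="$1" protocol="$2" authority="$3" separator="$4" shoulder="$5"
  local prefix="${protocol}:${authority}${separator}${shoulder}"
  case "$pid" in
    "$prefix"*) return 0 ;;   # quoted prefix is matched literally
    *) return 1 ;;
  esac
}

# Example: a provider account configured for doi / 10.5072 / "/" / FK2/
if pid_matches_provider "doi:10.5072/FK2/YD5QDG" doi 10.5072 / FK2/; then
  echo "matches the account's minting combination"
fi
```

A PID that fails this check is what the note calls a transferred PID; presumably it would have to be listed explicitly (for example through the `managed-list` setting below) for the account to manage it.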

These changes require per-provider settings rather than the global PID settings previously supported. While backward compatibility
for installations using a single PID Provider account is provided, updating to use the new microprofile settings is highly recommended
and will be required in a future version.

New microprofile settings (where * stands for the id of the provider that the setting applies to):

dataverse.pid.providers
dataverse.pid.default-provider
dataverse.pid.*.type
dataverse.pid.*.label
dataverse.pid.*.authority
dataverse.pid.*.shoulder
dataverse.pid.*.identifier-generation-style
dataverse.pid.*.datafile-pid-format
dataverse.pid.*.managed-list
dataverse.pid.*.excluded-list
dataverse.pid.*.datacite.mds-api-url
dataverse.pid.*.datacite.rest-api-url
dataverse.pid.*.datacite.username
dataverse.pid.*.datacite.password
dataverse.pid.*.ezid.api-url
dataverse.pid.*.ezid.username
dataverse.pid.*.ezid.password
dataverse.pid.*.permalink.base-url
dataverse.pid.*.permalink.separator
dataverse.pid.*.handlenet.index
dataverse.pid.*.handlenet.independent-service
dataverse.pid.*.handlenet.auth-handle
dataverse.pid.*.handlenet.key.path
dataverse.pid.*.handlenet.key.passphrase
dataverse.spi.pidproviders.directory
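
To make the per-provider naming concrete, here is a minimal sketch of how two provider accounts might be declared through environment variables. It assumes the standard MicroProfile Config mapping from property names to environment variables (uppercase, with `.` and `-` replaced by `_`). The provider ids `fake1` and `perma1`, the type values, and every other value shown are illustrative assumptions, not a recommended setup.

```shell
# dataverse.pid.providers: the ids of all configured provider accounts
export DATAVERSE_PID_PROVIDERS="fake1,perma1"
# dataverse.pid.default-provider: the id used when nothing more specific applies
export DATAVERSE_PID_DEFAULT_PROVIDER="fake1"

# dataverse.pid.fake1.*: a test-only DOI-style account (values assumed)
export DATAVERSE_PID_FAKE1_TYPE="FAKE"
export DATAVERSE_PID_FAKE1_LABEL="Fake DOI Provider"
export DATAVERSE_PID_FAKE1_AUTHORITY="10.5072"
export DATAVERSE_PID_FAKE1_SHOULDER="FK2/"

# dataverse.pid.perma1.*: a PermaLink account (values assumed)
export DATAVERSE_PID_PERMA1_TYPE="perma"
export DATAVERSE_PID_PERMA1_LABEL="Example PermaLinks"
export DATAVERSE_PID_PERMA1_AUTHORITY="EXAMPLE"
export DATAVERSE_PID_PERMA1_SHOULDER="X1"
export DATAVERSE_PID_PERMA1_PERMALINK_SEPARATOR="-"
```

Each id listed in `dataverse.pid.providers` gets its own block of `dataverse.pid.<id>.*` settings, which is what replaces the earlier global protocol/authority/shoulder settings.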
100 changes: 100 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2747,6 +2747,56 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/datasets/24/versions/1.0/canDownloadAtLeastOneFile"

.. _dataset-pid-generator:

Configure The PID Generator a Dataset Uses (If Enabled)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse can be configured to use multiple PID Providers (see the :ref:`pids-configuration` section for more information).
When there are multiple PID Providers and File PIDs are enabled, it is possible to set which provider will be used to generate (mint) those PIDs.
While it usually makes sense to use the same PID Provider that manages the dataset PID, another Provider is needed in some cases, specifically when the PID Provider for the dataset PID cannot generate
other PIDs with the same authority/shoulder, etc. as the dataset PID. Dataverse has a set of API calls to see which PID Provider will be
used to generate datafile PIDs and, as a superuser, to change it (to a new one or back to the default).

To see the current choice for this dataset:

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG

  curl "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"

The response will be the id of the PID Provider that will be used. Details of that provider's configuration can be obtained via the :ref:`pids-providers-api`.

To set the behavior for this dataset:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG
  export GENERATOR_ID=perma1

  curl -X PUT -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" -d "$GENERATOR_ID" "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"

The PID Provider id used must be one of those configured - see :ref:`pids-providers-api`.
The return status code may be 200/OK, 401/403 if an API token is not sent or the user is not a superuser, or 404 if the dataset or PID Provider is not found.

Note that using a PID Provider that generates DEPENDENT datafile PIDs and doesn't share the dataset PID's protocol/authority/separator/shoulder is not supported. (INDEPENDENT should be used in this case; see the :ref:`pids-configuration` section for more information.)

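
The status codes listed above can be turned into readable messages in a script. The following is only a sketch: the function names ``describe_pid_generator_status`` and ``set_pid_generator`` are hypothetical, and the messages simply restate the cases described in this section. It uses curl's ``-w '%{http_code}'`` option to capture the status code.

.. code-block:: bash

  # Hypothetical helper: map a status code to the explanations given above.
  describe_pid_generator_status() {
    case "$1" in
      200) echo "OK: PID generator updated" ;;
      401|403) echo "Not authorized: missing API token or not a superuser" ;;
      404) echo "Not found: unknown dataset or PID Provider id" ;;
      *) echo "Unexpected status: $1" ;;
    esac
  }

  # Hypothetical wrapper: send the PUT shown above and explain the result.
  # Expects API_TOKEN, SERVER_URL and PERSISTENT_IDENTIFIER to be exported.
  set_pid_generator() {
    local status
    status=$(curl -s -o /dev/null -w '%{http_code}' -X PUT \
      -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" \
      -d "$1" "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER")
    describe_pid_generator_status "$status"
  }

With the variables from the example above exported, ``set_pid_generator perma1`` would send the request and print one of the messages.
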
The API can also be used to reset the dataset to use the default/inherited value:

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/YD5QDG

  curl -X DELETE -H "X-Dataverse-key:$API_TOKEN" -H "Content-type:application/json" "$SERVER_URL/api/datasets/:persistentId/pidGenerator?persistentId=$PERSISTENT_IDENTIFIER"

The default will always be the same provider as for the dataset PID if that provider can generate new PIDs, and will be the PID Provider set for the collection or the global default otherwise.

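
The fallback order described above can be sketched as a small shell function. This is one reading of the sentence, not Dataverse's actual code: it assumes a collection-level setting takes precedence over the global default, and the function name and arguments are hypothetical.

.. code-block:: bash

  # Hypothetical sketch of the default resolution order.
  # Arguments: dataset provider id, "true"/"false" for whether that provider
  # can mint new PIDs, collection provider id (may be empty), global default id.
  effective_pid_generator() {
    local dataset_provider="$1" can_mint="$2" collection_provider="$3" global_default="$4"
    if [ "$can_mint" = "true" ]; then
      echo "$dataset_provider"        # same provider as the dataset PID
    elif [ -n "$collection_provider" ]; then
      echo "$collection_provider"     # assumed: collection setting wins next
    else
      echo "$global_default"          # otherwise the global default
    fi
  }

For instance, ``effective_pid_generator dc1 false "" perma1`` prints ``perma1``, because the dataset's provider cannot mint and no collection-level provider is given.
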
Files
-----
@@ -4809,6 +4859,56 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "https://demo.dataverse.org/api/pids/:persistentId/delete?persistentId=doi:10.70122/FK2/9BXT5O"

.. _pids-providers-api:

Get Information about Configured PID Providers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse can be configured with one or more PID Providers that it uses to create new PIDs and manage existing ones.
This API call returns a JSONObject listing the configured providers and details about the protocol/authority/separator/shoulder they manage,
along with information about how new dataset and datafile PIDs are generated. See the :ref:`pids-configuration` section for more information.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/pids/providers"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/pids/providers"

Get the id of the PID Provider Managing a Given PID
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Dataverse can be configured with one or more PID Providers that it uses to create new PIDs and manage existing ones.
This API call returns the string id of the PID Provider that manages a given PID. See the :ref:`pids-configuration` section for more information.

Delete PID (this is only possible for PIDs that are in the "draft" state) and within a Dataverse installation, set ``globalidcreatetime`` to null and ``identifierregistered`` to false. A superuser API token is required.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PID=doi:10.70122/FK2/9BXT5O

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/pids/providers/$PID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/pids/providers/doi:10.70122/FK2/9BXT5O"

If the PID is not managed by Dataverse, this call will report whether the PID is recognized as a valid PID for a given protocol (doi, hdl, or perma)
or will return a 400/Bad Request response if it is not.

.. _admin:
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/developers/deployment.rst
@@ -114,7 +114,7 @@ Please note that while the script should work well on new-ish branches, older br
Migrating Datafiles from Local Storage to S3
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-A number of pilot Dataverse installations start on local storage, then administrators are tasked with migrating datafiles into S3 or similar object stores. The files may be copied with a command-line utility such as `s3cmd<https://s3tools.org/s3cmd>`. You will want to retain the local file hierarchy, keeping the authority (for example: 10.5072) at the bucket "root."
+A number of pilot Dataverse installations start on local storage, then administrators are tasked with migrating datafiles into S3 or similar object stores. The files may be copied with a command-line utility such as `s3cmd <https://s3tools.org/s3cmd>`_. You will want to retain the local file hierarchy, keeping the authority (for example: 10.5072) at the bucket "root."

The example queries below may assist with updating dataset and datafile locations in the Dataverse installation's PostgreSQL database. Depending on the initial version of the Dataverse Software and subsequent upgrade path, Datafile storage identifiers may or may not include a ``file://`` prefix, so you'll want to catch both cases.
