### Metadata Source Facet Can Now Differentiate Between Harvested Sources
The behavior of the feature flag `index-harvested-metadata-source` and the "Metadata Source" facet, which were added and updated, respectively, in [Dataverse 6.3](https://github.com/IQSS/dataverse/releases/tag/v6.3) (through pull requests #10464 and #10651), has been updated. A new field called "Source Name" has been added to harvesting clients.
Before Dataverse 6.3, all harvested content (datasets and files) appeared together as "Harvested" under the "Metadata Source" facet. This is still the behavior of Dataverse out of the box. Since Dataverse 6.3, enabling the `index-harvested-metadata-source` feature flag (and reindexing) has caused harvested content to appear under the nickname of whichever harvesting client was used to bring it in. This means that instead of having all harvested content lumped together under "Harvested", content appears under "client1", "client2", etc.
Now, as of this release, enabling the `index-harvested-metadata-source` feature flag, populating a new field for harvesting clients called "Source Name" ("sourceName" in the [API](https://dataverse-guide--11217.org.readthedocs.build/en/11217/api/native-api.html#create-a-harvesting-client)), and reindexing (see upgrade instructions below) result in the source name, rather than the harvesting client nickname, appearing under the "Metadata Source" facet. This gives you more control over the name that appears under the facet and allows you to group harvested content from various harvesting clients under the same name, if you wish, by reusing the same source name.
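
For example, you could create a client whose harvested content shows up under a source name rather than its nickname. This is a sketch only; the nickname, URLs, and source name below are placeholder values:

```bash
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org

# "sourceName" is the new optional field; clients that share the same
# source name are grouped together under the "Metadata Source" facet.
curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
  -X POST "$SERVER_URL/api/harvest/clients/client1" \
  -d '{
    "dataverseAlias": "harvested",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "metadataFormat": "oai_dc",
    "sourceName": "Zenodo"
  }'
```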
Previously, `index-harvested-metadata-source` was not documented in the guides, but now you can find information about it under [Feature Flags](https://dataverse-guide--11217.org.readthedocs.build/en/11217/installation/config.html#feature-flags). See also #10217 and #11217.
## Upgrade instructions
If you have enabled the `dataverse.feature.index-harvested-metadata-source` feature flag and given some of your harvesting clients a source name, you should reindex to have those source names appear under the "Metadata Source" facet.
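
For example, a full reindex in place can be started like this (assuming the admin API is reachable on localhost and not blocked):

```bash
# Kick off an asynchronous reindex of all content so that the new
# source names are picked up by the "Metadata Source" facet.
curl http://localhost:8080/api/admin/index
```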
"archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",

doc/sphinx-guides/source/api/native-api.rst (+57 -33)

@@ -5556,7 +5556,7 @@ Create a Harvesting Set
To create a harvesting set you must supply a JSON file that contains the following fields:
- Name: Alpha-numeric; may also contain -, _, or %, but no spaces. It must also be unique in the installation.
- Definition: A search query to select the datasets to be harvested. For example, a query containing authorName:YYY would include all datasets where ‘YYY’ is the authorName.
- Description: Text that describes the harvesting set. The description appears in the Manage Harvesting Sets dashboard and in API responses. This field is optional.
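
An example JSON file might look like this (a sketch; the values are hypothetical and the keys are assumed to be the lowercased field names above):

.. code-block:: json

  {
    "name": "ffAuthor",
    "definition": "authorName:Finch, Fiona",
    "description": "Datasets authored by Fiona Finch"
  }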
@@ -5652,20 +5652,43 @@ The following API can be used to create and manage "Harvesting Clients". A Harve
List All Configured Harvesting Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Shows all the harvesting clients configured.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/harvest/clients"
The fully expanded example above (without the environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/harvest/clients"

The output will look something like the following.
.. code-block:: bash

  {
    "status":"OK",

@@ -5681,6 +5704,7 @@ Shows a Harvesting Client with a defined nickname::

    "type": "oai",
    "dataverseAlias": "fooData",
    "nickName": "myClient",
    "sourceName": "",
    "set": "fooSet",
    "useOaiIdentifiersAsPids": false,
    "schedule": "none",

@@ -5694,16 +5718,12 @@ Shows a Harvesting Client with a defined nickname::

  }

.. _create-a-harvesting-client:
Create a Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~
To create a harvesting client you must supply a JSON file that describes the configuration, similarly to the output of the GET API above. The following fields are mandatory:
- dataverseAlias: The alias of an existing collection where harvested datasets will be deposited
5709
5729
- harvestUrl: The URL of the remote OAI archive
@@ -5712,6 +5732,7 @@ You must supply a JSON file that describes the configuration, similarly to the o
The following optional fields are supported:
- sourceName: When ``index-harvested-metadata-source`` is enabled (see :ref:`feature-flags`), sourceName will override the nickname in the Metadata Source facet. It can be used to group the content from many harvesting clients under the same name.
- archiveDescription: What the name suggests. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
- set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything".
- style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).
@@ -5720,38 +5741,35 @@ The following optional fields are supported:
- useOaiIdentifiersAsPids: Defaults to false; if set to true, the harvester will attempt to use the identifier from the OAI-PMH record header as the **first choice** for the persistent id of the harvested dataset. When set to false, Dataverse will still attempt to use this identifier, but only if none of the `<dc:identifier>` entries in the OAI_DC record contain a valid persistent id (this is new as of v6.5).
Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored. For example, as of this writing there is no way to configure a harvesting schedule via this API.
You can download this :download:`harvesting-client.json <../_static/api/harvesting-client.json>` file to use as a starting point.
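
As an illustrative sketch (the values are hypothetical; only ``sourceName`` is specific to the new grouping behavior), a client configuration using a source name might look like this:

.. code-block:: json

  {
    "nickName": "client1",
    "dataverseAlias": "harvested",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "metadataFormat": "oai_dc",
    "sourceName": "Zenodo"
  }

A second client (say, "client2") created with the same ``"sourceName": "Zenodo"`` would appear under the same "Zenodo" grouping in the Metadata Source facet when the feature flag is enabled.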
Something important to keep in mind about this API is that, unlike the harvesting clients GUI, it will create a client with the values supplied without making any attempt to validate them in real time. In other words, for the `harvestUrl` it will accept anything that looks like a well-formed URL, without making any OAI calls to verify that the name of the set and/or the metadata format entered are supported by it. This is by design, to give an admin the option to still be able to create a client in the rare case when it cannot be done via the GUI because of some real-time failures in an exchange with an otherwise valid OAI server. This, however, puts the responsibility on the admin to supply values already confirmed to be valid.
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.
``nickName`` in the JSON file and ``$NICKNAME`` in the URL path below both identify the new client. The name should be alpha-numeric and may also contain -, _, or %, but no spaces. It must be unique in the installation.
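
A minimal invocation might look like this (a sketch; the token and nickname are placeholders, and harvesting-client.json is assumed to follow the format above):

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export NICKNAME=zenodo

  # POST the JSON configuration to create the client under $NICKNAME.
  curl -H "X-Dataverse-key:$API_TOKEN" -H "Content-Type: application/json" \
    -X POST "$SERVER_URL/api/harvest/clients/$NICKNAME" \
    --upload-file harvesting-client.json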

doc/sphinx-guides/source/installation/config.rst (+3 -0)

@@ -3493,6 +3493,9 @@ please find all known feature flags below. Any of these flags can be activated u

   * - globus-use-experimental-async-framework
     - Activates a new experimental implementation of Globus polling of ongoing remote data transfers that does not rely on the instance staying up continuously for the duration of the transfers and saves the state information about Globus upload requests in the database. Added in v6.4. Affects :ref:`:GlobusPollingInterval`. Note that the JVM option :ref:`dataverse.files.globus-monitoring-server` described above must also be enabled on one (and only one, in a multi-node installation) Dataverse instance.
     - ``Off``
   * - index-harvested-metadata-source
     - Index the nickname or the source name (see the optional ``sourceName`` field in :ref:`create-a-harvesting-client`) of the harvesting client as the "metadata source" of harvested datasets and files. If enabled, the Metadata Source facet will show separate groupings of the content harvested from different sources (by harvesting client nickname or source name) instead of the default behavior where there is one "Harvested" grouping for all harvested content.
     - ``Off``

**Note:** Feature flags can be set via any `supported MicroProfile Config API source`_, e.g. the environment variable
``DATAVERSE_FEATURE_XXX`` (e.g. ``DATAVERSE_FEATURE_API_SESSION_AUTH=1``). These environment variables can be set in your shell before starting Payara. If you are using :doc:`Docker for development </container/dev-usage>`, you can set them in the `docker compose <https://docs.docker.com/compose/environment-variables/set-environment-variables/>`_ file.
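
For example, following that naming convention, the flag discussed above could be enabled like this (a sketch, assuming you export the variable in the shell that starts Payara):

.. code-block:: bash

  # 1 turns the feature flag on; leaving it unset keeps the default (Off).
  export DATAVERSE_FEATURE_INDEX_HARVESTED_METADATA_SOURCE=1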