Merge pull request IQSS#9018 from GlobalDataverseCommunityConsortium/GDCC/9005-replaceFiles_api_call

GDCC/9005 replace files api call
kcondon authored Jan 10, 2023
2 parents 03afc7f + c22545b commit f63f0e8
Showing 10 changed files with 518 additions and 208 deletions.
3 changes: 3 additions & 0 deletions doc/release-notes/9005-replaceFiles-api-call
@@ -0,0 +1,3 @@
9005

Direct upload and out-of-band uploads can now be used to replace multiple files with one API call (complementing the prior ability to add multiple new files).
49 changes: 7 additions & 42 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1511,6 +1511,13 @@ The fully expanded example above (without environment variables) looks like this
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/add?persistentId=doi:10.5072/FK2/J8SJZB" -F 'jsonData={"description":"A remote image.","storageIdentifier":"trsa://themes/custom/qdr/images/CoreTrustSeal-logo-transparent.png","checksumType":"MD5","md5Hash":"509ef88afa907eaf2c17c1c8d8fde77e","label":"testlogo.png","fileName":"testlogo.png","mimeType":"image/png"}'
Adding Files To a Dataset via Other Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In some circumstances, it may be useful to move or copy files into Dataverse's storage manually or via external tools and then add them to a dataset (i.e. without involving Dataverse in the file transfer itself).
Two API calls are available for this use case: one to add files to a dataset and one to replace files that are already in the dataset.
These calls were developed as part of Dataverse's direct upload mechanism and are detailed in :doc:`/developers/s3-direct-upload-api`; a brief sketch of both calls follows.
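
For orientation, the two calls follow the same pattern as the worked examples in that guide (all values below are placeholders, and the jsonData payloads are described there):

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV

  # add new files whose bytes were placed in storage out-of-band
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

  # replace existing files with ones already placed in storage
  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"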

Report the data (file) size of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2366,48 +2373,6 @@ The fully expanded example above (without environment variables) looks like this
Note: The ``id`` returned in the json response is the id of the file metadata version.



Adding File Metadata
~~~~~~~~~~~~~~~~~~~~

This API call requires a ``jsonString`` expressing the metadata of multiple files. It adds the file metadata to the database for files that have already been copied into storage.

The jsonData object includes values for:

* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:

* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.

A curl example using a ``PERSISTENT_IDENTIFIER``

* ``SERVER_URL`` - e.g. https://demo.dataverse.org
* ``API_TOKEN`` - API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header. For more details, see the :doc:`auth` section.
* ``PERSISTENT_IDENTIFIER`` - Example: ``doi:10.5072/FK2/7U7YBV``

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON_DATA='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}, {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'

  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/addFiles?persistentId=doi:10.5072/FK2/7U7YBV" -F jsonData='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}, {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'
Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~

104 changes: 101 additions & 3 deletions doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -122,7 +122,7 @@ To add multiple Uploaded Files to the Dataset
---------------------------------------------

Once the files exist in the S3 bucket, a final API call is needed to add all the files to the Dataset. In this API call, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for:
jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for:

* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
@@ -154,7 +154,7 @@ Replacing an existing file in the Dataset
-----------------------------------------

Once the file exists in the S3 bucket, a final API call is needed to register it as a replacement of an existing file. This call is the same call used to replace a file in a Dataverse installation but, rather than sending the file bytes, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must also include values for:
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, whether to allow the mimetype to change (forceReplace=true), etc. For direct uploads, the jsonData object must include values for:

* "storageIdentifier" - String, as specified in prior calls
* "fileName" - String
@@ -172,9 +172,107 @@ Note that the API call does not validate that the file matches the hash value supplied
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export FILE_IDENTIFIER=5072
export JSON_DATA="{'description':'My description.','directoryLabel':'data/subdir1','categories':['Data'], 'restrict':'false', 'forceReplace':'true', 'storageIdentifier':'s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42', 'fileName':'file1.txt', 'mimeType':'text/plain', 'checksum': {'@type': 'SHA-1', '@value': '123456'}}"
export JSON_DATA='{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "forceReplace":"true", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}'
curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/files/$FILE_IDENTIFIER/replace" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3 or has been uploaded via some out-of-band method.
With current S3 stores, the object identifier must be in the correct bucket for the store, must include the PID authority/identifier of the parent dataset, and must be guaranteed unique. The supplied storage identifier must also be prefaced with the store identifier used in the Dataverse installation, as in the internally generated examples above.
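
As an illustrative sketch of these constraints (the store id, bucket, and object name below are placeholders, not values generated by any particular installation), a conforming storage identifier has this shape:

.. code-block:: bash

  # <store id>://<bucket configured for that store>:<unique object name>
  storageIdentifier="s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42"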

Replacing multiple existing files in the Dataset
------------------------------------------------

Once the replacement files exist in the S3 bucket, a final API call is needed to register them as replacements for existing files. In this API call, additional metadata is added using the "jsonData" parameter.
jsonData for this call is an array of objects that normally include information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, each jsonData object must include some additional values:

* "fileToReplaceId" - the id of the file being replaced
* "forceReplace" - whether to replace a file with one of a different mimetype (optional, default is false)
* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:

* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings


The allowed checksum algorithms are defined by the ``edu.harvard.iq.dataverse.DataFile.CheckSumType`` class and currently include MD5, SHA-1, SHA-256, and SHA-512.
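
For illustration, the two accepted fixity forms could look like the following fragments (the hash values are placeholders):

.. code-block:: bash

  # Option 1: a plain MD5 string
  '{"md5Hash": "509ef88afa907eaf2c17c1c8d8fde77e", ...}'

  # Option 2: a checksum object naming one of the allowed algorithms
  '{"checksum": {"@type": "SHA-1", "@value": "123456"}, ...}'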

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=https://demo.dataverse.org
  export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
  export JSON_DATA='[{"fileToReplaceId": 10, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}},{"fileToReplaceId": 11, "forceReplace": true, "description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'

  curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/replaceFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

The JSON object returned as a response from this API call includes a "data" object that indicates how many of the file replacements succeeded and provides per-file error messages for those that failed, e.g.

.. code-block::

  {
    "status": "OK",
    "data": {
      "Files": [
        {
          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
          "errorMessage": "Bad Request:The file to replace does not belong to this dataset.",
          "fileDetails": {
            "fileToReplaceId": 10,
            "description": "My description.",
            "directoryLabel": "data/subdir1",
            "categories": [
              "Data"
            ],
            "restrict": "false",
            "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42",
            "fileName": "file1.Bin",
            "mimeType": "application/octet-stream",
            "checksum": {
              "@type": "SHA-1",
              "@value": "123456"
            }
          }
        },
        {
          "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
          "successMessage": "Replaced successfully in the dataset",
          "fileDetails": {
            "description": "My description.",
            "label": "file2.txt",
            "restricted": false,
            "directoryLabel": "data/subdir1",
            "categories": [
              "Data"
            ],
            "dataFile": {
              "persistentId": "",
              "pidURL": "",
              "filename": "file2.txt",
              "contentType": "text/plain",
              "filesize": 2407,
              "description": "My description.",
              "storageIdentifier": "s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53",
              "rootDataFileId": 11,
              "previousDataFileId": 11,
              "checksum": {
                "type": "SHA-1",
                "value": "123789"
              }
            }
          }
        }
      ],
      "Result": {
        "Total number of files": 2,
        "Number of files successfully replaced": 1
      }
    }
  }
Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3 or have been uploaded via some out-of-band method.
With current S3 stores, the object identifier must be in the correct bucket for the store, must include the PID authority/identifier of the parent dataset, and must be guaranteed unique. The supplied storage identifier must also be prefaced with the store identifier used in the Dataverse installation, as in the internally generated examples above.
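
Because the API does not validate the supplied fixity values, a client may want to compute them locally before building jsonData; a minimal sketch using standard tools (file names are placeholders):

.. code-block:: bash

  # compute values for the "checksum" objects ("@type": "SHA-1")
  sha1sum file1.txt file2.txt

  # or compute a value for the simpler "md5Hash" form
  md5sum file1.txt
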
@@ -1544,6 +1544,10 @@ public void finalizeFileDelete(Long dataFileId, String storageLocation) throws IOException
throw new IOException("Attempted to permanently delete a physical file still associated with an existing DvObject "
+ "(id: " + dataFileId + ", location: " + storageLocation);
}
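// Refuse to delete a physical file when no storage location was recorded for it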
if(storageLocation == null || storageLocation.isBlank()) {
throw new IOException("Attempted to delete a physical file with no location "
+ "(id: " + dataFileId + ", location: " + storageLocation);
}
StorageIO<DvObject> directStorageAccess = DataAccess.getDirectStorageIO(storageLocation);
directStorageAccess.delete();
}
@@ -590,8 +590,7 @@ public String init() {
datafileService,
permissionService,
commandEngine,
systemConfig,
licenseServiceBean);
systemConfig);

fileReplacePageHelper = new FileReplacePageHelper(addReplaceFileHelper,
dataset,
77 changes: 73 additions & 4 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -2452,8 +2452,7 @@ public Response addFileToDataset(@PathParam("id") String idSupplied,
fileService,
permissionSvc,
commandEngine,
systemConfig,
licenseSvc);
systemConfig);


//-------------------
@@ -3388,14 +3387,84 @@ public Response addFilesToDataset(@PathParam("id") String idSupplied,
this.fileService,
this.permissionSvc,
this.commandEngine,
this.systemConfig,
this.licenseSvc
this.systemConfig
);

return addFileHelper.addFiles(jsonData, dataset, authUser);

}

/**
* Replace multiple files in an existing Dataset
*
* @param idSupplied
* @param jsonData
* @return
*/
@POST
@Path("{id}/replaceFiles")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response replaceFilesInDataset(@PathParam("id") String idSupplied,
@FormDataParam("jsonData") String jsonData) {

if (!systemConfig.isHTTPUpload()) {
return error(Response.Status.SERVICE_UNAVAILABLE, BundleUtil.getStringFromBundle("file.api.httpDisabled"));
}

// -------------------------------------
// (1) Get the user from the API key
// -------------------------------------
User authUser;
try {
authUser = findUserOrDie();
} catch (WrappedResponse ex) {
return error(Response.Status.FORBIDDEN, BundleUtil.getStringFromBundle("file.addreplace.error.auth")
);
}

// -------------------------------------
// (2) Get the Dataset Id
// -------------------------------------
Dataset dataset;

try {
dataset = findDatasetOrDie(idSupplied);
} catch (WrappedResponse wr) {
return wr.getResponse();
}

dataset.getLocks().forEach(dl -> {
logger.info(dl.toString());
});

//------------------------------------
// (2a) Make sure dataset does not have package file
// --------------------------------------

for (DatasetVersion dv : dataset.getVersions()) {
if (dv.isHasPackageFile()) {
return error(Response.Status.FORBIDDEN,
BundleUtil.getStringFromBundle("file.api.alreadyHasPackageFile")
);
}
}

DataverseRequest dvRequest = createDataverseRequest(authUser);

AddReplaceFileHelper addFileHelper = new AddReplaceFileHelper(
dvRequest,
this.ingestService,
this.datasetService,
this.fileService,
this.permissionSvc,
this.commandEngine,
this.systemConfig
);

return addFileHelper.replaceFiles(jsonData, dataset, authUser);

}

/**
* API to find curation assignments and statuses
*
