diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst
index 4aba7881c1f..087cdb6303e 100644
--- a/doc/sphinx-guides/source/developers/big-data-support.rst
+++ b/doc/sphinx-guides/source/developers/big-data-support.rst
@@ -81,12 +81,12 @@ with the contents of the file cors.json as follows:
 
 Alternatively, you can enable CORS using the AWS S3 web interface, using json-encoded rules as in the example above.
 
-.. _s3-tags:
+.. _s3-tags-and-direct-upload:
 
 S3 Tags and Direct Upload
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving or canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not. With such stores, direct upload may not work and you might need to disable tagging. For details, look for ``dataverse.files.<id>.disable-tagging`` under :ref:`list-of-s3-storage-options` in the Installation Guide.
+Since the direct upload mechanism creates the final file rather than an intermediate temporary file, user actions, such as neither saving nor canceling an upload session before closing the browser page, can leave an abandoned file in the store. The direct upload mechanism attempts to use S3 tags to aid in identifying/removing such files. Upon upload, files are given a "dv-state":"temp" tag which is removed when the dataset changes are saved and new files are added in the Dataverse installation. Note that not all S3 implementations support tags. Minio, for example, does not.
With such stores, direct upload may not work and you might need to disable tagging. For details, see :ref:`s3-tagging` in the Installation Guide.
 
 Trusted Remote Storage with the ``remote`` Store Type
 -----------------------------------------------------
 
diff --git a/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
index 0040c1fd3f0..33b8e434e6e 100644
--- a/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
+++ b/doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -79,6 +79,12 @@ In the single part case, only one call to the supplied URL is required:
 
   curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename-to-upload> "<supplied url>"
 
+Or, if you have disabled S3 tagging (see :ref:`s3-tagging`), you should omit the header like this:
+
+.. code-block:: bash
+
+  curl -i -X PUT -T <filename-to-upload> "<supplied url>"
+
 Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.
 
 In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a slice of the total file, with the last part containing the remaining bytes.
diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index ae27a9727da..75ae760aa4a 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -1189,13 +1189,23 @@ Larger installations may want to increase the number of open S3 connections allo
 
 ``./asadmin create-jvm-options "-Ddataverse.files.<id>.connection-pool-size=4096"``
 
-By default, when direct upload to an S3 store is configured, Dataverse will place a ``temp`` tag on the file being uploaded for an easier cleanup in case the file is not added to the dataset after upload (e.g., if the user cancels the operation).
(See :ref:`s3-tags`.)
+.. _s3-tagging:
+
+S3 Tagging
+##########
+
+By default, when direct upload to an S3 store is configured, Dataverse will place a ``temp`` tag on the file being uploaded for an easier cleanup in case the file is not added to the dataset after upload (e.g., if the user cancels the operation). (See :ref:`s3-tags-and-direct-upload`.)
 If your S3 store does not support tagging and gives an error when direct upload is configured, you can disable the tagging by using the ``dataverse.files.<id>.disable-tagging`` JVM option. For example:
 
 ``./asadmin create-jvm-options "-Ddataverse.files.<id>.disable-tagging=true"``
 
 Disabling the ``temp`` tag makes it harder to identify abandoned files that are not used by your Dataverse instance (i.e. one cannot search for the ``temp`` tag in a delete script). These should still be removed to avoid wasting storage space. To clean up these files and any other leftover files, regardless of whether the ``temp`` tag is applied, you can use the :ref:`cleanup-storage-api` API endpoint.
 
+Note that if you disable tagging, you should omit the ``x-amz-tagging:dv-state=temp`` header when using the :doc:`/developers/s3-direct-upload-api`, as noted in that section.
+
+Finalizing S3 Configuration
+###########################
+
 In case you would like to configure Dataverse to use a custom S3 service instead of Amazon S3 services, please add the options for the custom URL and region as documented below. Please read above if your desired combination has been tested already and what other options have been set for a successful integration.
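Reviewer note: the documentation changes above amount to one rule for clients, namely to send the ``x-amz-tagging:dv-state=temp`` header only when tagging is enabled for the store. A minimal sketch of that rule follows, assuming a hypothetical helper ``build_upload_args`` whose first argument mirrors the value of the store's ``disable-tagging`` option. The helper name, the file name, and the URL are illustrative assumptions and are not part of this PR or of Dataverse.

```shell
# Hypothetical helper (not part of Dataverse): prints the curl arguments for a
# single-part direct upload, one argument per line.
#   $1  "true" if tagging is disabled for the store, anything else otherwise
#   $2  path of the file to upload
#   $3  pre-signed upload URL returned by the uploadurls API
build_upload_args() {
  tagging_disabled="$1"
  file="$2"
  url="$3"

  # Start the argument list with -i so the HTTP status line is shown.
  set -- -i

  # Only send the tagging header when the store still expects it; with
  # tagging disabled the header is not signed and the store rejects it.
  if [ "$tagging_disabled" != "true" ]; then
    set -- "$@" -H 'x-amz-tagging:dv-state=temp'
  fi

  set -- "$@" -X PUT -T "$file" "$url"
  printf '%s\n' "$@"
}
```

Calling ``build_upload_args true data.bin "$url"`` yields the argument list for the untagged form of the command, and any other first argument yields the tagged form. The function only builds the arguments, so the caller decides how to hand them to ``curl``.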
diff --git a/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java b/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
index 03f41fc409d..ccbbe8bd619 100644
--- a/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
+++ b/src/test/java/edu/harvard/iq/dataverse/api/UtilIT.java
@@ -2512,6 +2512,21 @@ static Response getUploadUrls(String idOrPersistentIdOfDataset, long sizeInBytes
         return requestSpecification.get("/api/datasets/" + idInPath + "/uploadurls?size=" + sizeInBytes + optionalQueryParam);
     }
 
+    /**
+     * If you set dataverse.files.localstack1.disable-tagging=true and still
+     * send the x-amz-tagging header, you will see an error like the one below.
+     *
+     * To avoid it, don't send the x-amz-tagging header.
+     */
+    /*
+    <Error>
+      <Code>AccessDenied</Code>
+      <Message>There were headers present in the request which were not signed</Message>
+      <RequestId>25ff2bb0-13c7-420e-8ae6-3d92677e4bd9</RequestId>
+      <HostId>9Gjjt1m+cjU4OPvX9O9/8RuvnG41MRb/18Oux2o5H5MY7ISNTlXN+Dz9IG62/ILVxhAGI0qyPfg=</HostId>
+      <HeadersNotSigned>x-amz-tagging</HeadersNotSigned>
+    </Error>
+    */
     static Response uploadFileDirect(String url, InputStream inputStream) {
         return given()
                 .header("x-amz-tagging", "dv-state=temp")