Merge branch '9785-files-api-extension-search' of github.com:IQSS/dataverse into 9834-files-api-extension-file-counts
GPortas committed Sep 21, 2023
2 parents 19f129e + 7b8d5ad commit a5b605e
Showing 24 changed files with 299 additions and 1,480 deletions.
5 changes: 5 additions & 0 deletions doc/release-notes/9880-info-api-zip-limit-embargo.md
@@ -0,0 +1,5 @@
Implemented the following new endpoints:

- getZipDownloadLimit (/api/info/zipDownloadLimit): Get the configured zip file download limit. The response contains the long value of the limit in bytes.

- getMaxEmbargoDurationInMonths (/api/info/settings/:MaxEmbargoDurationInMonths): Get the maximum embargo duration in months, if available, configured through the database setting :MaxEmbargoDurationInMonths.
2 changes: 2 additions & 0 deletions doc/sphinx-guides/source/api/client-libraries.rst
@@ -54,6 +54,8 @@ There are multiple Python modules for interacting with Dataverse APIs.

`pyDataverse <https://github.com/gdcc/pyDataverse>`_ primarily allows developers to manage Dataverse collections, datasets and datafiles. Its intention is to help with data migrations and DevOps activities such as testing and configuration management. The module is developed by `Stefan Kasberger <http://stefankasberger.at>`_ from `AUSSDA - The Austrian Social Science Data Archive <https://aussda.at>`_.

`UBC's Dataverse Utilities <https://ubc-library-rc.github.io/dataverse_utils/>`_ are a set of Python console utilities which allow one to upload datasets from a tab-separated-value spreadsheet, bulk release multiple datasets, bulk delete unpublished datasets, quickly duplicate records, replace licenses, and more. For additional information see their `PyPI page <https://pypi.org/project/dataverse-utils/>`_.

`dataverse-client-python <https://github.com/IQSS/dataverse-client-python>`_ had its initial release in 2015. `Robert Liebowitz <https://github.com/rliebz>`_ created this library while at the `Center for Open Science (COS) <https://centerforopenscience.org>`_ and the COS uses it to integrate the `Open Science Framework (OSF) <https://osf.io>`_ with Dataverse installations via an add-on which itself is open source and listed on the :doc:`/api/apps` page.

`Pooch <https://github.com/fatiando/pooch>`_ is a Python library that allows library and application developers to download data. Among other features, it takes care of various protocols, caches in OS-specific locations, verifies checksums, and offers optional features like progress bars or log messages. Among other popular repositories, Pooch supports Dataverse: you can reference a Dataverse-hosted dataset by just its DOI, and Pooch will determine the data repository type, query the Dataverse API for the contained files and checksums, and give you an easy interface for downloading them.
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/api/dataaccess.rst
@@ -83,7 +83,7 @@ Basic access URI:

``/api/access/datafile/$id``

.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.
.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``. Note, however, that this method of access only works when file PIDs are enabled, which an administrator can configure. For further information, refer to :ref:`:FilePIDsEnabled`.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* ::
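
  # The example itself is collapsed in this diff view; a reconstruction based on
  # the note above ($SERVER is a placeholder, not verbatim text from the guide):
  curl "$SERVER/api/access/datafile/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB"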

41 changes: 41 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -3536,6 +3536,8 @@ Show Support Of Incomplete Metadata Deposition
Learn if an instance has been configured to allow deposition of incomplete datasets via the API.
See also :ref:`create-dataset-command` and :ref:`dataverse.api.allow-incomplete-metadata`

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

@@ -3548,6 +3550,45 @@ The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/info/settings/incompleteMetadataViaApi"

Get Zip File Download Limit
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get the configured zip file download limit. The response contains the long value of the limit in bytes.

This limit comes from the database setting :ref:`:ZipDownloadLimit` if set, or from the default value of 104857600 bytes (100 MB) if the setting is not defined.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/info/zipDownloadLimit"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/info/zipDownloadLimit"

Get Maximum Embargo Duration In Months
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get the maximum embargo duration in months, if available, configured through the database setting :ref:`:MaxEmbargoDurationInMonths` from the Configuration section of the Installation Guide.

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export SERVER_URL=https://demo.dataverse.org

  curl "$SERVER_URL/api/info/settings/:MaxEmbargoDurationInMonths"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

  curl "https://demo.dataverse.org/api/info/settings/:MaxEmbargoDurationInMonths"

.. _metadata-blocks-api:

Expand Down
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/container/configbaker-image.rst
@@ -86,7 +86,7 @@ Maven modules packaging target with activated "container" profile from the proje

If you specifically want to build a config baker image *only*, try

``mvn -Pct package -Ddocker.filter=dev_bootstrap``
``mvn -Pct docker:build -Ddocker.filter=dev_bootstrap``

The build of config baker involves copying Solr configset files. The Solr version used is inherited from Maven,
acting as the single source of truth. Also, the tag of the image should correspond to that of the application image, as
63 changes: 63 additions & 0 deletions doc/sphinx-guides/source/developers/api-design.rst
@@ -0,0 +1,63 @@
==========
API Design
==========

API design is a large topic. We expect this page to grow over time.

.. contents:: |toctitle|
:local:

Paths
-----

A reminder `from Wikipedia <https://en.wikipedia.org/wiki/Uniform_Resource_Identifier>`_ of what a path is:

.. code-block:: bash

            userinfo       host      port
            ┌──┴───┐ ┌──────┴──────┐ ┌┴┐
    https://john.doe@www.example.com:123/forum/questions/?tag=networking&order=newest#top
    └─┬─┘   └─────────────┬────────────┘└───────┬───────┘ └────────────┬────────────┘ └┬┘
    scheme          authority               path                 query            fragment

Exposing Settings
~~~~~~~~~~~~~~~~~

Since Dataverse 4, database settings have been exposed via API at http://localhost:8080/api/admin/settings

(JVM options are probably available via the Payara REST API, but this is out of scope.)

Settings sometimes need to be exposed to API clients outside of ``/api/admin`` (which is typically restricted to localhost). Here are some guidelines to follow when exposing settings; concrete example calls follow the list.

- When you are exposing a database setting as-is:

- Use ``/api/info/settings`` as the root path.

- Append the name of the setting including the colon (e.g. ``:DatasetPublishPopupCustomText``)

- Final path example: ``/api/info/settings/:DatasetPublishPopupCustomText``

- If the absence of the database setting is filled in by a default value (e.g. ``:ZipDownloadLimit`` or ``:ApiTermsOfUse``):

- Use ``/api/info`` as the root path.

- Append the setting but remove the colon and downcase the first character (e.g. ``zipDownloadLimit``)

- Final path example: ``/api/info/zipDownloadLimit``

- If the database setting you're exposing makes more sense outside of ``/api/info`` because there's more context (e.g. ``:CustomDatasetSummaryFields``):

- Feel free to use a path outside of ``/api/info`` as the root path.

- Given additional context, append a shortened name (e.g. ``/api/datasets/summaryFieldNames``).

- Final path example: ``/api/datasets/summaryFieldNames``

- If you need to expose a JVM option (MicroProfile setting) such as ``dataverse.api.allow-incomplete-metadata``:

- Use ``/api/info`` as the root path.

- Append a meaningful name for the setting (e.g. ``incompleteMetadataViaApi``).

- Final path example: ``/api/info/incompleteMetadataViaApi``
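
Taken together, the guidelines above map settings onto paths like these (all paths taken from the examples above; ``$SERVER_URL`` is a placeholder)::

  # Database setting exposed as-is:
  curl "$SERVER_URL/api/info/settings/:DatasetPublishPopupCustomText"

  # Setting with a default applied: colon removed, first character downcased:
  curl "$SERVER_URL/api/info/zipDownloadLimit"

  # Setting exposed with more context, outside of /api/info:
  curl "$SERVER_URL/api/datasets/summaryFieldNames"

  # JVM option (MicroProfile setting) exposed under a meaningful name:
  curl "$SERVER_URL/api/info/incompleteMetadataViaApi"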

1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -19,6 +19,7 @@ Developer Guide
sql-upgrade-scripts
testing
documentation
api-design
security
dependencies
debugging
14 changes: 14 additions & 0 deletions doc/sphinx-guides/source/developers/testing.rst
@@ -225,6 +225,14 @@ If ``dataverse.siteUrl`` is absent, you can add it with:

``./asadmin create-jvm-options "-Ddataverse.siteUrl=http\://localhost\:8080"``

dataverse.oai.server.maxidentifiers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The OAI Harvesting tests require the paging limit for ListIdentifiers to be set to 2, so that this paging behavior can be triggered without having to create and export too many datasets:

``./asadmin create-jvm-options "-Ddataverse.oai.server.maxidentifiers=2"``

dataverse.oai.server.maxrecords
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The OAI Harvesting tests require the paging limit for ListRecords to be set to 2, so that this paging behavior can be triggered without having to create and export too many datasets:

``./asadmin create-jvm-options "-Ddataverse.oai.server.maxrecords=2"``

Identifier Generation
^^^^^^^^^^^^^^^^^^^^^

4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/installation/config.rst
@@ -305,6 +305,8 @@ Here are the configuration options for PermaLinks:
- :ref:`:DataFilePIDFormat <:DataFilePIDFormat>` (optional)
- :ref:`:FilePIDsEnabled <:FilePIDsEnabled>` (optional, defaults to false)

You must restart Payara after making changes to these settings.

.. _auth-modes:

Auth Modes: Local vs. Remote vs. Both
@@ -2980,6 +2982,8 @@ This setting controls the number of files that can be uploaded through the UI at

``curl -X PUT -d 500 http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit``

.. _:ZipDownloadLimit:

:ZipDownloadLimit
+++++++++++++++++

32 changes: 25 additions & 7 deletions modules/container-configbaker/scripts/bootstrap.sh
@@ -5,16 +5,17 @@
set -euo pipefail

function usage() {
echo "Usage: $(basename "$0") [-h] [-u instanceUrl] [-t timeout] [<persona>]"
echo "Usage: $(basename "$0") [-h] [-u instanceUrl] [-t timeout] [-e targetEnvFile] [<persona>]"
echo ""
echo "Execute initial configuration (bootstrapping) of an empty Dataverse instance."
echo -n "Known personas: "
find "${BOOTSTRAP_DIR}" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | paste -sd ' '
echo ""
echo "Parameters:"
echo "instanceUrl - Location on container network where to reach your instance. Default: 'http://dataverse:8080'"
echo " timeout - Provide how long to wait for the instance to become available (using wait4x). Default: '2m'"
echo " persona - Configure persona to execute. Calls ${BOOTSTRAP_DIR}/<persona>/init.sh. Default: 'base'"
echo " instanceUrl - Location on container network where to reach your instance. Default: 'http://dataverse:8080'"
echo " timeout - Provide how long to wait for the instance to become available (using wait4x). Default: '2m'"
echo "targetEnvFile - Path to a file where the bootstrap process can expose information as env vars (e.g. dataverseAdmin's API token)"
echo " persona - Configure persona to execute. Calls ${BOOTSTRAP_DIR}/<persona>/init.sh. Default: 'base'"
echo ""
echo "Note: This script will wait for the Dataverse instance to be available before executing the bootstrapping."
echo " It also checks if already bootstrapped before (availability of metadata blocks) and skip if true."
@@ -24,13 +25,15 @@ function usage() {

# Set some defaults as documented
DATAVERSE_URL=${DATAVERSE_URL:-"http://dataverse:8080"}
TIMEOUT=${TIMEOUT:-"2m"}
TIMEOUT=${TIMEOUT:-"3m"}
TARGET_ENV_FILE=${TARGET_ENV_FILE:-""}

while getopts "u:t:h" OPTION
while getopts "u:t:e:h" OPTION
do
case "$OPTION" in
u) DATAVERSE_URL="$OPTARG" ;;
t) TIMEOUT="$OPTARG" ;;
e) TARGET_ENV_FILE="$OPTARG" ;;
h) usage;;
\?) usage;;
esac
@@ -54,6 +57,21 @@ if [[ $BLOCK_COUNT -gt 0 ]]; then
exit 0
fi

# Provide a temporary file where bootstrap scripts can store environment variable output
ENV_OUT=$(mktemp)
export ENV_OUT

# Now execute the bootstrapping script
echo "Now executing bootstrapping script at ${BOOTSTRAP_DIR}/${PERSONA}/init.sh."
exec "${BOOTSTRAP_DIR}/${PERSONA}/init.sh"
# shellcheck disable=SC1090
source "${BOOTSTRAP_DIR}/${PERSONA}/init.sh"

# If the env file option was given, check if the file is writeable and copy content from the temporary file
if [[ -n "${TARGET_ENV_FILE}" ]]; then
if [[ -f "${TARGET_ENV_FILE}" && -w "${TARGET_ENV_FILE}" ]]; then
cat "${ENV_OUT}" > "${TARGET_ENV_FILE}"
else
echo "File ${TARGET_ENV_FILE} not found, is a directory or not writeable"
exit 2
fi
fi
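
With the new ``-e`` option, a bootstrap run can hand information such as the dataverseAdmin API token back to the caller. A usage sketch (the file path and persona are illustrative assumptions, not part of this commit)::

  # The target file must already exist and be writeable, e.g. on a mounted volume:
  touch /secrets/bootstrap.env
  bootstrap.sh -e /secrets/bootstrap.env dev

  # The dev persona writes API_TOKEN into the file (see dev/init.sh below),
  # so the caller can source it afterwards:
  source /secrets/bootstrap.env
  echo "Admin API token: ${API_TOKEN}"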
2 changes: 2 additions & 0 deletions modules/container-configbaker/scripts/bootstrap/dev/init.sh
@@ -17,6 +17,8 @@ curl "${DATAVERSE_URL}/api/admin/settings/:DoiProvider" -X PUT -d FAKE

API_TOKEN=$(grep apiToken "/tmp/setup-all.sh.out" | jq ".data.apiToken" | tr -d \")
export API_TOKEN
# ${ENV_OUT} comes from bootstrap.sh and will expose the saved information back to the host if enabled.
echo "API_TOKEN=${API_TOKEN}" >> "${ENV_OUT}"

echo "Publishing root dataverse..."
curl -H "X-Dataverse-key:$API_TOKEN" -X POST "${DATAVERSE_URL}/api/dataverses/:root/actions/:publish"
4 changes: 2 additions & 2 deletions scripts/installer/install.py
@@ -413,7 +413,7 @@

# 3e. set permissions:

conn_cmd = "GRANT CREATE PRIVILEGES on DATABASE "+pgDb+" to "+pgUser+";"
conn_cmd = "GRANT ALL PRIVILEGES on DATABASE "+pgDb+" to "+pgUser+";"
try:
cur.execute(conn_cmd)
except:
@@ -422,7 +422,7 @@
conn.close()

if int(pg_major_version) >= 15:
conn_cmd = "GRANT ALL ON SCHEMA public TO "+pgUser+";"
conn_cmd = "GRANT CREATE ON SCHEMA public TO "+pgUser+";"
print("PostgreSQL 15 or higher detected. Running " + conn_cmd)
try:
cur.execute(conn_cmd)
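
The net effect of the two changed GRANT statements, written out as a psql sketch (database and role names are placeholders)::

  # Broad privileges on the database itself:
  psql -c "GRANT ALL PRIVILEGES ON DATABASE dvndb TO dvnapp;"

  # On PostgreSQL 15 or higher, where CREATE on the public schema is no longer
  # granted to all users by default, it is granted to the application role explicitly:
  psql -c "GRANT CREATE ON SCHEMA public TO dvnapp;"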
4 changes: 3 additions & 1 deletion src/main/docker/Dockerfile
@@ -51,4 +51,6 @@ LABEL org.opencontainers.image.created="@git.build.time@" \
org.opencontainers.image.vendor="Global Dataverse Community Consortium" \
org.opencontainers.image.licenses="Apache-2.0" \
org.opencontainers.image.title="Dataverse Application Image" \
org.opencontainers.image.description="This container image provides the research data repository software Dataverse in a box."
org.opencontainers.image.description="This container image provides the research data repository software Dataverse in a box." \
org.dataverse.deps.postgresql.version="@postgresql.server.version@" \
org.dataverse.deps.solr.version="@solr.version@"
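
The two new labels record the PostgreSQL and Solr versions the image was built against, resolved from the Maven properties shown. They can be read back from a built image, for example (the image tag is an assumption)::

  docker inspect --format '{{ index .Config.Labels "org.dataverse.deps.postgresql.version" }}' gdcc/dataverse:unstable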
6 changes: 6 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/AbstractApiBean.java
@@ -708,6 +708,12 @@ protected Response ok( boolean value ) {
.add("data", value).build() ).build();
}

protected Response ok(long value) {
return Response.ok().entity(Json.createObjectBuilder()
.add("status", ApiConstants.STATUS_OK)
.add("data", value).build()).build();
}

/**
* @param data Payload to return.
* @param mediaType Non-JSON media type.
Expand Down
30 changes: 24 additions & 6 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -18,6 +18,7 @@
import edu.harvard.iq.dataverse.TermsOfUseAndAccessValidator;
import edu.harvard.iq.dataverse.UserNotificationServiceBean;
import edu.harvard.iq.dataverse.api.auth.AuthRequired;
import edu.harvard.iq.dataverse.authorization.Permission;
import edu.harvard.iq.dataverse.authorization.users.ApiToken;
import edu.harvard.iq.dataverse.authorization.users.AuthenticatedUser;
import edu.harvard.iq.dataverse.authorization.users.User;
@@ -77,6 +78,8 @@

import static edu.harvard.iq.dataverse.util.json.JsonPrinter.jsonDT;
import static jakarta.ws.rs.core.Response.Status.BAD_REQUEST;
import static jakarta.ws.rs.core.Response.Status.FORBIDDEN;

import jakarta.ws.rs.core.UriInfo;
import org.glassfish.jersey.media.multipart.FormDataBodyPart;
import org.glassfish.jersey.media.multipart.FormDataContentDisposition;
@@ -731,6 +734,11 @@ public Response reingest(@Context ContainerRequestContext crc, @PathParam("id")
public Response redetectDatafile(@Context ContainerRequestContext crc, @PathParam("id") String id, @QueryParam("dryRun") boolean dryRun) {
try {
DataFile dataFileIn = findDataFileOrDie(id);
// Ingested Files have mimetype = text/tab-separated-values
// No need to redetect
if (dataFileIn.isTabularData()) {
return error(Response.Status.BAD_REQUEST, "The file is an ingested tabular file.");
}
String originalContentType = dataFileIn.getContentType();
DataFile dataFileOut = execCommand(new RedetectFileTypeCommand(createDataverseRequest(getRequestUser(crc)), dataFileIn, dryRun));
NullSafeJsonBuilder result = NullSafeJsonBuilder.jsonObjectBuilder()
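
With the guard above, calling redetect on an already ingested tabular file fails fast. A request sketch (server, token, and file id are placeholders)::

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/files/$FILE_ID/redetect?dryRun=true"
  # For an ingested tabular file the response is now 400: "The file is an ingested tabular file."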
@@ -838,13 +846,23 @@ public Response getFileDownloadCount(@Context ContainerRequestContext crc, @Path
@AuthRequired
@Path("{id}/dataTables")
public Response getFileDataTables(@Context ContainerRequestContext crc, @PathParam("id") String dataFileId) {
return response(req -> {
DataFile dataFile = execCommand(new GetDataFileCommand(req, findDataFileOrDie(dataFileId)));
if (!dataFile.isTabularData()) {
return error(BAD_REQUEST, "This operation is only available for tabular files.");
DataFile dataFile;
try {
dataFile = findDataFileOrDie(dataFileId);
} catch (WrappedResponse e) {
return error(Response.Status.NOT_FOUND, "File not found for given id.");
}
if (dataFile.isRestricted() || FileUtil.isActivelyEmbargoed(dataFile)) {
DataverseRequest dataverseRequest = createDataverseRequest(getRequestUser(crc));
boolean hasPermissionToDownloadFile = permissionSvc.requestOn(dataverseRequest, dataFile).has(Permission.DownloadFile);
if (!hasPermissionToDownloadFile) {
return error(FORBIDDEN, "Insufficient permissions to access the requested information.");
}
return ok(jsonDT(dataFile.getDataTables()));
}, getRequestUser(crc));
}
if (!dataFile.isTabularData()) {
return error(BAD_REQUEST, "This operation is only available for tabular files.");
}
return ok(jsonDT(dataFile.getDataTables()));
}

@POST
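
The revised ``dataTables`` endpoint above adds explicit permission handling. A request sketch (placeholders as before)::

  curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/$FILE_ID/dataTables"
  # 404 if the file id is unknown; 403 for a restricted or embargoed file when the
  # caller lacks the DownloadFile permission; 400 for non-tabular files.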