Merge branch 'develop' of github.com:IQSS/dataverse into 10001-datase…
…t-api-user-permissions
GPortas committed Oct 20, 2023
2 parents fa1b37b + ab231ff commit 5cd6679
Showing 22 changed files with 632 additions and 408 deletions.
8 changes: 8 additions & 0 deletions doc/release-notes/9763-versions-api-improvements.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Improvements in the /versions API

- optional pagination has been added to `/api/datasets/{id}/versions` that may be useful in datasets with a large number of versions;
- a new flag `includeFiles` is added to both `/api/datasets/{id}/versions` and `/api/datasets/{id}/versions/{vid}` (true by default), providing an option to drop the file information from the output;
- when files are requested to be included, some database lookup optimizations have been added to improve the performance on datasets with large numbers of files.

This is reflected in the [Dataset Versions API](https://guides.dataverse.org/en/9763-lookup-optimizations/api/native-api.html#dataset-versions-api) section of the Guide.

@@ -0,0 +1,12 @@
Extended the getDownloadSize endpoint (/api/datasets/{id}/versions/{versionId}/downloadsize), including the following new features:

- The endpoint now accepts a new optional boolean query parameter, "includeDeaccessioned". When enabled, deaccessioned dataset versions are also considered when looking up the version for which the total file download size is calculated.


- The endpoint now supports filtering by criteria. In particular, it accepts the following optional criteria query parameters:

- contentType
- accessStatus
- categoryName
- tabularTagName
- searchText
72 changes: 67 additions & 5 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -889,6 +889,10 @@ It returns a list of versions with their metadata, and file list:
]
}
The optional ``includeFiles`` parameter specifies whether the files should be listed in the output. It defaults to ``true``, preserving backward compatibility. Note that for a dataset with a large number of versions and/or files, including the files can dramatically increase the volume of the output. A separate ``/files`` API can be used to list the files, or a subset of them, in a given version.

The optional ``offset`` and ``limit`` parameters can be used to specify the range of the versions list to be shown. This can be used to paginate through the list in a dataset with a large number of versions.
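
As a rough sketch of how the pagination parameters combine with ``includeFiles``, the snippet below builds the request URLs for the first three pages of ten versions each. The helper name ``versions_page_url`` and the page size are illustrative assumptions and not part of the API; only the ``offset``, ``limit`` and ``includeFiles`` parameters come from the text above.

```shell
# Hypothetical helper (not part of Dataverse): prints the URL for one page
# of a dataset's version list, with the file listing dropped from the output.
# Arguments: $1 = server URL, $2 = dataset id, $3 = offset, $4 = limit
versions_page_url() {
  printf '%s/api/datasets/%s/versions?offset=%s&limit=%s&includeFiles=false\n' \
    "$1" "$2" "$3" "$4"
}

SERVER_URL=https://demo.dataverse.org
ID=24
LIMIT=10   # assumed page size, chosen only for illustration

# Print the URLs for pages 0, 1 and 2; the offset advances by LIMIT each time.
for page in 0 1 2; do
  versions_page_url "$SERVER_URL" "$ID" "$((page * LIMIT))" "$LIMIT"
done
```

Each printed URL can then be passed to ``curl`` in the same way as the other examples in this section.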


Get Version of a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -901,13 +905,16 @@ Get Version of a Dataset
export ID=24
export VERSION=1.0
curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION"
curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION?includeFiles=false"
The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0"
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0?includeFiles=false"
The optional ``includeFiles`` parameter specifies whether the files should be listed in the output (defaults to ``true``). Note that a separate ``/files`` API can be used for listing the files, or a subset thereof in a given version.


By default, deaccessioned dataset versions are not included in the search when applying the :latest or :latest-published identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.

@@ -974,7 +981,7 @@ The fully expanded example above (without environment variables) looks like this
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files"
This endpoint supports optional pagination, through the ``limit`` and ``offset`` query params:
This endpoint supports optional pagination, through the ``limit`` and ``offset`` query parameters:

.. code-block:: bash
@@ -1054,7 +1061,7 @@ Usage example:
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files?includeDeaccessioned=true"
.. note:: Keep in mind that you can combine all of the above query params depending on the results you are looking for.
.. note:: Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.

Get File Counts in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1142,7 +1149,7 @@ Usage example:
Please note that filtering values are case sensitive and must be correctly typed for the endpoint to recognize them.

Keep in mind that you can combine all of the above query params depending on the results you are looking for.
Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.

View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -1898,6 +1905,61 @@ Usage example:
curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?mode=Archival"
Category name filtering is also optionally supported. It returns the size of all files available for download that match the requested category name.

Usage example:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?categoryName=Data"
Tabular tag name filtering is also optionally supported. It returns the size of all files available for download to which the requested tabular tag has been added.

Usage example:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?tabularTagName=Survey"
Content type filtering is also optionally supported. It returns the size of all files available for download that match the requested content type.

Usage example:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?contentType=image/png"
Filtering by search text is also optionally supported. The search will be applied to the labels and descriptions of the dataset files, to return the size of all files available for download that contain the text searched in one of such fields.

Usage example:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?searchText=word"
File access filtering is also optionally supported, using one of the following values:

* ``Public``
* ``Restricted``
* ``EmbargoedThenRestricted``
* ``EmbargoedThenPublic``

If no filter is specified, the files will match all of the above categories.

Please note that filtering query parameters are case sensitive and must be correctly typed for the endpoint to recognize them.

By default, deaccessioned dataset versions are not included in the search when applying the :latest or :latest-published identifiers. Additionally, when filtering by a specific version tag, you will get a "not found" error if the version is deaccessioned and you do not enable the ``includeDeaccessioned`` option described below.

If you want to include deaccessioned dataset versions, you must set the ``includeDeaccessioned`` query parameter to ``true``.

Usage example:

.. code-block:: bash
curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/downloadsize?includeDeaccessioned=true"
.. note:: Keep in mind that you can combine all of the above query parameters depending on the results you are looking for.
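
To illustrate combining several of these filters in one request, the following sketch assembles a ``downloadsize`` URL from a list of ``key=value`` pairs. The helper name ``downloadsize_url`` is a hypothetical convenience and not part of Dataverse. The sketch also assumes the values need no URL encoding, which holds for the sample values used in this section but not in general.

```shell
# Hypothetical helper (not part of Dataverse): prints a downloadsize URL.
# Arguments: $1 = server URL, $2 = dataset id, $3 = version,
#            remaining arguments = key=value query parameters.
# Assumption: values are already URL-safe; no encoding is performed here.
downloadsize_url() {
  base="$1/api/datasets/$2/versions/$3/downloadsize"
  shift 3
  query=""
  for param in "$@"; do
    # Join parameters with "&", skipping the separator before the first one.
    query="${query:+$query&}$param"
  done
  # Append "?query" only when at least one parameter was given.
  printf '%s%s\n' "$base" "${query:+?$query}"
}

# Public PNG files only, also considering deaccessioned versions.
downloadsize_url https://demo.dataverse.org 24 1.0 \
  contentType=image/png accessStatus=Public includeDeaccessioned=true
```

The printed URL can be passed to ``curl`` as in the usage examples above. Remember that the filter values are case sensitive.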

Submit a Dataset for Review
~~~~~~~~~~~~~~~~~~~~~~~~~~~

33 changes: 20 additions & 13 deletions src/main/java/edu/harvard/iq/dataverse/Dataset.java
@@ -158,6 +158,23 @@ public void setCitationDateDatasetFieldType(DatasetFieldType citationDateDataset
this.citationDateDatasetFieldType = citationDateDatasetFieldType;
}

// Per DataCite best practices, the citation date of a dataset may need
// to be adjusted to reflect the latest embargo availability date of any
// file within the first published version.
// If any files are embargoed in the first version, this date will be
// calculated and cached here upon its publication, in the
// FinalizeDatasetPublicationCommand.
private Timestamp embargoCitationDate;

public Timestamp getEmbargoCitationDate() {
return embargoCitationDate;
}

public void setEmbargoCitationDate(Timestamp embargoCitationDate) {
this.embargoCitationDate = embargoCitationDate;
}



@ManyToOne
@JoinColumn(name="template_id",nullable = true)
@@ -676,20 +693,10 @@ public Timestamp getCitationDate() {
Timestamp citationDate = null;
//Only calculate if this dataset doesn't use an alternate date field for publication date
if (citationDateDatasetFieldType == null) {
List<DatasetVersion> versions = this.versions;
// TODo - is this ever not version 1.0 (or draft if not published yet)
DatasetVersion oldest = versions.get(versions.size() - 1);
citationDate = super.getPublicationDate();
if (oldest.isPublished()) {
List<FileMetadata> fms = oldest.getFileMetadatas();
for (FileMetadata fm : fms) {
Embargo embargo = fm.getDataFile().getEmbargo();
if (embargo != null) {
Timestamp embDate = Timestamp.valueOf(embargo.getDateAvailable().atStartOfDay());
if (citationDate.compareTo(embDate) < 0) {
citationDate = embDate;
}
}
if (embargoCitationDate != null) {
if (citationDate.compareTo(embargoCitationDate) < 0) {
return embargoCitationDate;
}
}
}
@@ -137,7 +137,7 @@ public Dataset findDeep(Object pk) {
.setHint("eclipselink.left-join-fetch", "o.files.roleAssignments")
.getSingleResult();
}

public List<Dataset> findByOwnerId(Long ownerId) {
return findByOwnerId(ownerId, false);
}
8 changes: 7 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/DatasetVersion.java
@@ -68,7 +68,13 @@
query = "SELECT OBJECT(o) FROM DatasetVersion AS o WHERE o.dataset.harvestedFrom IS NULL and o.releaseTime IS NOT NULL and o.archivalCopyLocation IS NULL"
),
@NamedQuery(name = "DatasetVersion.findById",
query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id")})
query = "SELECT o FROM DatasetVersion o LEFT JOIN FETCH o.fileMetadatas WHERE o.id=:id"),
@NamedQuery(name = "DatasetVersion.findByDataset",
query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC"),
@NamedQuery(name = "DatasetVersion.findReleasedByDataset",
query = "SELECT o FROM DatasetVersion o WHERE o.dataset.id=:datasetId AND o.versionState=edu.harvard.iq.dataverse.DatasetVersion.VersionState.RELEASED ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")/*,
@NamedQuery(name = "DatasetVersion.findVersionElements",
query = "SELECT o.id, o.versionState, o.versionNumber, o.minorVersionNumber FROM DatasetVersion o WHERE o.dataset.id=:datasetId ORDER BY o.versionNumber DESC, o.minorVersionNumber DESC")*/})


@Entity
@@ -53,7 +53,7 @@ public enum FileOrderCriteria {
}

/**
* Mode to base the search in {@link DatasetVersionFilesServiceBean#getFilesDownloadSize(DatasetVersion, FileDownloadSizeMode)}
* Mode to base the search in {@link DatasetVersionFilesServiceBean#getFilesDownloadSize(DatasetVersion, FileSearchCriteria, FileDownloadSizeMode)}
* <p>
* All: Includes both archival and original sizes for tabular files
* Archival: Includes only the archival size for tabular files
@@ -191,16 +191,17 @@ public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Intege
* Returns the total download size of all files for a particular DatasetVersion
*
* @param datasetVersion the DatasetVersion to access
* @param searchCriteria for retrieving only files matching this criteria
* @param mode a FileDownloadSizeMode to base the search on
* @return long value of total file download size
*/
public long getFilesDownloadSize(DatasetVersion datasetVersion, FileDownloadSizeMode mode) {
public long getFilesDownloadSize(DatasetVersion datasetVersion, FileSearchCriteria searchCriteria, FileDownloadSizeMode mode) {
return switch (mode) {
case All ->
Long.sum(getOriginalTabularFilesSize(datasetVersion), getArchivalFilesSize(datasetVersion, false));
Long.sum(getOriginalTabularFilesSize(datasetVersion, searchCriteria), getArchivalFilesSize(datasetVersion, false, searchCriteria));
case Original ->
Long.sum(getOriginalTabularFilesSize(datasetVersion), getArchivalFilesSize(datasetVersion, true));
case Archival -> getArchivalFilesSize(datasetVersion, false);
Long.sum(getOriginalTabularFilesSize(datasetVersion, searchCriteria), getArchivalFilesSize(datasetVersion, true, searchCriteria));
case Archival -> getArchivalFilesSize(datasetVersion, false, searchCriteria);
};
}

@@ -301,22 +302,24 @@ private void applyOrderCriteriaToGetFileMetadatasQuery(JPAQuery<FileMetadata> qu
}
}

private long getOriginalTabularFilesSize(DatasetVersion datasetVersion) {
private long getOriginalTabularFilesSize(DatasetVersion datasetVersion, FileSearchCriteria searchCriteria) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
Long result = queryFactory
JPAQuery<?> baseQuery = queryFactory
.from(fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()))
.from(dataTable)
.where(dataTable.dataFile.eq(fileMetadata.dataFile))
.select(dataTable.originalFileSize.sum()).fetchFirst();
.where(dataTable.dataFile.eq(fileMetadata.dataFile));
applyFileSearchCriteriaToQuery(baseQuery, searchCriteria);
Long result = baseQuery.select(dataTable.originalFileSize.sum()).fetchFirst();
return (result == null) ? 0 : result;
}

private long getArchivalFilesSize(DatasetVersion datasetVersion, boolean ignoreTabular) {
private long getArchivalFilesSize(DatasetVersion datasetVersion, boolean ignoreTabular, FileSearchCriteria searchCriteria) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
JPAQuery<?> baseQuery = queryFactory
.from(fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()));
applyFileSearchCriteriaToQuery(baseQuery, searchCriteria);
Long result;
if (ignoreTabular) {
result = baseQuery.where(fileMetadata.dataFile.dataTables.isEmpty()).select(fileMetadata.dataFile.filesize.sum()).fetchFirst();
@@ -166,9 +166,44 @@ public DatasetVersion findDeep(Object pk) {
.setHint("eclipselink.left-join-fetch", "o.fileMetadatas.datasetVersion")
.setHint("eclipselink.left-join-fetch", "o.fileMetadatas.dataFile.releaseUser")
.setHint("eclipselink.left-join-fetch", "o.fileMetadatas.dataFile.creator")
.setHint("eclipselink.left-join-fetch", "o.fileMetadatas.dataFile.dataFileTags")
.getSingleResult();
}


/**
* Performs the same database lookup as the one behind Dataset.getVersions().
* Additionally, provides the arguments for selecting a partial list of
* (length-offset) versions for pagination, plus the ability to pre-select
* only the publicly-viewable versions.
* It is recommended that individual software components utilize the
* ListVersionsCommand, instead of calling this service method directly.
* @param datasetId
* @param offset for pagination through long lists of versions
* @param length for pagination through long lists of versions
* @param includeUnpublished retrieves all the versions, including drafts and deaccessioned.
* @return (partial) list of versions
*/
public List<DatasetVersion> findVersions(Long datasetId, Integer offset, Integer length, boolean includeUnpublished) {
TypedQuery<DatasetVersion> query;
if (includeUnpublished) {
query = em.createNamedQuery("DatasetVersion.findByDataset", DatasetVersion.class);
} else {
query = em.createNamedQuery("DatasetVersion.findReleasedByDataset", DatasetVersion.class);
}

query.setParameter("datasetId", datasetId);

if (offset != null) {
query.setFirstResult(offset);
}
if (length != null) {
query.setMaxResults(length);
}

return query.getResultList();
}

public DatasetVersion findByFriendlyVersionNumber(Long datasetId, String friendlyVersionNumber) {
Long majorVersionNumber = null;
Long minorVersionNumber = null;