Dataset files API extension for file counts #9853

Merged
25 commits
e9cf041
Stash: getVersionFileCounts endpoint WIP. Pending access type count, …
GPortas Aug 28, 2023
cc5a1bf
Added: setFileCategories API endpoint
GPortas Aug 29, 2023
eadab48
Fixed: getVersionFilesIT test case for category filtering
GPortas Aug 29, 2023
70b9193
Added: getVersionFileCounts count per category test coverage
GPortas Aug 29, 2023
ace6783
Added: getVersionFileCounts count per access status
GPortas Aug 30, 2023
a87136c
Reefactor: new JsonPrinter methods for getVersionFileCounts response
GPortas Aug 30, 2023
e1913b3
Added: docs
GPortas Aug 30, 2023
aa60eae
Added: deleted, tabularData, and fileAccessRequest boolean fields to …
GPortas Sep 8, 2023
312aedd
Stash: userFileAccessRequested endpoint WIP
GPortas Sep 8, 2023
4e7e2ee
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 8, 2023
536885b
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 8, 2023
455cb2c
Fixed: removed deleted field from DataFile payload which causes nulla…
GPortas Sep 8, 2023
55a81be
Refactor: simpler IT testGetUserPermissionsOnFile
GPortas Sep 8, 2023
0248e1e
Added: tests and tweaks for userFileAccessRequested API endpoint
GPortas Sep 9, 2023
d33e8f5
Added: hasBeenDeleted files API endpoint. Pending IT
GPortas Sep 11, 2023
19f129e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 12, 2023
1aa3703
Added: IT for getHasBeenDeleted Files API endpoint
GPortas Sep 12, 2023
c224af6
Added: docs for userFileAccessRequested endpoint
GPortas Sep 12, 2023
578fdc5
Added: docs for hasBeenDeleted endpoint
GPortas Sep 12, 2023
85b9139
Added: release notes for #9851
GPortas Sep 12, 2023
aacbc64
Fixed: curl call examples in files API docs
GPortas Sep 12, 2023
d9b3f54
Fixed: null check for DataFile owner in JsonPrinter
GPortas Sep 12, 2023
a5b605e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 21, 2023
81628ad
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 21, 2023
d4af8cf
Merge pull request #9900 from IQSS/9851-datafile-payload-extension
kcondon Sep 28, 2023
6 changes: 6 additions & 0 deletions doc/release-notes/9834-files-api-extension-counts.md
@@ -0,0 +1,6 @@
Implemented the following new endpoints:

- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (Total count, per content type, per access status and per category name).


- setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
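The per-criterion counts that getVersionFileCounts returns can be pictured as plain group-by aggregations. The following is a minimal in-memory sketch of that idea, not the query code from this pull request; `FileCountsSketch` and `FileInfo` are hypothetical names introduced only for illustration, and the sketch assumes every file has a non-null content type.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FileCountsSketch {

    // Hypothetical stand-in for the per-file data that the counts are based on.
    static final class FileInfo {
        final String contentType;
        final List<String> categories;

        FileInfo(String contentType, List<String> categories) {
            this.contentType = contentType;
            this.categories = categories;
        }
    }

    // Counts files per content type, analogous to GROUP BY content type.
    static Map<String, Long> countPerContentType(List<FileInfo> files) {
        return files.stream()
                .collect(Collectors.groupingBy(f -> f.contentType, Collectors.counting()));
    }

    // Counts files per category name; a file in two categories is counted once in each.
    static Map<String, Long> countPerCategory(List<FileInfo> files) {
        return files.stream()
                .flatMap(f -> f.categories.stream())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<FileInfo> files = List.of(
                new FileInfo("text/plain", List.of("Data")),
                new FileInfo("text/plain", List.of("Data", "Code")),
                new FileInfo("image/png", List.of()));
        System.out.println(countPerContentType(files));
        System.out.println(countPerCategory(files));
    }
}
```

In this sketch a file without categories contributes to the total and to its content type, but to no category count.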
@@ -0,0 +1,14 @@
Implemented the following new endpoints:

- userFileAccessRequested (/api/access/datafile/{id}/userFileAccessRequested): Returns true or false depending on whether or not the calling user has requested access to a particular file.


- hasBeenDeleted (/api/files/{id}/hasBeenDeleted): Returns true or false depending on whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.


In addition, the DataFile API payload has been extended to include the following fields:

- tabularData: Boolean field indicating whether the DataFile is of tabular type


- fileAccessRequest: Boolean field indicating whether file access requests are enabled on the Dataset (the DataFile owner)
12 changes: 12 additions & 0 deletions doc/sphinx-guides/source/api/dataaccess.rst
@@ -404,6 +404,18 @@ A curl example using an ``id``::

curl -H "X-Dataverse-key:$API_TOKEN" -X GET http://$SERVER/api/access/datafile/{id}/listRequests

User Has Requested Access to a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``/api/access/datafile/{id}/userFileAccessRequested``

This method returns true or false depending on whether or not the calling user has requested access to a particular file.

A curl example using an ``id``::

curl -H "X-Dataverse-key:$API_TOKEN" -X GET "http://$SERVER/api/access/datafile/{id}/userFileAccessRequested"


Get User Permissions on a File:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

120 changes: 116 additions & 4 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1022,6 +1022,32 @@ Please note that both filtering and ordering criteria values are case sensitive

Keep in mind that you can combine all of the above query params depending on the results you are looking for.

Get File Counts in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get file counts for the given dataset and version.

The returned file counts are based on different criteria:

- Total (The total file count)
- Per content type
- Per category name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export ID=24
export VERSION=1.0

curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files/counts"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts"
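The same request can be prepared from Java. The following is a minimal sketch using the JDK's `java.net.http` API (Java 11 or later); `FileCountsRequest` is a hypothetical helper name, and the sketch only builds the request without sending it or assuming anything about the response body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class FileCountsRequest {

    // Builds a GET request for the file counts endpoint documented above.
    static HttpRequest build(String serverUrl, long datasetId, String version) {
        URI uri = URI.create(serverUrl + "/api/datasets/" + datasetId
                + "/versions/" + version + "/files/counts");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("https://demo.dataverse.org", 24, "1.0");
        // Prints the HTTP method and the target URI of the prepared request.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The prepared request can then be passed to `HttpClient.send` together with `HttpResponse.BodyHandlers.ofString()` to obtain the response body as text.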

View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2820,13 +2846,13 @@ A curl example using an ``ID``
export SERVER_URL=https://demo.dataverse.org
export ID=24

- curl "$SERVER_URL/api/files/$ID/downloadCount"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/downloadCount"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

- curl "https://demo.dataverse.org/api/files/24/downloadCount"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/downloadCount"

A curl example using a ``PERSISTENT_ID``

@@ -2836,16 +2862,53 @@ A curl example using a ``PERSISTENT_ID``
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000

- curl "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"
+ curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/downloadCount?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

- curl "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"
+ curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/downloadCount?persistentId=doi:10.5072/FK2/AAA000"

If you are interested in download counts for multiple files, see :doc:`/api/metrics`.

File Has Been Deleted
~~~~~~~~~~~~~~~~~~~~~

Returns whether a particular file that existed in a previous version of the dataset no longer exists in the latest version.

A curl example using an ``ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24

curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/$ID/hasBeenDeleted"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/24/hasBeenDeleted"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000

curl -H "X-Dataverse-key:$API_TOKEN" -X GET "$SERVER_URL/api/files/:persistentId/hasBeenDeleted?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X GET "https://demo.dataverse.org/api/files/:persistentId/hasBeenDeleted?persistentId=doi:10.5072/FK2/AAA000"
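For callers working from Java, both variants of the request can be prepared with the JDK's `java.net.http` API. This is a minimal sketch: `HasBeenDeletedRequest` is a hypothetical helper name, and percent-encoding the persistent identifier in the query string is an assumption of this sketch (the curl examples above pass it unencoded). The request is only built, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class HasBeenDeletedRequest {

    // Variant addressing the file by its database id.
    static HttpRequest byId(String serverUrl, String apiToken, long fileId) {
        URI uri = URI.create(serverUrl + "/api/files/" + fileId + "/hasBeenDeleted");
        return HttpRequest.newBuilder(uri).header("X-Dataverse-key", apiToken).GET().build();
    }

    // Variant addressing the file by its persistent id (DOI or Handle).
    // Assumption: the server accepts a percent-encoded persistentId query value.
    static HttpRequest byPersistentId(String serverUrl, String apiToken, String persistentId) {
        String encoded = URLEncoder.encode(persistentId, StandardCharsets.UTF_8);
        URI uri = URI.create(serverUrl
                + "/api/files/:persistentId/hasBeenDeleted?persistentId=" + encoded);
        return HttpRequest.newBuilder(uri).header("X-Dataverse-key", apiToken).GET().build();
    }

    public static void main(String[] args) {
        String token = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx";
        System.out.println(byId("https://demo.dataverse.org", token, 24).uri());
        System.out.println(
                byPersistentId("https://demo.dataverse.org", token, "doi:10.5072/FK2/AAA000").uri());
    }
}
```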

Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~

@@ -2895,6 +2958,55 @@ Also note that dataFileTags are not versioned and changes to these will update t

.. _EditingVariableMetadata:

Updating File Metadata Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Updates the categories for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the category names.

Although categories can also be updated through the previous endpoint, this endpoint is more practical when only the categories, and no other metadata fields, need to be updated.

A curl example using an ``ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"$SERVER_URL/api/files/$ID/metadata/categories"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"https://demo.dataverse.org/api/files/24/metadata/categories"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"$SERVER_URL/api/files/:persistentId/metadata/categories?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"https://demo.dataverse.org/api/files/:persistentId/metadata/categories?persistentId=doi:10.5072/FK2/AAA000"

Note that if the specified categories do not exist, they will be created.
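When the category names come from program data rather than a literal string, the JSON body shown in the curl examples has to be assembled. A minimal sketch of that step follows; `CategoriesPayload` is a hypothetical helper name, and the escaping covers only quotes and backslashes, so a JSON library would be a safer choice for arbitrary input.

```java
import java.util.List;
import java.util.stream.Collectors;

class CategoriesPayload {

    // Wraps a value in double quotes, escaping backslashes and quotes only.
    // Control characters are not handled in this sketch.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Produces a body of the form {"categories":["Category1","Category2"]}.
    static String toJson(List<String> categoryNames) {
        return categoryNames.stream()
                .map(CategoriesPayload::quote)
                .collect(Collectors.joining(",", "{\"categories\":[", "]}"));
    }

    public static void main(String[] args) {
        System.out.println(toJson(List.of("Category1", "Category2")));
    }
}
```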

Editing Variable Level Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -5,6 +5,7 @@
import edu.harvard.iq.dataverse.QEmbargo;
import edu.harvard.iq.dataverse.QFileMetadata;

import com.querydsl.core.Tuple;
import com.querydsl.core.types.dsl.BooleanExpression;
import com.querydsl.core.types.dsl.CaseBuilder;
import com.querydsl.core.types.dsl.DateExpression;
@@ -21,7 +22,9 @@
import java.io.Serializable;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Stateless
@Named
@@ -48,6 +51,72 @@ public enum DataFileAccessStatus {
Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic
}

/**
* Given a DatasetVersion, returns its total file metadata count
*
* @param datasetVersion the DatasetVersion to access
* @return long value of total file metadata count
*/
public long getFileMetadataCount(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
return queryFactory.selectFrom(fileMetadata).where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId())).stream().count();
}

/**
* Given a DatasetVersion, returns its file metadata count per content type
*
* @param datasetVersion the DatasetVersion to access
* @return Map<String, Long> of file metadata counts per content type
*/
public Map<String, Long> getFileMetadataCountPerContentType(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
List<Tuple> contentTypeOccurrences = queryFactory
.select(fileMetadata.dataFile.contentType, fileMetadata.count())
.from(fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()))
.groupBy(fileMetadata.dataFile.contentType).fetch();
Map<String, Long> result = new HashMap<>();
for (Tuple occurrence : contentTypeOccurrences) {
result.put(occurrence.get(fileMetadata.dataFile.contentType), occurrence.get(fileMetadata.count()));
}
return result;
}

/**
* Given a DatasetVersion, returns its file metadata count per category name
*
* @param datasetVersion the DatasetVersion to access
* @return Map<String, Long> of file metadata counts per category name
*/
public Map<String, Long> getFileMetadataCountPerCategoryName(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
List<Tuple> categoryNameOccurrences = queryFactory
.select(dataFileCategory.name, fileMetadata.count())
.from(dataFileCategory, fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()).and(fileMetadata.fileCategories.contains(dataFileCategory)))
.groupBy(dataFileCategory.name).fetch();
Map<String, Long> result = new HashMap<>();
for (Tuple occurrence : categoryNameOccurrences) {
result.put(occurrence.get(dataFileCategory.name), occurrence.get(fileMetadata.count()));
}
return result;
}

/**
* Given a DatasetVersion, returns its file metadata count per DataFileAccessStatus
*
* @param datasetVersion the DatasetVersion to access
* @return Map<DataFileAccessStatus, Long> of file metadata counts per DataFileAccessStatus
*/
public Map<DataFileAccessStatus, Long> getFileMetadataCountPerAccessStatus(DatasetVersion datasetVersion) {
Map<DataFileAccessStatus, Long> allCounts = new HashMap<>();
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.Public);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.Restricted);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.EmbargoedThenPublic);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.EmbargoedThenRestricted);
return allCounts;
}

/**
* Returns a FileMetadata list of files in the specified DatasetVersion
*
@@ -62,13 +131,13 @@ public enum DataFileAccessStatus {
* @return a FileMetadata list from the specified DatasetVersion
*/
public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Integer limit, Integer offset, String contentType, DataFileAccessStatus accessStatus, String categoryName, String searchText, FileMetadatasOrderCriteria orderCriteria) {
- JPAQuery<FileMetadata> baseQuery = createBaseQuery(datasetVersion, orderCriteria);
+ JPAQuery<FileMetadata> baseQuery = createGetFileMetadatasBaseQuery(datasetVersion, orderCriteria);

if (contentType != null) {
baseQuery.where(fileMetadata.dataFile.contentType.eq(contentType));
}
if (accessStatus != null) {
- baseQuery.where(createAccessStatusExpression(accessStatus));
+ baseQuery.where(createGetFileMetadatasAccessStatusExpression(accessStatus));
}
if (categoryName != null) {
baseQuery.from(dataFileCategory).where(dataFileCategory.name.eq(categoryName).and(fileMetadata.fileCategories.contains(dataFileCategory)));
@@ -78,7 +147,7 @@ public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Intege
baseQuery.where(fileMetadata.label.lower().contains(searchText).or(fileMetadata.description.lower().contains(searchText)));
}

- applyOrderCriteriaToQuery(baseQuery, orderCriteria);
+ applyOrderCriteriaToGetFileMetadatasQuery(baseQuery, orderCriteria);

if (limit != null) {
baseQuery.limit(limit);
@@ -90,7 +159,22 @@ public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Intege
return baseQuery.fetch();
}

- private JPAQuery<FileMetadata> createBaseQuery(DatasetVersion datasetVersion, FileMetadatasOrderCriteria orderCriteria) {
+ private void addAccessStatusCountToTotal(DatasetVersion datasetVersion, Map<DataFileAccessStatus, Long> totalCounts, DataFileAccessStatus dataFileAccessStatus) {
+ long fileMetadataCount = getFileMetadataCountByAccessStatus(datasetVersion, dataFileAccessStatus);
+ if (fileMetadataCount > 0) {
+ totalCounts.put(dataFileAccessStatus, fileMetadataCount);
+ }
+ }
+
+ private long getFileMetadataCountByAccessStatus(DatasetVersion datasetVersion, DataFileAccessStatus accessStatus) {
+ JPAQueryFactory queryFactory = new JPAQueryFactory(em);
+ return queryFactory
+ .selectFrom(fileMetadata)
+ .where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()).and(createGetFileMetadatasAccessStatusExpression(accessStatus)))
+ .stream().count();
+ }
+
+ private JPAQuery<FileMetadata> createGetFileMetadatasBaseQuery(DatasetVersion datasetVersion, FileMetadatasOrderCriteria orderCriteria) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
JPAQuery<FileMetadata> baseQuery = queryFactory.selectFrom(fileMetadata).where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()));
if (orderCriteria == FileMetadatasOrderCriteria.Newest || orderCriteria == FileMetadatasOrderCriteria.Oldest) {
@@ -99,7 +183,7 @@ private JPAQuery<FileMetadata> createBaseQuery(DatasetVersion datasetVersion, Fi
return baseQuery;
}

- private BooleanExpression createAccessStatusExpression(DataFileAccessStatus accessStatus) {
+ private BooleanExpression createGetFileMetadatasAccessStatusExpression(DataFileAccessStatus accessStatus) {
QEmbargo embargo = fileMetadata.dataFile.embargo;
BooleanExpression activelyEmbargoedExpression = embargo.dateAvailable.goe(DateExpression.currentDate(LocalDate.class));
BooleanExpression inactivelyEmbargoedExpression = embargo.isNull();
@@ -123,7 +207,7 @@ private BooleanExpression createAccessStatusExpression(DataFileAccessStatus acce
return accessStatusExpression;
}

- private void applyOrderCriteriaToQuery(JPAQuery<FileMetadata> query, FileMetadatasOrderCriteria orderCriteria) {
+ private void applyOrderCriteriaToGetFileMetadatasQuery(JPAQuery<FileMetadata> query, FileMetadatasOrderCriteria orderCriteria) {
DateTimeExpression<Timestamp> orderByLifetimeExpression = new CaseBuilder().when(dvObject.publicationDate.isNotNull()).then(dvObject.publicationDate).otherwise(dvObject.createDate);
switch (orderCriteria) {
case NameZA: