
Dataset files API extension for file counts #9853

Merged
25 commits
e9cf041
Stash: getVersionFileCounts endpoint WIP. Pending access type count, …
GPortas Aug 28, 2023
cc5a1bf
Added: setFileCategories API endpoint
GPortas Aug 29, 2023
eadab48
Fixed: getVersionFilesIT test case for category filtering
GPortas Aug 29, 2023
70b9193
Added: getVersionFileCounts count per category test coverage
GPortas Aug 29, 2023
ace6783
Added: getVersionFileCounts count per access status
GPortas Aug 30, 2023
a87136c
Refactor: new JsonPrinter methods for getVersionFileCounts response
GPortas Aug 30, 2023
e1913b3
Added: docs
GPortas Aug 30, 2023
aa60eae
Added: deleted, tabularData, and fileAccessRequest boolean fields to …
GPortas Sep 8, 2023
312aedd
Stash: userFileAccessRequested endpoint WIP
GPortas Sep 8, 2023
4e7e2ee
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 8, 2023
536885b
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 8, 2023
455cb2c
Fixed: removed deleted field from DataFile payload which causes nulla…
GPortas Sep 8, 2023
55a81be
Refactor: simpler IT testGetUserPermissionsOnFile
GPortas Sep 8, 2023
0248e1e
Added: tests and tweaks for userFileAccessRequested API endpoint
GPortas Sep 9, 2023
d33e8f5
Added: hasBeenDeleted files API endpoint. Pending IT
GPortas Sep 11, 2023
19f129e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 12, 2023
1aa3703
Added: IT for getHasBeenDeleted Files API endpoint
GPortas Sep 12, 2023
c224af6
Added: docs for userFileAccessRequested endpoint
GPortas Sep 12, 2023
578fdc5
Added: docs for hasBeenDeleted endpoint
GPortas Sep 12, 2023
85b9139
Added: release notes for #9851
GPortas Sep 12, 2023
aacbc64
Fixed: curl call examples in files API docs
GPortas Sep 12, 2023
d9b3f54
Fixed: null check for DataFile owner in JsonPrinter
GPortas Sep 12, 2023
a5b605e
Merge branch '9785-files-api-extension-search' of github.com:IQSS/dat…
GPortas Sep 21, 2023
81628ad
Merge branch '9834-files-api-extension-file-counts' of github.com:IQS…
GPortas Sep 21, 2023
d4af8cf
Merge pull request #9900 from IQSS/9851-datafile-payload-extension
kcondon Sep 28, 2023
6 changes: 6 additions & 0 deletions doc/release-notes/9834-files-api-extension-counts.md
@@ -0,0 +1,6 @@
Implemented the following new endpoints:

- getVersionFileCounts (/api/datasets/{id}/versions/{versionId}/files/counts): Given a dataset and its version, retrieves file counts based on different criteria (total count, per content type, per access status, and per category name).


- setFileCategories (/api/files/{id}/metadata/categories): Updates the categories (by name) for an existing file. If the specified categories do not exist, they will be created.
75 changes: 75 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -1022,6 +1022,32 @@ Please note that both filtering and ordering criteria values are case sensitive

Keep in mind that you can combine all of the above query params depending on the results you are looking for.

Get File Counts in a Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Get file counts for the given dataset and version.

The returned file counts are based on different criteria:

- Total (The total file count)
- Per content type
- Per category name
- Per access status (Possible values: Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic)

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export ID=24
export VERSION=1.0

curl "$SERVER_URL/api/datasets/$ID/versions/$VERSION/files/counts"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl "https://demo.dataverse.org/api/datasets/24/versions/1.0/files/counts"

View Dataset Files and Folders as a Directory Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -2895,6 +2921,55 @@ Also note that dataFileTags are not versioned and changes to these will update t

.. _EditingVariableMetadata:

Updating File Metadata Categories
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Updates the categories for an existing file where ``ID`` is the database id of the file to update or ``PERSISTENT_ID`` is the persistent id (DOI or Handle) of the file. Requires a ``jsonString`` expressing the category names.

Although categories can also be updated with the previous endpoint, this endpoint is more practical when only the categories need to be updated and no other metadata fields.

A curl example using an ``ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export ID=24

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"$SERVER_URL/api/files/$ID/metadata/categories"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"http://demo.dataverse.org/api/files/24/metadata/categories"

A curl example using a ``PERSISTENT_ID``

.. code-block:: bash

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_ID=doi:10.5072/FK2/AAA000

curl -H "X-Dataverse-key:$API_TOKEN" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"$SERVER_URL/api/files/:persistentId/metadata/categories?persistentId=$PERSISTENT_ID"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST \
-F 'jsonData={"categories":["Category1","Category2"]}' \
"https://demo.dataverse.org/api/files/:persistentId/metadata/categories?persistentId=doi:10.5072/FK2/AAA000"

Note that if the specified categories do not exist, they will be created.
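
On success, the endpoint returns the standard OK envelope with a confirmation message; an illustrative response (assuming file id 24) might look like this:

.. code-block:: json

{
  "status": "OK",
  "data": {
    "message": "Categories of file 24 updated."
  }
}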

Editing Variable Level Metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

src/main/java/edu/harvard/iq/dataverse/DatasetVersionFilesServiceBean.java
@@ -5,6 +5,7 @@
import edu.harvard.iq.dataverse.QEmbargo;
import edu.harvard.iq.dataverse.QFileMetadata;

import com.querydsl.core.Tuple;
import com.querydsl.core.types.dsl.BooleanExpression;
import com.querydsl.core.types.dsl.CaseBuilder;
import com.querydsl.core.types.dsl.DateExpression;
@@ -21,7 +22,9 @@
import java.io.Serializable;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Stateless
@Named
@@ -48,6 +51,72 @@ public enum DataFileAccessStatus {
Public, Restricted, EmbargoedThenRestricted, EmbargoedThenPublic
}

/**
* Given a DatasetVersion, returns its total file metadata count
*
* @param datasetVersion the DatasetVersion to access
* @return long value of total file metadata count
*/
public long getFileMetadataCount(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
return queryFactory.selectFrom(fileMetadata).where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId())).stream().count();
}

/**
* Given a DatasetVersion, returns its file metadata count per content type
*
* @param datasetVersion the DatasetVersion to access
* @return Map<String, Long> of file metadata counts per content type
*/
public Map<String, Long> getFileMetadataCountPerContentType(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
List<Tuple> contentTypeOccurrences = queryFactory
.select(fileMetadata.dataFile.contentType, fileMetadata.count())
.from(fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()))
.groupBy(fileMetadata.dataFile.contentType).fetch();
Map<String, Long> result = new HashMap<>();
for (Tuple occurrence : contentTypeOccurrences) {
result.put(occurrence.get(fileMetadata.dataFile.contentType), occurrence.get(fileMetadata.count()));
}
return result;
}

/**
* Given a DatasetVersion, returns its file metadata count per category name
*
* @param datasetVersion the DatasetVersion to access
* @return Map<String, Long> of file metadata counts per category name
*/
public Map<String, Long> getFileMetadataCountPerCategoryName(DatasetVersion datasetVersion) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
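// Pair each file metadata row in this version with the categories it contains (cross join filtered by membership), then group by category name.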
List<Tuple> categoryNameOccurrences = queryFactory
.select(dataFileCategory.name, fileMetadata.count())
.from(dataFileCategory, fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()).and(fileMetadata.fileCategories.contains(dataFileCategory)))
.groupBy(dataFileCategory.name).fetch();
Map<String, Long> result = new HashMap<>();
for (Tuple occurrence : categoryNameOccurrences) {
result.put(occurrence.get(dataFileCategory.name), occurrence.get(fileMetadata.count()));
}
return result;
}

/**
* Given a DatasetVersion, returns its file metadata count per DataFileAccessStatus
*
* @param datasetVersion the DatasetVersion to access
* @return Map<DataFileAccessStatus, Long> of file metadata counts per DataFileAccessStatus
*/
public Map<DataFileAccessStatus, Long> getFileMetadataCountPerAccessStatus(DatasetVersion datasetVersion) {
Map<DataFileAccessStatus, Long> allCounts = new HashMap<>();
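// Statuses with zero matching files are omitted from the result (see addAccessStatusCountToTotal).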
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.Public);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.Restricted);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.EmbargoedThenPublic);
addAccessStatusCountToTotal(datasetVersion, allCounts, DataFileAccessStatus.EmbargoedThenRestricted);
return allCounts;
}

/**
* Returns a FileMetadata list of files in the specified DatasetVersion
*
@@ -62,13 +131,13 @@
* @return a FileMetadata list from the specified DatasetVersion
*/
public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Integer limit, Integer offset, String contentType, DataFileAccessStatus accessStatus, String categoryName, String searchText, FileMetadatasOrderCriteria orderCriteria) {
JPAQuery<FileMetadata> baseQuery = createBaseQuery(datasetVersion, orderCriteria);
JPAQuery<FileMetadata> baseQuery = createGetFileMetadatasBaseQuery(datasetVersion, orderCriteria);

if (contentType != null) {
baseQuery.where(fileMetadata.dataFile.contentType.eq(contentType));
}
if (accessStatus != null) {
baseQuery.where(createAccessStatusExpression(accessStatus));
baseQuery.where(createGetFileMetadatasAccessStatusExpression(accessStatus));
}
if (categoryName != null) {
baseQuery.from(dataFileCategory).where(dataFileCategory.name.eq(categoryName).and(fileMetadata.fileCategories.contains(dataFileCategory)));
@@ -78,7 +147,7 @@ public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Intege
baseQuery.where(fileMetadata.label.lower().contains(searchText).or(fileMetadata.description.lower().contains(searchText)));
}

applyOrderCriteriaToQuery(baseQuery, orderCriteria);
applyOrderCriteriaToGetFileMetadatasQuery(baseQuery, orderCriteria);

if (limit != null) {
baseQuery.limit(limit);
@@ -90,7 +159,22 @@ public List<FileMetadata> getFileMetadatas(DatasetVersion datasetVersion, Intege
return baseQuery.fetch();
}

private JPAQuery<FileMetadata> createBaseQuery(DatasetVersion datasetVersion, FileMetadatasOrderCriteria orderCriteria) {
private void addAccessStatusCountToTotal(DatasetVersion datasetVersion, Map<DataFileAccessStatus, Long> totalCounts, DataFileAccessStatus dataFileAccessStatus) {
long fileMetadataCount = getFileMetadataCountByAccessStatus(datasetVersion, dataFileAccessStatus);
if (fileMetadataCount > 0) {
totalCounts.put(dataFileAccessStatus, fileMetadataCount);
}
}

private long getFileMetadataCountByAccessStatus(DatasetVersion datasetVersion, DataFileAccessStatus accessStatus) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
return queryFactory
.selectFrom(fileMetadata)
.where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()).and(createGetFileMetadatasAccessStatusExpression(accessStatus)))
.stream().count();
}

private JPAQuery<FileMetadata> createGetFileMetadatasBaseQuery(DatasetVersion datasetVersion, FileMetadatasOrderCriteria orderCriteria) {
JPAQueryFactory queryFactory = new JPAQueryFactory(em);
JPAQuery<FileMetadata> baseQuery = queryFactory.selectFrom(fileMetadata).where(fileMetadata.datasetVersion.id.eq(datasetVersion.getId()));
if (orderCriteria == FileMetadatasOrderCriteria.Newest || orderCriteria == FileMetadatasOrderCriteria.Oldest) {
@@ -99,7 +183,7 @@ private JPAQuery<FileMetadata> createBaseQuery(DatasetVersion datasetVersion, Fi
return baseQuery;
}

private BooleanExpression createAccessStatusExpression(DataFileAccessStatus accessStatus) {
private BooleanExpression createGetFileMetadatasAccessStatusExpression(DataFileAccessStatus accessStatus) {
QEmbargo embargo = fileMetadata.dataFile.embargo;
BooleanExpression activelyEmbargoedExpression = embargo.dateAvailable.goe(DateExpression.currentDate(LocalDate.class));
BooleanExpression inactivelyEmbargoedExpression = embargo.isNull();
@@ -123,7 +207,7 @@ private BooleanExpression createAccessStatusExpression(DataFileAccessStatus acce
return accessStatusExpression;
}

private void applyOrderCriteriaToQuery(JPAQuery<FileMetadata> query, FileMetadatasOrderCriteria orderCriteria) {
private void applyOrderCriteriaToGetFileMetadatasQuery(JPAQuery<FileMetadata> query, FileMetadatasOrderCriteria orderCriteria) {
DateTimeExpression<Timestamp> orderByLifetimeExpression = new CaseBuilder().when(dvObject.publicationDate.isNotNull()).then(dvObject.publicationDate).otherwise(dvObject.createDate);
switch (orderCriteria) {
case NameZA:
17 changes: 16 additions & 1 deletion src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -520,7 +520,22 @@ public Response getVersionFiles(@Context ContainerRequestContext crc,
return ok(jsonFileMetadatas(datasetVersionFilesServiceBean.getFileMetadatas(datasetVersion, limit, offset, contentType, dataFileAccessStatus, categoryName, searchText, fileMetadatasOrderCriteria)));
}, getRequestUser(crc));
}


@GET
@AuthRequired
@Path("{id}/versions/{versionId}/files/counts")
public Response getVersionFileCounts(@Context ContainerRequestContext crc, @PathParam("id") String datasetId, @PathParam("versionId") String versionId, @Context UriInfo uriInfo, @Context HttpHeaders headers) {
return response(req -> {
DatasetVersion datasetVersion = getDatasetVersionOrDie(req, versionId, findDatasetOrDie(datasetId), uriInfo, headers);
JsonObjectBuilder jsonObjectBuilder = Json.createObjectBuilder();
jsonObjectBuilder.add("total", datasetVersionFilesServiceBean.getFileMetadataCount(datasetVersion));
jsonObjectBuilder.add("perContentType", json(datasetVersionFilesServiceBean.getFileMetadataCountPerContentType(datasetVersion)));
jsonObjectBuilder.add("perCategoryName", json(datasetVersionFilesServiceBean.getFileMetadataCountPerCategoryName(datasetVersion)));
jsonObjectBuilder.add("perAccessStatus", jsonFileCountPerAccessStatusMap(datasetVersionFilesServiceBean.getFileMetadataCountPerAccessStatus(datasetVersion)));
return ok(jsonObjectBuilder);
}, getRequestUser(crc));
}

@GET
@AuthRequired
@Path("{id}/dirindex")
38 changes: 30 additions & 8 deletions src/main/java/edu/harvard/iq/dataverse/api/Files.java
@@ -54,6 +54,7 @@

import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
@@ -63,15 +64,12 @@
import jakarta.ejb.EJBException;
import jakarta.inject.Inject;
import jakarta.json.Json;
import jakarta.json.JsonArray;
import jakarta.json.JsonString;
import jakarta.json.JsonValue;
import jakarta.json.stream.JsonParsingException;
import jakarta.servlet.http.HttpServletResponse;
import jakarta.ws.rs.Consumes;
import jakarta.ws.rs.DELETE;
import jakarta.ws.rs.GET;
import jakarta.ws.rs.POST;
import jakarta.ws.rs.PUT;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.PathParam;
import jakarta.ws.rs.QueryParam;
import jakarta.ws.rs.*;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.core.HttpHeaders;
@@ -866,4 +864,28 @@ public Response getFileDataTables(@Context ContainerRequestContext crc, @PathPar
}
return ok(jsonDT(dataFile.getDataTables()));
}

@POST
@AuthRequired
@Path("{id}/metadata/categories")
@Produces(MediaType.APPLICATION_JSON)
public Response setFileCategories(@Context ContainerRequestContext crc, @PathParam("id") String dataFileId, String jsonBody) {
return response(req -> {
DataFile dataFile = execCommand(new GetDataFileCommand(req, findDataFileOrDie(dataFileId)));
jakarta.json.JsonObject jsonObject;
try (StringReader stringReader = new StringReader(jsonBody)) {
jsonObject = Json.createReader(stringReader).readObject();
JsonArray requestedCategoriesJson = jsonObject.getJsonArray("categories");
FileMetadata fileMetadata = dataFile.getFileMetadata();
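// Add each requested category to the file by name; category names that do not already exist are created.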
for (JsonValue jsonValue : requestedCategoriesJson) {
JsonString jsonString = (JsonString) jsonValue;
fileMetadata.addCategoryByName(jsonString.getString());
}
execCommand(new UpdateDatasetVersionCommand(fileMetadata.getDataFile().getOwner(), req));
return ok("Categories of file " + dataFileId + " updated.");
} catch (JsonParsingException jpe) {
return error(Response.Status.BAD_REQUEST, "Error parsing Json: " + jpe.getMessage());
}
}, getRequestUser(crc));
}
}
16 changes: 16 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/util/json/JsonPrinter.java
@@ -1111,6 +1111,22 @@ public Set<Collector.Characteristics> characteristics() {
};
}

public static JsonObjectBuilder json(Map<String, Long> map) {
JsonObjectBuilder jsonObjectBuilder = Json.createObjectBuilder();
for (Map.Entry<String, Long> mapEntry : map.entrySet()) {
jsonObjectBuilder.add(mapEntry.getKey(), mapEntry.getValue());
}
return jsonObjectBuilder;
}

public static JsonObjectBuilder jsonFileCountPerAccessStatusMap(Map<DatasetVersionFilesServiceBean.DataFileAccessStatus, Long> map) {
JsonObjectBuilder jsonObjectBuilder = Json.createObjectBuilder();
for (Map.Entry<DatasetVersionFilesServiceBean.DataFileAccessStatus, Long> mapEntry : map.entrySet()) {
jsonObjectBuilder.add(mapEntry.getKey().toString(), mapEntry.getValue());
}
return jsonObjectBuilder;
}

public static Collector<JsonObjectBuilder, ArrayList<JsonObjectBuilder>, JsonArrayBuilder> toJsonArray() {
return new Collector<JsonObjectBuilder, ArrayList<JsonObjectBuilder>, JsonArrayBuilder>() {
