Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataset files API extension for file display data with pagination and sorting #9693

Merged
merged 45 commits into from
Sep 21, 2023

Conversation

GPortas
Copy link
Contributor

@GPortas GPortas commented Jul 5, 2023

What this PR does / why we need it:

We need this API extension to support dataset files tab display data in the new frontend, as well as extend the capabilities of the API for other use cases.

In particular, this API extension includes:

  • Pagination and ordering for the getVersionFiles endpoint (underlying logic implemented with database queries)
  • New endpoints under the Files API: getCountGuestbookResponses, getFileDataTables
  • New endpoint under the Access API: getUserPermissionsOnFile
  • publicationDate attribute added to DataFile payload

Which issue(s) this PR closes:

Closes #9692

Special notes for your reviewer:

Support for category and folder grouping in Files tab

Initially mentioned in: #9693 (comment)

This PR does not include changes recently implemented in #9204 for JSF. An issue has been created in the frontend repository for supporting those changes in the SPA: IQSS/dataverse-frontend#142 (This will potentially result in an API extension spike issue to support the SPA).

Suggestions on how to test this:

You can test the getVersionFiles endpoint by combining its new optional parameters (offset, limit, and orderCriteria). For example:

curl -H "X-Dataverse-Key: <API_TOKEN>" -X GET "http://localhost:8080/api/v1/datasets/:persistentId/versions/:latest/files?persistentId=<PID>&offset=10&limit=5&orderCriteria=Newest"

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
No

Is there a release notes update needed for this change?:
Yes

Additional documentation:
No

@GPortas GPortas self-assigned this Jul 5, 2023
@coveralls
Copy link

coveralls commented Jul 5, 2023

Coverage Status

coverage: 19.959% (-0.009%) from 19.968% when pulling 6b45d82 on 9692-files-api-extension-display-data into 497c578 on develop.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@cmbz cmbz added the Size: 10 A percentage of a sprint. 7 hours. label Jul 5, 2023
@qqmyers
Copy link
Member

qqmyers commented Jul 5, 2023

Make sure you look at #9204 (merged, part of 5.14) which 'pre-sorts' by tags and/or file directory/folder - that's going to complicate your queries (assuming the initial goal is to stay current with 5.14).

@donsizemore
Copy link
Contributor

on testing for long parameters: will installations with FilePIDsEnabled ever want to send persistentIDs instead? (or am I mis-reading)

@GPortas
Copy link
Contributor Author

GPortas commented Jul 10, 2023

on testing for long parameters: will installations with FilePIDsEnabled ever want to send persistentIDs instead? (or am I mis-reading)

@donsizemore Thanks for this comment. It made me remember that findDataFileOrDie checks the type of id sent (id or PID). It is therefore required that the getCountGuestbookResponses endpoint includes this call too to support PIDs in addition to the numeric ids.

@qqmyers I'm going to add the findDataFileOrDie call again to the getCountGuestbookResponses endpoint. This will also solve the drawbacks I mentioned in my previous comment and will add the necessary verifications to the endpoint.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@scolapasta
Copy link
Contributor

scolapasta commented Sep 11, 2023

@GPortas I'll move this milestone 6.1 issue to "ready for Review", but I'm assigning you first so that you can handle the 6.0 merge and address any EE10 issues. When done, this one can actually go back to ready for QA (barring no major edits).

@github-actions

This comment has been minimized.

@GPortas GPortas removed their assignment Sep 12, 2023
@kcondon kcondon self-assigned this Sep 15, 2023
@kcondon
Copy link
Contributor

kcondon commented Sep 15, 2023

@GPortas The dataTables endpoint allows downloading table of a restricted file by a guest (no perms, no key)

@GPortas
Copy link
Contributor Author

GPortas commented Sep 15, 2023

@kcondon By downloading you mean getting the json payload from the /api/files/{id}/dataTables endpoint, right? You don't mean downloading the file.

In that case, how sensitive do you think the dataTables payload is? I was treating it at the same level as fileMetadata /api/files/{id}/, which allows obtaining the file metadata of a restricted file also by a guest (no perms, no key).

@kcondon
Copy link
Contributor

kcondon commented Sep 15, 2023

@GPortas I've discussed with members of the team and our data privacy project, OpenDP, does allow access to summary stats to private files but only after they calculate them with some noise added so a user cannot reverse engineer, along with some outside data for which they also have summary stats, who or what is being described. So, yes, summary stats should not be shared when "private", either due to belonging to a restricted or embargoed file (it happens here, too). The OpenDP summary stats get saved as a separate auxiliary file, not in the datatable.

Additionally, the export as DDI endpoint normally contains summary stats but it does not for restricted files.

@GPortas
Copy link
Contributor Author

GPortas commented Sep 20, 2023

@kcondon Makes sense, thank you for the explanation.

I have updated the code following that approach. In addition to permission checks for restricted files, I've also added them for embargoed files, since I consider that they meet the same requirements here regarding their privacy.

Furthermore, there is already an endpoint related to tabular files where these permission checks are done (restricted + embargoed): https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/api/Access.java#L468C17-L468C17

@github-actions
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9692-files-api-extension-display-data
ghcr.io/gdcc/configbaker:9692-files-api-extension-display-data

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@kcondon kcondon merged commit 3b7d824 into develop Sep 21, 2023
12 checks passed
@kcondon kcondon deleted the 9692-files-api-extension-display-data branch September 21, 2023 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spike - API] [1/2] Extend the Files API to support frontend files tab display data
9 participants