Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Datasets API to support dataset deaccession and recovery of files and file counts of deaccessioned datasets #9937

Merged
merged 40 commits into from
Oct 13, 2023

Conversation

GPortas
Copy link
Contributor

@GPortas GPortas commented Sep 20, 2023

What this PR does / why we need it:

This PR extends the following endpoints related to dataset version files:

  • getVersionFiles ("{id}/versions/{versionId}/files")
  • getVersionFileCounts ("{id}/versions/{versionId}/files/counts")

The above endpoints now accept a new boolean optional query parameter "includeDeaccessioned", which, if enabled, causes the endpoint to consider deaccessioned versions when searching for versions to obtain files or file counts. To do this, the following existing commands were modified:

  • GetLatestAccessibleDatasetVersionCommand
  • GetSpecificPublishedDatasetVersionCommand
  • GetLatestPublishedDatasetVersionCommand

This logic is required for the SPA to display file metadata for deaccessioned dataset versions.

Additionally, a new endpoint has been developed to support version deaccessioning through API (Given a dataset and a version): "{id}/versions/{versionId}/deaccession".

The version identifiers ":latest", ":latest-published" and ":draft" have been extracted to API constants (ApiConstants.java), since these strings were hardcoded throughout the application and this way it is easier to manage them.

Finally, the DataFile API payload has been extended to add the field "friendlyType", as described in IQSS/dataverse-frontend#169 (comment)

Which issue(s) this PR closes:

Suggestions on how to test this:

  1. Create a dataset version and publish it

  2. Deaccession dataset version via curl:

Create a deaccession.json file, which should like this:

{
  "deaccessionReason": "Description of the deaccession reason.",
  "deaccessionForwardURL": "https://demo.dataverse.org"
}

Export the following variables:

export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=http://localhost:8080 (or your instance URL)
export ID=<DATASET_ID>
export VERSIONID=1.0
export FILE_PATH=deaccession.json

Then, call the deaccession endpoint:

curl -H "X-Dataverse-key:$API_TOKEN" -X POST "$SERVER_URL/api/datasets/$ID/versions/$VERSIONID/deaccession" -H "Content-type:application/json" --upload-file $FILE_PATH

  1. Get version files for the deaccessioned dataset version:

curl "$SERVER_URL/api/datasets/$ID/versions/$VERSIONID/files?includeDeaccessioned=true"

  1. Get version file counts for the deaccessioned dataset version:

curl "$SERVER_URL/api/datasets/$ID/versions/$VERSIONID/files/counts?includeDeaccessioned=true"

  1. Repeat previous operations but removing includeDeaccessioned query param to check how the deaccessioned version is not found (old behavior).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

No

Is there a release notes update needed for this change?:

Yes

@GPortas GPortas changed the title Reet Extend Datasets API to support dataset deaccession and recovery of files and file counts of deaccessioned datasets Sep 20, 2023
@GPortas GPortas changed the base branch from develop to 9851-datafile-payload-extension September 20, 2023 09:05
@GPortas GPortas self-assigned this Sep 20, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

1 similar comment
@github-actions

This comment has been minimized.

@GPortas GPortas marked this pull request as ready for review September 21, 2023 09:38
@GPortas GPortas removed their assignment Sep 21, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Base automatically changed from 9834-files-api-extension-file-counts to 9714-files-api-extension-filters October 3, 2023 13:56
Base automatically changed from 9714-files-api-extension-filters to develop October 4, 2023 13:44
Copy link
Contributor

@sekmiller sekmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates! passing to QA

@sekmiller sekmiller removed their assignment Oct 4, 2023
@GPortas
Copy link
Contributor Author

GPortas commented Oct 6, 2023

We have conflicts here due to the revert. Let's wait for #9972 to be merged before resolving the conflicts and merging this PR.

@kcondon

New API option to get the download size of the dataset version files without the original tabular files size
@kcondon kcondon self-assigned this Oct 12, 2023
@GPortas
Copy link
Contributor Author

GPortas commented Oct 12, 2023

@kcondon Conflicts fixed.

@github-actions

This comment has been minimized.

@kcondon
Copy link
Contributor

kcondon commented Oct 12, 2023

@GPortas
Issues found:

  1. Deaccession endpoint examples not working. In how to test you list a deaccession api but it is incomplete, doesn't include data and has PUT rather than POST. So, I used the example in your doc and it failed:
    curl -H "X-Dataverse-key:xyz" -X POST "http://localhost:8080/api/datasets/336086/versions/1.0/deaccession" -d '{"deaccessionReason":"Description of the deaccession reason.", "deaccessionForwardURL":"https://demo.dataverse.org"}'
    {"status":"ERROR","message":"Error parsing Json: Invalid token=EOF at (line no=1, column no=0, offset=-1). Expected tokens are: [CURLYOPEN, SQUAREOPEN, STRING, NUMBER, TRUE, FALSE, NULL]"
  2. Both deaccessioned dataset file info endpoints allow guest/unauthorized user to get file info.
Screen Shot 2023-10-12 at 5 58 32 PM Screen Shot 2023-10-12 at 5 58 49 PM

deaccesioned_file_info_guest.txt

@github-actions

This comment has been minimized.

@GPortas
Copy link
Contributor Author

GPortas commented Oct 13, 2023

@kcondon

I apologize. I again forgot to update the description with the latest status of changes (like in other PR). Anyway, I had to update the documentation because the example curl calls were not correct, as you explained. I've updated both the docs and the example calls in the PR description so you can test it now.

Regarding the getVersionFiles and getVersionFileCounts calls with the includeDeaccessioned option and without sending credentials (guest user), they were indeed not working correctly.

The EditDataset permission check was missing for the GetSpecificPublishedDatasetVersionCommand command. I've fixed this and covered that case with integration tests.

@github-actions

This comment has been minimized.

@kcondon
Copy link
Contributor

kcondon commented Oct 13, 2023

@GPortas Thanks, looks great!

@kcondon
Copy link
Contributor

kcondon commented Oct 13, 2023

@GPortas Darn, just noticed merge conflicts. Can you take a look? Otherwise good to merge. Thanks! Just realized it might be due to my merging the other count pr. I also noticed a couple failing tests:

Screen Shot 2023-10-13 at 10 43 47 AM

@GPortas
Copy link
Contributor Author

GPortas commented Oct 13, 2023

@kcondon Thanks!

I run the failing tests on my localhost and they don't fail. Can you give me more information about those failing tests? (If possible, the assertions that are failing). Maybe some trace from the Jenkins output? I currently don't have access to Jenkins and can't see it.

Now working on resolving conflicts.

@kcondon kcondon merged commit 3725dea into develop Oct 13, 2023
@kcondon kcondon deleted the 9852-files-api-deaccessioned-datasets branch October 13, 2023 16:26
@github-actions
Copy link

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9852-files-api-deaccessioned-datasets
ghcr.io/gdcc/configbaker:9852-files-api-deaccessioned-datasets

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@pdurbin pdurbin added this to the 6.1 milestone Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Size: 10 A percentage of a sprint. 7 hours. SPA These changes are required for the Dataverse SPA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Spike - API] Extend the Files API to return files for deaccessioned datasets
4 participants