API for auditing physical files and file metadata #11016

Merged
@ofahimIQSS merged 25 commits into develop from 220-audit-physical-files
Dec 2, 2024

Conversation

@stevenwinship stevenwinship commented Nov 13, 2024

What this PR does / why we need it: Find Datasets with missing files so Admins can either delete the file reference or work with authors to re-upload the files.
See: IQSS/dataverse.harvard.edu#220

Which issue(s) this PR closes:

  • Closes #

Special notes for your reviewer:

Suggestions on how to test this: Create multiple Datasets with multiple files. If running in Docker locally, delete a file from docker-dev-volumes/app/data/store..., then call the API and confirm that the missing file is listed in the JSON response.
Other tests could include deleting a FileMetadata row from the DB, and requesting specific Datasets as well as a range using firstId and lastId.
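
For anyone scripting these checks, here is a minimal sketch of how the request URLs could be assembled. The endpoint path and the `DatasetIdentifierList` parameter are copied from the docs snippet reviewed in this PR, and `firstId`/`lastId` from the testing notes above. The helper name `build_audit_url` and the `http://localhost:8080` server URL are hypothetical, and the parameter spelling should be checked against the merged docs before relying on it. The sketch only builds the URL; it does not send a request.

```python
from urllib.parse import urlencode

# Endpoint path as shown in the docs snippet reviewed in this PR.
AUDIT_PATH = "/api/admin/datafiles/auditFiles"


def build_audit_url(server_url, first_id=None, last_id=None, identifiers=None):
    """Build an audit URL (hypothetical helper); every filter is optional."""
    params = {}
    if first_id is not None:
        params["firstId"] = first_id
    if last_id is not None:
        params["lastId"] = last_id
    if identifiers:
        # Comma-separated list, as in the docs example under review.
        params["DatasetIdentifierList"] = ",".join(identifiers)
    base = server_url.rstrip("/") + AUDIT_PATH
    if not params:
        return base
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    return base + "?" + urlencode(params, safe=":/,")


if __name__ == "__main__":
    # Assumed local server URL; adjust for your installation.
    print(build_audit_url("http://localhost:8080", first_id=1, last_id=100))
```

The resulting string can be passed to curl in place of the hand-written URL.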

Does this PR introduce a user interface change? If mockups are available, please link/include them here: No

Is there a release notes update needed for this change?: Included

Additional documentation:

Preview docs at https://dataverse-guide--11016.org.readthedocs.build/en/11016/api/native-api.html#datafile-audit

@stevenwinship stevenwinship self-assigned this Nov 13, 2024

coveralls commented Nov 13, 2024

Coverage Status

coverage: 21.825% (-0.03%) from 21.856%
when pulling 26e8574 on 220-audit-physical-files
into 61b8046 on develop.

@pdurbin pdurbin left a comment

I took a quick pass through the docs and code. @stevenwinship please let me know what you think.

doc/release-notes/220-harvard-edu-audit-files.md Outdated
doc/sphinx-guides/source/api/native-api.rst Outdated
doc/sphinx-guides/source/api/native-api.rst Outdated
doc/sphinx-guides/source/api/native-api.rst Outdated

Auditing specific Datasets (comma separated list)::

curl "$SERVER_URL/api/admin/datafiles/auditFiles?DatasetIdentifierList=doi.org/10.5072/FK2/JXYBJS,doi.org/10.7910/DVN/MPU019"

Member

Do we use this pattern of passing in the URL form of a PID minus "https://" anywhere else? It seems ok. Can we pass in the normal PIDs (the non-URL form) instead?

@stevenwinship stevenwinship Nov 19, 2024

Member

No, it's different: "doi.org/10..." vs. "doi:10...".

In this PR we should use the pattern from reExportDataset.
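
To make the difference between the two forms concrete, here is a small sketch that rewrites the URL-style form into the `doi:` form. The function name `to_doi_pid` is hypothetical and is not part of Dataverse; it only illustrates the string shapes discussed in this thread and does not validate the DOI itself.

```python
def to_doi_pid(value):
    """Rewrite 'doi.org/10...' or 'https://doi.org/10...' as 'doi:10...'.

    Hypothetical helper for illustration; any other input is returned
    unchanged apart from surrounding whitespace.
    """
    text = value.strip()
    stripped = text
    for scheme in ("https://", "http://"):
        if stripped.startswith(scheme):
            stripped = stripped[len(scheme):]
            break
    host = "doi.org/"
    if stripped.startswith(host):
        return "doi:" + stripped[len(host):]
    return text


if __name__ == "__main__":
    print(to_doi_pid("doi.org/10.5072/FK2/JXYBJS"))
```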

Contributor Author

updated the doc

"identifier": "DVN/MPU019",
"persistentURL": "https://doi.org/10.7910/DVN/MPU019",
"missingFiles": [
"s3://dvn-cloud:298910, jihad_metadata_edited.csv"

Member

Same. Easier parsing would be nice.

Contributor Author

re-formatted the json output:

"missingFiles": [
  {
    "StorageIdentifier": "s3://dvn-cloud:298910",
    "label": "jihad_metadata_edited.csv"
  }
]

Member

Great! Thanks. Do we need the directoryLabel too?

Contributor Author

added directoryLabel
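
As a rough illustration of why the object form is easier to consume, here is a sketch that flattens one dataset entry into rows. The sample data reuses the values quoted in this thread. Treating `directoryLabel` as the exact JSON key is an assumption based on the comment above, so it is read as optional. The function name `list_missing_files` is hypothetical, and the envelope around each dataset entry is not shown in this conversation, so the sketch works on a single entry only.

```python
import json

# Sample entry assembled from the fragments quoted in this review thread.
SAMPLE_ENTRY = json.loads("""
{
  "identifier": "DVN/MPU019",
  "persistentURL": "https://doi.org/10.7910/DVN/MPU019",
  "missingFiles": [
    {
      "StorageIdentifier": "s3://dvn-cloud:298910",
      "label": "jihad_metadata_edited.csv"
    }
  ]
}
""")


def list_missing_files(dataset_entry):
    """Return one dict per missing file in a single dataset entry."""
    rows = []
    for missing in dataset_entry.get("missingFiles", []):
        rows.append({
            "dataset": dataset_entry.get("persistentURL"),
            "storage_identifier": missing.get("StorageIdentifier"),
            # Assumed key name; None when the entry does not carry it.
            "directory_label": missing.get("directoryLabel"),
            "label": missing.get("label"),
        })
    return rows


if __name__ == "__main__":
    for row in list_missing_files(SAMPLE_ENTRY):
        print(row)
```

With the earlier single-string form, a consumer would have had to split on the comma, which breaks as soon as a file name contains one.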

doc/release-notes/220-harvard-edu-audit-files.md Outdated
doc/release-notes/220-harvard-edu-audit-files.md Outdated
doc/release-notes/220-harvard-edu-audit-files.md Outdated
src/main/java/edu/harvard/iq/dataverse/api/Admin.java Outdated
@pdurbin pdurbin changed the title audit physical files API for auditing physical files and file metadata Nov 19, 2024

@stevenwinship stevenwinship force-pushed the 220-audit-physical-files branch from 1da5daa to 2db26b2 Compare November 20, 2024 15:29

@pdurbin pdurbin left a comment

I didn't run the code but API tests are passing as of 3eec366 and the latest commits were just docs and reformatting. Approved.

@cmbz cmbz added the FY25 Sprint 11 FY25 Sprint 11 (2024-11-20 - 2024-12-04) label Nov 21, 2024
@cmbz cmbz added this to the 6.5 milestone Nov 21, 2024
@ofahimIQSS ofahimIQSS self-assigned this Nov 27, 2024

github-actions bot commented Dec 2, 2024

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:220-audit-physical-files
ghcr.io/gdcc/configbaker:220-audit-physical-files

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

@ofahimIQSS

Merging PR - Testing Passed

@ofahimIQSS ofahimIQSS merged commit 5d7d942 into develop Dec 2, 2024
11 of 12 checks passed
@ofahimIQSS ofahimIQSS deleted the 220-audit-physical-files branch December 2, 2024 21:30
@ofahimIQSS ofahimIQSS removed their assignment Dec 2, 2024
Labels
FY25 Sprint 10 (2024-11-06 - 2024-11-20), FY25 Sprint 11 (2024-11-20 - 2024-12-04), Original size: 30, Type: Feature (a feature request)
Projects
Status: Done 🧹

5 participants