API for auditing physical files and file metadata #11016
Do we use this pattern of passing in the URL form of a PID minus "https://" anywhere else? It seems ok. Can we pass in the normal PIDs (the non-URL form) instead?
9/21/22 Durbin:
Batch Exports Through the API
...
curl http://localhost:8080/api/admin/metadata/:persistentId/reExportDataset?persistentId=doi:10.5072/FK2/AAA000
No, it's different: "doi.org/10..." vs. "doi:10...".
In this PR we should use the pattern from reExportDataset.
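To make the difference between the two forms concrete, here is a minimal Python sketch that converts the URL form (with or without "https://") into the "doi:10..." form used by reExportDataset. The function name `to_pid` and the resolver host table are assumptions for illustration only; this is not the PR's code and not how PidUtil.parseAsGlobalID() is implemented.

```python
from urllib.parse import urlparse

# Hypothetical mapping of resolver hosts to PID protocols (an assumption, not from the PR).
RESOLVER_PROTOCOLS = {"doi.org": "doi", "dx.doi.org": "doi", "hdl.handle.net": "hdl"}


def to_pid(value: str) -> str:
    """Return a 'protocol:authority/identifier' PID for a URL-form or plain PID."""
    value = value.strip()
    # Already in the non-URL form, e.g. doi:10.5072/FK2/AAA000
    if value.split(":", 1)[0] in ("doi", "hdl"):
        return value
    # Accept the URL form with or without a scheme
    if "://" not in value:
        value = "https://" + value
    parsed = urlparse(value)
    protocol = RESOLVER_PROTOCOLS.get(parsed.netloc.lower())
    if protocol is None:
        raise ValueError(f"unrecognized PID form: {value!r}")
    return f"{protocol}:{parsed.path.lstrip('/')}"


if __name__ == "__main__":
    print(to_pid("doi.org/10.5072/FK2/AAA000"))
    print(to_pid("doi:10.5072/FK2/AAA000"))
```

Accepting only the "doi:10..." form in the API, as suggested above, would make this kind of client-side normalization unnecessary.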
updated the doc
Same here. Just the PID would be better than the URL form without "https://".
This value is eventually passed to PidUtil.parseAsGlobalID(), so it will work with any format that method accepts. Whatever you type into that list will show up, as is, in the JSON "DatasetIdentifierList": []. It's just there to document what you passed in, even if it's garbage.
🗑️ in 🗑️ out, as they say!
This might technically be the identifier from the database but what about other types of PIDs like Handles and Permalinks? Let's not make users parse "persistentURL". We could expose this information separately.
Added authority and protocol. Was this what you wanted or is there more?
I'm sorry to be a pain but I think I'd rather have the full PID (e.g. "doi:10.5072/FK2/JXYBJS") than each field split out.
I mean, if you want to leave protocol, authority, and identifier fields in, I won't object, but having the full PID is useful, I think. The full PID is what we operate on in a lot of cases, including this "audit" API we're adding.
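As a rough illustration of why the full PID is convenient, the sketch below shows what a client would otherwise have to do with the split fields. The function name `full_pid` and the assumption that the parts always join as protocol, colon, authority, slash, identifier are mine; other PID types such as Permalinks may use a different separator.

```python
def full_pid(protocol: str, authority: str, identifier: str) -> str:
    """Join split PID fields into one string (assumes a ':' and '/' layout)."""
    return f"{protocol}:{authority}/{identifier}"


if __name__ == "__main__":
    # Example values taken from the PID quoted in the comment above.
    print(full_pid("doi", "10.5072", "FK2/JXYBJS"))
```

Returning the full PID from the API would spare every client from repeating this step and from guessing the separator.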
For easier parsing, should this be a JSON object with entries, instead of a string that you split on comma and then colon?
Same. Easier parsing would be nice.
re-formatted the json output:
    "missingFiles": [
      {
        "StorageIdentifier": "s3://dvn-cloud:298910",
        "label": "jihad_metadata_edited.csv"
      }
    ]
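A short Python sketch of how a client might read this output, and of why the object form is easier to parse than a delimited string. The enclosing braces around "missingFiles" and the flat string `flat` are assumptions made for the example; the actual response likely carries more fields, and the earlier string format is not shown in this thread.

```python
import json

# The fragment above, wrapped in an object so it parses on its own (wrapper is an assumption).
response_fragment = """
{
  "missingFiles": [
    {
      "StorageIdentifier": "s3://dvn-cloud:298910",
      "label": "jihad_metadata_edited.csv"
    }
  ]
}
"""

data = json.loads(response_fragment)
# Each entry is an object, so fields are read by name with no string splitting.
missing = [(f["StorageIdentifier"], f["label"]) for f in data["missingFiles"]]

# Hypothetical delimited form: splitting on ':' is ambiguous because the
# storage identifier itself contains colons.
flat = "s3://dvn-cloud:298910:jihad_metadata_edited.csv"
naive = flat.split(":")

if __name__ == "__main__":
    print(missing)
    print(naive)
```

The naive split yields more than two pieces, so a client could not reliably tell where the storage identifier ends and the label begins.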
Great! Thanks. Do we need the directoryLabel too?
added directoryLabel