Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RO-Crate metadata mime type detection #10016

Merged
merged 9 commits into from
May 7, 2024

Conversation

ErykKul
Copy link
Collaborator

@ErykKul ErykKul commented Oct 16, 2023

What this PR does / why we need it:
This PR adds support for RO-Crate metadata detection. It does so by extending MimeTypeDetectionByFileName.properties to accept filenames with extensions and adding the two possible RO-Crate metadata file names:

ro-crate-metadata.json=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"
ro-crate-metadata.jsonld=application/ld+json; profile="http://www.w3.org/ns/json-ld#flattened http://www.w3.org/ns/json-ld#compacted https://w3id.org/ro/crate"

Which issue(s) this PR closes:

Closes #10015

Suggestions on how to test this:
Add a file named ro-crate-metadata.json to a dataset. The displayed mime type should be application/ld+json instead of application/json (the default for JSON files):
image

When you add the RO-Crate previewer (gdcc/dataverse-previewers#41) to the external tools, it should then preview the metadata file as RO-Crate.

Is there a release notes update needed for this change?:
Yes.

@pdurbin
Copy link
Member

pdurbin commented Oct 16, 2023

The displayed mime type should be application/ld+json instead of application/json (the default for JSON files):

This is a nice improvement! Wouldn't the next logical step be to edit the following files...

  • src/main/java/propertyFiles/MimeTypeDisplay.properties
  • src/main/java/propertyFiles/MimeTypeFacets.properties

... so that we can show RO-Crate rather than application/ld+json to the user? Here's a typical example of when we edit these files:

Also, some tests would be nice. And a release note. 😄

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@ErykKul
Copy link
Collaborator Author

ErykKul commented Oct 16, 2023

@pdurbin
Can you check my release note? Is it fine like this?
I added the test, I also needed to tweak the code a little bit to make it work with the profile in the mime type (RO-Crate is ld+json, but not all ld+json are RO-Crate).
It was not easy to get that cumbersome mime type as key in the property files, but Unicode escaping worked fine. It only looks strange, the test confirms that it works.
I have set the MimeTypeDisplay property of the RO-Crate as RO-Crate metadata and the MimeTypeFactes as Metadata. If you have a suggestion to change them, I will be happy to do it. Especially the facet is tricky, maybe already existing facet like Data or Document would be better here?
Thanks!

@pdurbin
Copy link
Member

pdurbin commented Oct 18, 2023

I have set the MimeTypeDisplay property of the RO-Crate as RO-Crate metadata

Sounds good. In line with https://www.researchobject.org/ro-crate/1.1/terminology.html

and the MimeTypeFactes as Metadata.

No objection from me. It's the first of these but it doesn't seem to fit with anything more popular:

  55 Data
  47 Code
  26 Image
  21 Archive
  15 Text
  13 Document
  10 Shape
   9 Audio
   8 Video
   2 FITS
   1 Unknown
   1 Tabular Data
   1 Network Data
   1 Metadata

@jggautier do you have an opinion on this?

Also, @ErykKul the release notes look good. I'm glad tests are passing. I haven't really looked at the code yet but I'm giving this a small effort size for review and QA.

@pdurbin pdurbin added the Size: 3 A percentage of a sprint. 2.1 hours. label Oct 18, 2023
@jggautier
Copy link
Contributor

Thanks for asking. I don't have an informed opinion.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left comments about dropping some now duplicate code and getting rid of the example file (which I think is not needed). Otherwise, it looks ready to go. I see that the build failed way back when but the results are no longer available, so we should check when this rebuilds after changes.

@cmbz
Copy link

cmbz commented Apr 16, 2024

Related to IQSS/dataverse-pm#175. Added related GREI tag.

@ErykKul
Copy link
Collaborator Author

ErykKul commented Apr 23, 2024

@qqmyers, I have refactored the lookup logic. It looks better now and is slightly more readable.

@ErykKul
Copy link
Collaborator Author

ErykKul commented May 2, 2024

@qqmyers I am not sure why the test was failing, but if it is not related to this PR, it can go to "ready for QA"

@coveralls
Copy link

Coverage Status

coverage: 20.62% (+0.02%) from 20.603%
when pulling 0252cdb on ErykKul:10015_ro_crate_mime_type
into a329f29 on IQSS:develop.

Copy link
Member

@qqmyers qqmyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@sekmiller sekmiller self-assigned this May 7, 2024
@sekmiller sekmiller merged commit 71324d8 into IQSS:develop May 7, 2024
11 checks passed
@pdurbin pdurbin added this to the 6.3 milestone May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GREI 3 Search and Browse Size: 3 A percentage of a sprint. 2.1 hours.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

better detection of RO-Crate files
7 participants