Add Croissant to Signposting "describedby" output #10542
Labels
FY25 Sprint 10 (2024-11-06 - 2024-11-20)
FY25 Sprint 11 (2024-11-20 - 2024-12-04)
Size: 10 (a percentage of a sprint; 7 hours)
Comments
FWIW: I think Signposting allows multiple "describedby" links, since you add the type attribute to specify the format for each one. We originally didn't put all of our exports in it because the draft spec said something about only including common formats, but based on subsequent discussions, I don't think anyone would object if we just automatically added all installed exporters to the list.
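To illustrate the suggestion in the comment above, here is a minimal sketch of emitting one "describedby" link per installed exporter, each with its own type attribute. It is not the code in SignpostingResources.java. The class and method names are hypothetical, the export URL pattern is copied from the Schema.org example in the issue, and the exporter name `croissant` and the media type `application/ld+json` are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one rel="describedby" link-value per installed exporter.
// Exporter names and media types are assumptions, not Dataverse's actual API.
class DescribedByLinks {

    static String buildLinkHeader(String baseUrl, String pid, Map<String, String> exporterToMediaType) {
        List<String> links = new ArrayList<>();
        for (Map.Entry<String, String> e : exporterToMediaType.entrySet()) {
            String target = baseUrl + "/api/datasets/export?exporter=" + e.getKey()
                    + "&persistentId=" + pid;
            // The type attribute tells clients which format each link points to.
            links.add("<" + target + ">;rel=\"describedby\";type=\"" + e.getValue() + "\"");
        }
        // Multiple link-values are comma-separated in a single Link header (RFC 8288).
        return String.join(", ", links);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the output order stable.
        Map<String, String> installed = new LinkedHashMap<>();
        installed.put("schema.org", "application/ld+json");
        installed.put("croissant", "application/ld+json");
        System.out.println(buildLinkHeader("https://dataverse.harvard.edu", "doi:10.7910/DVN/TJCLKP", installed));
    }
}
```

Iterating over whatever exporters are installed, rather than a fixed list, matches the "automatically add all exports" idea; a client that only wants one format can filter on the type attribute.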
This was referenced May 7, 2024
pdurbin added three commits that referenced this issue on Nov 22, 2024.
pdurbin added two commits that referenced this issue on Nov 25, 2024, one with the message: "…0542 The test file is used in InfoIT#testGetExportFormats"
pdurbin added a commit that referenced this issue on Nov 25, 2024, with the message:
Before this PR...
In development:
Expected: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
Actual: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
On Jenkins:
Expected: is "http://localhost:8080/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292"
Actual: http://ec2-3-225-221-142.compute-1.amazonaws.com/dataset.xhtml?persistentId=doi:10.5072/FK2/6A3292
So we'll change to just "endsWith" since we aren't actually testing the baseurl, just the datasetPid which we fixed up in ca93d60.
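The commit message explains why the assertion was loosened: the host differs between a development machine and Jenkins, so only the dataset PID part of the URL is checked. A minimal, dependency-free sketch of that idea follows; the class and method names are hypothetical, and the real test may well use a matcher such as Hamcrest's `endsWith` rather than plain `String.endsWith`.

```java
// Hypothetical sketch: check the dataset PID suffix of a landing-page URL
// while ignoring the base URL, which varies between environments.
class EndsWithCheck {

    static boolean matchesDatasetPid(String actualUrl, String pid) {
        String expectedSuffix = "/dataset.xhtml?persistentId=" + pid;
        return actualUrl.endsWith(expectedSuffix);
    }

    public static void main(String[] args) {
        String pid = "doi:10.5072/FK2/6A3292";
        String local = "http://localhost:8080/dataset.xhtml?persistentId=" + pid;
        String jenkins = "http://ec2-3-225-221-142.compute-1.amazonaws.com/dataset.xhtml?persistentId=" + pid;
        // Both checks pass even though the hosts differ.
        System.out.println(matchesDatasetPid(local, pid));   // true
        System.out.println(matchesDatasetPid(jenkins, pid)); // true
    }
}
```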
pdurbin added a commit that referenced this issue on Dec 13, 2024, with the message: "Conflicts: doc/sphinx-guides/source/api/changelog.rst (updated to 6.6)"
Today @siacus and I were talking about how dataset landing pages can become heavy when the machine-readable JSON we put in the `<head>` (Schema.org JSON-LD or Croissant) gets large. In a real-life dataset with 25K files, the Croissant file is 7.1 MB.

We talked about putting a link to the Croissant file in our Signposting output, like we do for Schema.org JSON-LD. Robots could then request just the headers (e.g. with `curl --head`) and receive a link to the Croissant file, rather than the entire payload, which can be large.

Unfortunately, people suffering from heavy dataset pages won't get relief until the large content is removed from the `<head>` of the page, but putting the link in Signposting gives machines an option for the future if the world wants to move in that direction. We already suggested Signposting to the Croissant/Google Dataset Search team at mlcommons/croissant#530 (comment).

In our Signposting output, we already include a link for downloading Schema.org JSON-LD data via API. For example:
<https://dataverse.harvard.edu/api/datasets/export?exporter=schema.org&persistentId=doi:10.7910/DVN/TJCLKP>;rel="describedby"
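A minimal sketch of the robot side described above: fetch only the headers of a landing page (the equivalent of `curl --head`) and pull out the "describedby" targets. The class and method names are hypothetical, and the parser is deliberately simplified (it assumes link targets and parameter values contain no `<` or `>` characters), so it is not a full RFC 8288 parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a client that reads Signposting links from response headers.
class DescribedByParser {

    // One link-value: <target> followed by its ;-separated parameters.
    private static final Pattern LINK_VALUE = Pattern.compile("<([^>]*)>([^<]*)");
    private static final Pattern REL_PARAM = Pattern.compile("rel=\"([^\"]*)\"");

    // Returns the targets of all link-values whose rel includes "describedby".
    static List<String> describedByTargets(String linkHeader) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK_VALUE.matcher(linkHeader);
        while (m.find()) {
            Matcher rel = REL_PARAM.matcher(m.group(2));
            // rel may hold several space-separated relation types.
            if (rel.find() && Arrays.asList(rel.group(1).split("\\s+")).contains("describedby")) {
                targets.add(m.group(1));
            }
        }
        return targets;
    }

    // Sends a HEAD request, so only headers come back, not the (possibly large) page body.
    static String fetchLinkHeader(URI landingPage) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(landingPage)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        // A server may send several Link headers; join them so every link-value is parsed.
        return String.join(", ", response.headers().allValues("Link"));
    }

    public static void main(String[] args) {
        // Offline demo using the example header value from this issue.
        String header = "<https://dataverse.harvard.edu/api/datasets/export?exporter=schema.org"
                + "&persistentId=doi:10.7910/DVN/TJCLKP>;rel=\"describedby\"";
        System.out.println(describedByTargets(header));
    }
}
```

In practice, `describedByTargets(fetchLinkHeader(uri))` would list every metadata export the server advertises, and the client could then download only the one it wants.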
The Signposting spec seems to allow multiple "describedby" values, but if we prefer to keep a single "describedby" value, we could consider swapping out `schema.org` for `croissant` when it's available, like we do for the `<head>` tag.

I don't think this is a lot of work. A 3 is probably enough, but I'll give it a 10 for reviewing the Signposting spec and, if need be, talking to that community about multiple "describedby" values. The file to edit is SignpostingResources.java, as seen in PR #8981.
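For the single-"describedby" option, here is a minimal sketch of the swap: prefer the Croissant export when its exporter is installed, otherwise fall back to Schema.org. The class and method names are hypothetical, the exporter name `croissant` is an assumption, and the link format mirrors the Schema.org example in this issue rather than the actual output of SignpostingResources.java.

```java
import java.util.Set;

// Hypothetical sketch: emit exactly one describedby link, preferring Croissant.
class SingleDescribedBy {

    static String preferredExporter(Set<String> installedExporters) {
        // Prefer Croissant when installed; otherwise keep Schema.org JSON-LD.
        return installedExporters.contains("croissant") ? "croissant" : "schema.org";
    }

    static String describedByLink(String baseUrl, String pid, Set<String> installedExporters) {
        String exporter = preferredExporter(installedExporters);
        return "<" + baseUrl + "/api/datasets/export?exporter=" + exporter
                + "&persistentId=" + pid + ">;rel=\"describedby\"";
    }

    public static void main(String[] args) {
        String base = "https://dataverse.harvard.edu";
        String pid = "doi:10.7910/DVN/TJCLKP";
        System.out.println(describedByLink(base, pid, Set.of("schema.org", "croissant")));
        System.out.println(describedByLink(base, pid, Set.of("schema.org")));
    }
}
```

The trade-off is that clients lose access to the Schema.org link whenever Croissant is installed, which is why emitting multiple typed "describedby" links may be the more flexible choice if the spec allows it.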
See also this issue we opened with the Croissant team, where we asked for guidance on large Croissant files: mlcommons/croissant#646

Related issues and PRs: