[HTML & API] Incorrect title and incorrect URI lookup with https://www.rfc-editor.org URL #256

Open
kael opened this issue Jan 25, 2022 · 5 comments

Comments

@kael

kael commented Jan 25, 2022

Describe the bug

  • Bookmarking RFCs from the RFC Editor creates annotations that all point to one specific RFC (RFC 8783).

  • All RFC Editor RFCs are also bookmarked with the same title: RFC 8783: Distributed Denial-of-Service Open Threat Signaling (DOTS) Data Channel Specification

  • Finally, annotations linked to the RFC URL appear as orphans even though they exist.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://www.rfc-editor.org/rfc/rfc8984.html
  2. Click on the bookmarklet
  3. Notice orphan annotations
  4. Bookmark the RFC
  5. The bookmarked RFC is titled "RFC 8783: Distributed Denial-of-Service Open Threat Signaling (DOTS) Data Channel Specification" instead of "RFC 8984 JSCalendar: A JSON Representation of Calendar Data"
  6. Searching the API for annotations with the RFC URL returns annotations for other RFCs (see annotation.uri and annotation.target.source[0]): https://api.hypothes.is/api/search?uri=https://www.rfc-editor.org/rfc/rfc8984.html
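
Step 6 can be checked with a short script. This is only a minimal sketch: the endpoint is the one quoted in step 6, while the helper names `fetch_rows` and `rows_for_other_documents` are made up for illustration, and it assumes the search response lists its results under a `rows` key.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from step 6 of the report.
SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"


def fetch_rows(uri):
    """Fetch the annotations the search endpoint returns for `uri`."""
    query = urllib.parse.urlencode({"uri": uri})
    with urllib.request.urlopen(f"{SEARCH_ENDPOINT}?{query}") as response:
        payload = json.load(response)
    # Assumption: results are listed under a "rows" key.
    return payload.get("rows", [])


def rows_for_other_documents(rows, expected_uri):
    """Return the rows whose `uri` field differs from the searched URI."""
    return [row for row in rows if row.get("uri") != expected_uri]
```

Calling `rows_for_other_documents(fetch_rows(url), url)` with the URL from step 1 should give an empty list once the lookup behaves as expected; any row it returns is an annotation that belongs to a different document.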

Expected behavior

  • HTML: Bookmarking an RFC should create annotations whose title comes from the bookmarked RFC, not from RFC 8783.
  • API: An API search using an RFC Editor URL should only return annotations targeting the uri parameter, not every RFC bookmarked from the RFC Editor.

Additional comment

The HTML code of the page doesn't contain a canonical URL, only the following metadata:

<meta content="8984" name="rfc.number">
<!-- Generator version information:  ... -->
<link href="rfc8984.xml" rel="alternate" type="application/rfc+xml">
<link href="#copyright" rel="license">
<!-- ... -->
<link href="https://dx.doi.org/10.17487/rfc8984" rel="alternate">
<link href="urn:issn:2070-1721" rel="alternate">
<link href="https://datatracker.ietf.org/doc/draft-ietf-calext-jscalendar-32" rel="prev">
@robertknight
Member

From the above HTML snippet, it looks like this one is the same across different RFCs:

<link href="urn:issn:2070-1721" rel="alternate">

This ISSN refers to the whole collection of RFCs. However, I believe Hypothesis treats such links as alternate URIs for the specific URI, and so it creates an equivalence relation between the specific RFC you are annotating and this ISSN. Since a link to this same identifier is created for all RFCs, fetching annotations for one RFC will return entries for others.
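
To illustrate why one shared identifier is enough to merge everything, here is a small sketch that groups URIs with a union-find structure. It is not Hypothesis's actual implementation, only a model of an equivalence relation under the assumption that every URI a page declares is merged with the page's own URL. The function name and the second page URL are hypothetical.

```python
def group_equivalent_uris(documents):
    """Group URIs into equivalence classes.

    `documents` maps a page URL to the alternate URIs that page declares.
    Every URI declared by a page is merged into the same class as the page.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    def union(a, b):
        parent[find(a)] = find(b)

    for page_url, alternates in documents.items():
        find(page_url)  # register pages that declare no alternates
        for alternate in alternates:
            union(page_url, alternate)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


documents = {
    "https://www.rfc-editor.org/rfc/rfc8984.html": [
        "https://dx.doi.org/10.17487/rfc8984",
        "urn:issn:2070-1721",
    ],
    # Hypothetical second RFC page; only the shared ISSN link matters here.
    "https://example.org/another-rfc.html": ["urn:issn:2070-1721"],
}
print(len(group_equivalent_uris(documents)))  # 1: both pages share a class
```

Dropping the `urn:issn:2070-1721` entries from the input leaves the two pages in separate classes.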

Since ISSNs in general refer to ongoing publications or collections rather than specific articles, we could treat them specially.
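
One possible shape of such special treatment, as a rough sketch rather than an actual or planned Hypothesis change: collect the `rel="alternate"` links from the page and ignore those that are ISSN URNs. The class and function names are invented for this example, and treating only the `urn:issn:` prefix as collection-level is an assumption.

```python
from html.parser import HTMLParser


class AlternateLinkCollector(HTMLParser):
    """Collect href values of <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "alternate" in rel and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def is_collection_identifier(uri):
    """True for identifiers naming a whole series; here only ISSN URNs."""
    return uri.lower().startswith("urn:issn:")


def document_level_alternates(html):
    """Return alternate links that identify this document, not its series."""
    collector = AlternateLinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if not is_collection_identifier(h)]


# Link elements copied from the RFC 8984 page metadata quoted in the report.
snippet = """
<link href="rfc8984.xml" rel="alternate" type="application/rfc+xml">
<link href="#copyright" rel="license">
<link href="https://dx.doi.org/10.17487/rfc8984" rel="alternate">
<link href="urn:issn:2070-1721" rel="alternate">
<link href="https://datatracker.ietf.org/doc/draft-ietf-calext-jscalendar-32" rel="prev">
"""
print(document_level_alternates(snippet))
# ['rfc8984.xml', 'https://dx.doi.org/10.17487/rfc8984']
```

With the ISSN link filtered out, the DOI remains available as a per-document identifier.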

@kael
Author

kael commented Jan 25, 2022

Since ISSNs in general refer to ongoing publications or collections rather than specific articles, we could treat them specially.

👍

As an aside, is there any internal tool for fixing incorrect annotation metadata (like the document title) once a fix is available, or would current annotations stay the same and the fix only be applied to future URIs?

@robertknight
Member

As an aside, is there any internal tool for fixing incorrect annotation metadata (like the document title) once a fix is available

No, I'm afraid not. We've been discussing internally that we need to do an overhaul of how document equivalence/metadata works in Hypothesis and create tools for this purpose.

@kael
Author

kael commented Jan 25, 2022

No, I'm afraid not. We've been discussing internally that we need to do an overhaul of how document equivalence/metadata works in Hypothesis and create tools for this purpose.

Alright then.

Yes, some clarification of document uniqueness would be nice, as would a fix for the title of the bookmarked page being not the expected one but apparently the one from the initial bookmark. If the first bookmark sends wrong metadata to the Hypothesis server (e.g. because of SPA client-side navigation and/or a canonical URL that was not correctly updated), that metadata pollutes the DB forever and creates unexpected annotations.

Anyway, I'll be waiting for that ISSN-related bug to be fixed, 'cos I have some RFCs to bookmark. 😄

Cheers!

@kael
Author

kael commented Jan 30, 2023

Since ISSNs in general refer to ongoing publications or collections rather than specific articles, we could treat them specially.

I've just realized that Hypothesis used to index ISSNs the way it indexes DOIs, but it seems to have stopped; there are still some cases of ISSN-indexed content.

It'd be awesome if you could index ISSNs (and ISBNs as well). An IS[S|B]N wildcard search would be neat too.
