draft-ietf-tsvwg-ieee-802 should not be a thing #381

Open
rjsparks opened this issue Nov 21, 2023 · 13 comments

@rjsparks
Member

rjsparks commented Nov 21, 2023

Describe the issue

From a conversation with @stefanomunarini in #378:

More concretely, see https://bib.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tsvwg-ieee-802.xml and note that the returned anchor is incorrect - there is no such thing as I-D.ietf-tsvwg-ieee-802.

This is a different issue I think. I can see ietf-tsvwg-ieee-802 is actually indexed in the service.

Should this not be the case, can you @rjsparks please open a new issue? Thanks

draft-ietf-tsvwg-ieee-802 does not exist. draft-ietf-tsvwg-ieee-802-11 does (it became an RFC; its last version was draft-ietf-tsvwg-ieee-802-11-11).

The datatracker has some heuristics that use probing to see what exists when a name ends in -\d\d. I suspect the code that chose to index ietf-tsvwg-ieee-802 lacks similar consideration.


@rjsparks rjsparks added the bug Something isn't working label Nov 21, 2023
@TonyLHansen

That'd be a good addition to the index code.

@stefanomunarini
Contributor

Thank you for raising this issue, @rjsparks. It is being investigated.

@stefanomunarini
Contributor

It turns out that ietf-tools/relaton-data-ids contains some data generated by older versions of a Relaton gem (specifically relaton-ietf prior to v1.12.6). The ietf-tools crawler for this repository does not remove existing data before each run, unlike the relaton/relaton-data-ids crawler (see https://github.com/relaton/relaton-data-ids/blob/8fd23a9f6dba6f3ad86f53024e0bf5ffb7d2da83/crawler.rb#L9-L10). draft-ietf-tsvwg-ieee-802 does not, in fact, exist in relaton/relaton-data-ids.

I do not recall there being any particular reason for not removing old data. @kesara, can you confirm? In case there is no reason, here is the PR.
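
To illustrate why leftover files matter, here is a minimal Python sketch of a regeneration step that clears its output directory first. It is not the actual crawler (which is the Ruby script linked above); the `regenerate` function, the one-file-per-document layout, and the `.yaml` extension are assumptions made for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def regenerate(output_dir: Path, documents: dict[str, str]) -> None:
    """Rebuild output_dir from scratch so entries from earlier runs cannot survive."""
    # Remove everything written by previous runs, then recreate the directory.
    shutil.rmtree(output_dir, ignore_errors=True)
    output_dir.mkdir(parents=True)
    # Hypothetical layout: one file per document name.
    for name, body in documents.items():
        (output_dir / f"{name}.yaml").write_text(body)


# Usage: a stale entry left by an earlier run disappears after regeneration.
data_dir = Path(tempfile.mkdtemp()) / "data"
data_dir.mkdir()
(data_dir / "draft-ietf-tsvwg-ieee-802.yaml").write_text("stale")
regenerate(data_dir, {"draft-ietf-tsvwg-ieee-802-11": "fresh"})
remaining = sorted(p.name for p in data_dir.iterdir())
```

Without the `rmtree` call, the stale `draft-ietf-tsvwg-ieee-802.yaml` would remain next to the freshly generated file and would still be picked up by an indexer.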

@stefanomunarini
Contributor

The document draft-ietf-tsvwg-ieee-802 is no longer indexed in the service. However, the wrong anchor for draft-ietf-tsvwg-ieee-802-11 persists (the current anchor is draft-ietf-tsvwg-ieee-802). Is there documentation on how the Datatracker handles this with its heuristics?

@rjsparks
Member Author

rjsparks commented Dec 1, 2023

The datatracker has the advantage of knowing what drafts exist.

When it makes bibxml-id entries it uses the draft name.

When it tries to find drafts that are ambiguous because there are endings that might be versions or part of the draft name, it probes the set of known names as part of the heuristics.

See https://github.com/ietf-tools/datatracker/blob/b78f5bab908dcd5e078cfcb947bc01b8a7a8f721/ietf/doc/utils.py#L1201-L1209. This works by looking for the longest matching string that is still an existing draft name. It relies on there not being any draft names in the database where A-nn and A are both valid names. If we ever allowed that to happen, the heuristics would become much more complicated.

For the bibxml service, I would check whether, at the point where the code generates the anchor, it has already fully identified the draft name and is unconditionally stripping -nn when it shouldn't be.
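
The invariant mentioned above (no pair of valid names A and A-nn) can be checked mechanically. The following is a minimal Python sketch, not Datatracker code; `ambiguous_names` and the in-memory set of names are assumptions made for illustration.

```python
import re

# A trailing "-NN" that could be either a revision or part of the name.
VERSION_SUFFIX = re.compile(r"-\d\d$")


def ambiguous_names(known_names: set[str]) -> list[str]:
    """Return every known name A-nn for which A is also a known name."""
    return sorted(
        name
        for name in known_names
        # name[:-3] drops the three-character "-NN" suffix.
        if VERSION_SUFFIX.search(name) and name[:-3] in known_names
    )


known = {"draft-ietf-tsvwg-ieee-802-11", "draft-sparks-sipcore-multiple-reasons"}
clean = ambiguous_names(known)
polluted = ambiguous_names(known | {"draft-ietf-tsvwg-ieee-802"})
```

With the real set of names, `clean` is empty. Adding the nonexistent draft-ietf-tsvwg-ieee-802 (as the stale index data effectively did) makes draft-ietf-tsvwg-ieee-802-11 ambiguous, which is the situation the heuristics rely on never occurring.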

@stefanomunarini
Contributor

stefanomunarini commented Dec 12, 2023

Thank you for the pointers to the Datatracker implementation, @rjsparks. I'm looking into implementing a similar heuristic in the bibxml-service, but I keep running into cases that are causing some confusion.

To summarise, and I invite you to confirm the new behaviour of the service:

  • The document draft-ietf-tsvwg-ieee-802-11 should be indexed in the service, while ietf-tsvwg-ieee-802-11 should not (it does not exist).
  • The bibxml-service should serve draft-ietf-tsvwg-ieee-802-11 for both paths I-D.draft-ietf-tsvwg-ieee-802-11-NN and I-D.ietf-tsvwg-ieee-802-11. In the first case, it should serve the NN version; in the latter, it should serve the latest version available.
  • More generally, we accept all of the following: foo-bar (unversioned), foo-bar-11* (unversioned, if it exists), draft-foo-bar* (unversioned, if it exists), draft-foo-bar-11* (versioned, draft v.11, if it exists), draft-foo-bar-11 (unversioned, the whole string is the document name).

* These are paths that by default would raise an exception, because the expected format is either draft-foo-VERSION or foo without a version. For these paths, the service will check against the Datatracker to determine whether the whole string is the actual document name.

One last case that is not supported: draft-foo-bar-11 (unversioned, document name is draft-foo-bar, requesting v.11) should result in an exception.

Can you please confirm all of the above? Thanks

@stefanomunarini
Contributor

Could you also please confirm the expected anchor values for the above cases?

@rjsparks
Member Author

For the first checkbox, yes, but more importantly draft-ietf-tsvwg-ieee-802 does not exist (you started conflating the specific version requests vs latest version requests with identifying the document being requested, and yes, those do interact).

@rjsparks
Member Author

The other check boxes are on track, but the * is not.

draft-sparks-sipcore-multiple-reasons-00 should definitely be something I can ask for. Can you clarify what you meant with the *?

@stefanomunarini
Contributor

stefanomunarini commented Dec 12, 2023

draft-sparks-sipcore-multiple-reasons-00

For such a path, the service looks up a document named draft-sparks-sipcore-multiple-reasons-00. If no document is found, it returns draft v.00 of draft-sparks-sipcore-multiple-reasons. Is this the correct behaviour?

@rjsparks
Member Author

The better way (because it's going to be right 99.99...% of the time, for some string of 9s) is to look for draft-sparks-sipcore-multiple-reasons first and, if it exists, treat the -00 as a version. Only if you don't find a draft-sparks-sipcore-multiple-reasons would you go look for something named draft-sparks-sipcore-multiple-reasons-00.

Doing it in the order you suggest will cause a database lookup that fails in all but that very small fraction of cases.

There are other ways to optimize this (that involve keeping extra configuration or state) but the above is the path the datatracker takes.
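
The lookup order described in this comment could be sketched as follows. This is an illustrative Python sketch, not the Datatracker or bibxml-service implementation; `resolve` and the `exists` callback (standing in for a database lookup) are hypothetical names, and the sketch does not check that the requested revision itself exists.

```python
import re
from typing import Callable, Optional, Tuple

# Splits "something-NN" into a candidate base name and a two-digit revision.
VERSIONED = re.compile(r"^(?P<base>.+)-(?P<rev>\d\d)$")


def resolve(
    requested: str, exists: Callable[[str], bool]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (draft_name, revision), or None if no draft matches.

    revision is None when no specific version was requested.
    """
    match = VERSIONED.match(requested)
    if match and exists(match["base"]):
        # Common case: the trailing -NN is a revision of an existing draft.
        return match["base"], match["rev"]
    if exists(requested):
        # Rare case: the -NN is part of the draft name itself.
        return requested, None
    return None


known = {"draft-ietf-tsvwg-ieee-802-11", "draft-sparks-sipcore-multiple-reasons"}
a = resolve("draft-sparks-sipcore-multiple-reasons-00", known.__contains__)
# a == ("draft-sparks-sipcore-multiple-reasons", "00")
b = resolve("draft-ietf-tsvwg-ieee-802-11", known.__contains__)
# b == ("draft-ietf-tsvwg-ieee-802-11", None)
c = resolve("draft-ietf-tsvwg-ieee-802-11-11", known.__contains__)
# c == ("draft-ietf-tsvwg-ieee-802-11", "11")
d = resolve("draft-ietf-tsvwg-ieee-802", known.__contains__)
# d is None
```

As long as A and A-nn are never both valid names, either lookup order gives the same answer; probing the stripped name first only saves the lookup that would almost always fail.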

@stefanomunarini
Contributor

Hi @rjsparks, I am not fully confident about submitting a PR just yet, as the changes I've made affect a large part of the Internet Draft adapter. However, I deployed what is in this PR to our dev instance (https://dev.bibxml.org), and I invite you to test the new changes.

In particular, note that while before https://bib.ietf.org/get-one/by-docid/?docid=draft-ietf-tsvwg-ieee-802-11&doctype=Internet-Draft&query=draft-ietf-tsvwg-ieee-802&query_format=docid_regex&page=1 would point to https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-tsvwg-ieee-802.xml , the dev instance (https://dev.bibxml.org/get-one/by-docid/?docid=draft-ietf-tsvwg-ieee-802-11&doctype=Internet-Draft&query=draft-ietf-tsvwg-ieee-802&query_format=docid_regex&page=1) now points to the right document (https://dev.bibxml.org/public/rfc/bibxml3/reference.I-D.draft-ietf-tsvwg-ieee-802-11.xml).

The wrong URL (https://bib.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tsvwg-ieee-802.xml) can still be accessed manually, though, as I haven't found a way to block it without blocking other valid paths.

Please note: the way this part of the system was implemented makes it very hard to accommodate this special case without breaking other paths. We may have to consider a different approach if the solution proposed above does not satisfy the requirements.

@rjsparks
Member Author

rjsparks commented Jan 8, 2024

@stefanomunarini - @kesara may have more to say after looking more closely, but I think that for the short term, getting the correct answer for a correct question is far more important than getting a wrong answer for the wrong question. These are rare enough that if the last example is problematic, we can block it or redirect it at Cloudflare.
