draft-ietf-tsvwg-ieee-802 should not be a thing #381

rjsparks · 2023-11-21T15:10:06Z

Describe the issue

From a conversation with @stefanomunarini in #378:

More concretely, see https://bib.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tsvwg-ieee-802.xml and note that the returned anchor is incorrect - there is no such thing as I-D.ietf-tsvwg-ieee-802.

This is a different issue I think. I can see ietf-tsvwg-ieee-802 is actually indexed in the service.

Should this not be the case, can you @rjsparks please open a new issue? Thanks

draft-ietf-tsvwg-ieee-802 does not exist. draft-ietf-tsvwg-ieee-802-11 does (it became an RFC; its last version was draft-ietf-tsvwg-ieee-802-11-11).

The datatracker has some heuristics that use probing to see what exists when a name ends in -\d\d. I suspect the code that chose to index ietf-tsvwg-ieee-802 lacks similar consideration.



TonyLHansen · 2023-11-21T16:00:37Z

That'd be a good addition to the index code.

stefanomunarini · 2023-11-22T15:38:24Z

Thank you for raising this issue, @rjsparks. It is being investigated.

stefanomunarini · 2023-11-29T20:06:25Z

It turns out that ietf-tools/relaton-data-ids contains some data generated by older versions of a Relaton gem (specifically relaton-ietf prior to v1.12.6). Unlike the relaton/relaton-data-ids crawler (see https://github.com/relaton/relaton-data-ids/blob/8fd23a9f6dba6f3ad86f53024e0bf5ffb7d2da83/crawler.rb#L9-L10), the ietf-tools crawler for this repository does not remove existing data before each run, so entries written by an earlier run are never cleaned up. draft-ietf-tsvwg-ieee-802 does not, in fact, exist in relaton/relaton-data-ids.
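To illustrate why skipping the cleanup step leaves stale entries behind, here is a minimal Python sketch. It is not the crawler's actual code (the real crawler is Ruby); the function name `write_snapshot`, the `clean` flag, and the `.yaml` file layout are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Dict


def write_snapshot(output_dir: Path, documents: Dict[str, str], clean: bool = True) -> None:
    """Write one file per document (hypothetical layout: <name>.yaml).

    With clean=True the directory is emptied first, so files that an
    earlier run produced but this run no longer produces disappear.
    With clean=False those old files survive and keep being indexed.
    """
    if clean and output_dir.exists():
        shutil.rmtree(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, body in documents.items():
        (output_dir / f"{name}.yaml").write_text(body)
```

If an older run wrote `draft-ietf-tsvwg-ieee-802.yaml` and a newer run only writes `draft-ietf-tsvwg-ieee-802-11.yaml`, the old file remains on disk unless the directory is cleared first.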

I do not recall there being any particular reason for not removing old data. @kesara, can you confirm? In case there is no reason, here is the PR.

stefanomunarini · 2023-12-01T08:56:01Z

The document draft-ietf-tsvwg-ieee-802 is not indexed in the service anymore. However, the issue with the wrong anchor formatting for draft-ietf-tsvwg-ieee-802-11 persists (the current anchor is draft-ietf-tsvwg-ieee-802). Is there documentation on how the Datatracker achieves this using heuristics?

rjsparks · 2023-12-01T14:10:38Z

The datatracker has the advantage of knowing what drafts exist.

When it makes bibxml-id entries it uses the draft name.

When it tries to find drafts that are ambiguous because there are endings that might be versions or part of the draft name, it probes the set of known names as part of the heuristics.

See https://github.com/ietf-tools/datatracker/blob/b78f5bab908dcd5e078cfcb947bc01b8a7a8f721/ietf/doc/utils.py#L1201-L1209. This works by looking for the longest matching string that is still an existing draft name. It relies on there not being any draft names in the database where A-nn and A are both valid names. If we ever allowed that to happen, the heuristics would become much more complicated.

For the bibxml service, I would check whether, at the point where the code generates the anchor, it has already fully identified the draft name and is unconditionally stripping -nn when it shouldn't be.
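A minimal sketch of the difference between unconditionally stripping -nn and probing a set of known names, as described above. This is neither the Datatracker's nor the bibxml-service's code; `KNOWN_DRAFTS`, `naive_strip`, and `strip_if_revision` are hypothetical names, and the in-memory set stands in for whatever lookup the service really has.

```python
import re

# Hypothetical stand-in for "the set of known draft names".
KNOWN_DRAFTS = {
    "draft-ietf-tsvwg-ieee-802-11",
    "draft-sparks-sipcore-multiple-reasons",
}

# A trailing hyphen followed by exactly two digits.
REV_SUFFIX = re.compile(r"-\d\d$")


def naive_strip(name: str) -> str:
    """Always drop a trailing -nn (the suspected bug)."""
    return REV_SUFFIX.sub("", name)


def strip_if_revision(name: str, known=KNOWN_DRAFTS) -> str:
    """Drop a trailing -nn only if it is not part of the draft name.

    The longest candidate is tried first: if the whole string is a known
    draft name, nothing is stripped.
    """
    if name in known:
        return name
    candidate = REV_SUFFIX.sub("", name)
    return candidate if candidate in known else name


print(naive_strip("draft-ietf-tsvwg-ieee-802-11"))
print(strip_if_revision("draft-ietf-tsvwg-ieee-802-11"))
print(strip_if_revision("draft-ietf-tsvwg-ieee-802-11-11"))
```

With this set, `naive_strip` turns draft-ietf-tsvwg-ieee-802-11 into the non-existent draft-ietf-tsvwg-ieee-802, while `strip_if_revision` leaves it intact and still removes the real version suffix from draft-ietf-tsvwg-ieee-802-11-11.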

stefanomunarini · 2023-12-12T15:07:36Z

Thank you for the pointers to the Datatracker implementation @rjsparks . I'm looking into implementing a similar heuristic in the bibxml-service, however I am running into a loop of cases which are creating some confusion.

To summarise, and I invite you to confirm the new behaviour of the service:

The document draft-ietf-tsvwg-ieee-802-11 should be indexed in the service, while ietf-tsvwg-ieee-802-11 should not (it does not exist).
The bibxml-service should serve draft-ietf-tsvwg-ieee-802-11 for both paths I-D.draft-ietf-tsvwg-ieee-802-11-NN and I-D.ietf-tsvwg-ieee-802-11. In the first case, it should serve the NN version; in the latter, it should serve the latest version available.
More generally, we accept all of the following: foo-bar (unversioned), foo-bar-11* (unversioned, if it exists), draft-foo-bar* (unversioned, if it exists), draft-foo-bar-11* (versioned, draft v.11, if it exists), draft-foo-bar-11 (unversioned, the whole string is the document name).

* these are paths that by default would raise an exception, because the expected format is either draft-foo-VERSION, or foo without a version. For these paths, the service will check against the Datatracker to assess whether the whole string is the actual document name.

One last case that is not supported: draft-foo-bar-11 (unversioned, document name is draft-foo-bar, requesting v.11) should result in an exception.

Can you please confirm all of the above? Thanks

stefanomunarini · 2023-12-12T15:08:44Z

Could you also please confirm the expected anchor values for the above cases?

rjsparks · 2023-12-12T16:46:55Z

For the first checkbox, yes, but more importantly draft-ietf-tsvwg-ieee-802 does not exist (you started conflating the specific version requests vs latest version requests with identifying the document being requested, and yes, those do interact).

rjsparks · 2023-12-12T16:52:00Z

The other check boxes are on track, but the * is not.

draft-sparks-sipcore-multiple-reasons-00 should definitely be something I can ask for. Can you clarify what you meant with the *?

stefanomunarini · 2023-12-12T17:43:02Z

draft-sparks-sipcore-multiple-reasons-00

For such a path, the service looks up a document named draft-sparks-sipcore-multiple-reasons-00. If no document is found, it returns version 00 of draft-sparks-sipcore-multiple-reasons. Is this the correct behaviour?

rjsparks · 2023-12-13T14:46:01Z

The better way (because it's going to be right 99.99... for some string of 9s percent of the time) is to look for draft-sparks-sipcore-multiple-reasons first and, if it exists, treat the -00 as a version. Only if you don't find a draft-sparks-sipcore-multiple-reasons would you go look for something named draft-sparks-sipcore-multiple-reasons-00.

Doing it in the order you suggest will cause a database lookup that fails in all but that very small fraction of cases.

There are other ways to optimize this (that involve keeping extra configuration or state) but the above is the path the datatracker takes.
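The lookup order described above could be sketched as follows. This is an illustration only, not the Datatracker's implementation: `resolve` and the `exists` callback are hypothetical, and `exists` stands in for a database query.

```python
import re
from typing import Callable, Optional, Tuple

# Splits "something-nn" into a base name and a two-digit suffix.
REV_SUFFIX = re.compile(r"^(?P<base>.+)-(?P<rev>\d\d)$")


def resolve(
    requested: str, exists: Callable[[str], bool]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (draft_name, version) or None if nothing matches.

    The base name is probed first, because a trailing -nn is almost
    always a version. Only when that fails is the whole string tried
    as a draft name.
    """
    match = REV_SUFFIX.match(requested)
    if match and exists(match.group("base")):
        return match.group("base"), match.group("rev")
    if exists(requested):
        return requested, None
    return None


# Usage with an in-memory set and a counter for the number of lookups.
known = {"draft-ietf-tsvwg-ieee-802-11", "draft-sparks-sipcore-multiple-reasons"}
lookups = []


def exists(name: str) -> bool:
    lookups.append(name)
    return name in known


print(resolve("draft-sparks-sipcore-multiple-reasons-00", exists), len(lookups))
```

In the common case (a real version suffix) this costs one lookup. The second lookup happens only for names like draft-ietf-tsvwg-ieee-802-11, where the suffix is part of the name.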

stefanomunarini · 2024-01-08T12:56:54Z

Hi @rjsparks, I am not fully confident about submitting a PR just yet, as the changes I've made affect a big part of the Internet Draft adapter. However, I deployed what is in this PR to our dev instance (https://dev.bibxml.org) and I invite you to test the new changes.

In particular, note that while before https://bib.ietf.org/get-one/by-docid/?docid=draft-ietf-tsvwg-ieee-802-11&doctype=Internet-Draft&query=draft-ietf-tsvwg-ieee-802&query_format=docid_regex&page=1 would point to https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-tsvwg-ieee-802.xml , the dev instance (https://dev.bibxml.org/get-one/by-docid/?docid=draft-ietf-tsvwg-ieee-802-11&doctype=Internet-Draft&query=draft-ietf-tsvwg-ieee-802&query_format=docid_regex&page=1) now points to the right document (https://dev.bibxml.org/public/rfc/bibxml3/reference.I-D.draft-ietf-tsvwg-ieee-802-11.xml).

The wrong URL (https://bib.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tsvwg-ieee-802.xml) can still be accessed manually tho, as I haven't found a way to block it without blocking other valid paths.

Please note: the way this part of the system was implemented makes it very hard to accommodate this special case without breaking other paths. We may have to consider a different approach if the solution proposed above does not satisfy the requirements.

rjsparks · 2024-01-08T23:25:30Z

@stefanomunarini - @kesara may have more to say after looking more closely, but I think that for the short term, getting the correct answer for a correct question is far more important than getting a wrong answer for the wrong question. These are rare enough that if the last example is problematic, we can block it or redirect it at cloudflare.

rjsparks added the bug label Nov 21, 2023

stefanomunarini self-assigned this Nov 22, 2023

stefanomunarini mentioned this issue Nov 29, 2023

chore: remove data directories before fetching new data ietf-tools/relaton-data-ids#41

Merged

stefanomunarini mentioned this issue Jan 10, 2024

xi:include not working for internet draft #399

Closed
