Document, possibly standardize, "plus" scheme convention #230

Closed
mahmoud opened this issue Feb 3, 2017 · 10 comments
Labels
clarification Standard could be clearer

Comments

mahmoud commented Feb 3, 2017

A new convention has arisen among URLs, as I'm sure many of you have seen, where the scheme specifies both a transport and protocol scheme, separated by a plus ("+") sign.

For instance, Python users can install a package directly from a git repo using: pip install git+ssh://git@github.com/mahmoud/boltons.git.

Given the number of transport-protocol combinations, and the straightforward meaning of most of them, documenting every single one seems excessive. Would it be worthwhile to capture this convention in the documentation?

This usage pattern dates back several years already, and seems beneficial, so I figure it might be good to have it on the board next time the standards committee rolls around :) Thanks!

Member

mnot commented Feb 3, 2017

What would it actually do? I don't think you can assume that every combination will work; e.g., git+ftp doesn't work AFAIK. Each combination seems like it will require documentation of its own (e.g., how Git uses HTTP, how Git uses SSH).

Author

mahmoud commented Feb 3, 2017

Talk about a quick response! This isn't so much about the protocol itself; it's actually very URL-intrinsic.

So here's where this comes up. As you know, URLs with a scheme that uses a network location get a '//' whereas those without do not. http://github.com vs mailto:[email protected].

If the standard is enforced strictly, and the scheme is treated as a whole, it's not possible to guess if a scheme implies the URL has a network location. So we don't know whether it should be git+ftp://... or git+ftp:..., despite the fact that you or I can clearly guess.

A manifestation of this can be seen in Python's built-in URL support, here: https://hg.python.org/cpython/file/2.7/Lib/urlparse.py#l41
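As a rough illustration of the point above, here is a minimal sketch that assumes Python 3's urllib.parse (the successor of the linked urlparse module) still exposes the same hard-coded uses_netloc list. The example.org URL is a made-up placeholder, not something from this thread.

```python
from urllib.parse import urlsplit, uses_netloc

# Membership is checked against the full scheme string, not per "+" segment,
# so each combination has to be listed by hand to be recognized.
for scheme in ("http", "git+ssh", "git+ftp", "http+unix", "mailto"):
    print(f"{scheme:10} in uses_netloc: {scheme in uses_netloc}")

# Splitting itself still picks up the authority when "//" follows the scheme.
parts = urlsplit("git+ftp://example.org/repo.git")  # placeholder URL
print(parts.scheme, parts.netloc, parts.path)
```

The list only knows the combinations someone thought to add, which is the gap the "+" convention keeps widening.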

Author

mahmoud commented Feb 3, 2017

To really boil it down, I think it might be helpful to say something to the effect of "if using a conventional '+'-separated URL scheme, the last segment is what matters to the network location". Nip the ordering spat in the bud while everyone still agrees. (As another example, I believe Docker uses http+unix:// to refer to HTTP over a Unix domain socket; they should probably be using the file scheme.)

Author

mahmoud commented Feb 3, 2017

To rephrase:

If the scheme as a whole is not recognized, and a "+" is present in the scheme, and the last "+"-separated segment is a recognized scheme, it is reasonable for URL implementors to guess that that is the intended scheme for purposes of authority and default port behaviors.
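The rule above could be sketched roughly as follows. This is only an illustration: the KNOWN_SCHEMES table and the function names are hypothetical, and a real implementation would plug in whatever scheme registry it already has.

```python
# Hypothetical registry: scheme -> whether URLs of that scheme carry an authority ("//host").
KNOWN_SCHEMES = {
    "http": True,
    "https": True,
    "ftp": True,
    "ssh": True,
    "file": True,
    "mailto": False,
}


def guess_base_scheme(scheme):
    """Return the scheme to use for authority/default-port decisions, or None."""
    scheme = scheme.lower()
    if scheme in KNOWN_SCHEMES:
        # The scheme as a whole is recognized, so no guessing is needed.
        return scheme
    if "+" in scheme:
        # Fall back to the last "+"-separated segment, but only if it is recognized.
        last = scheme.rsplit("+", 1)[1]
        if last in KNOWN_SCHEMES:
            return last
    return None


def uses_authority(scheme):
    """True/False when a guess is possible, None when the scheme stays unknown."""
    base = guess_base_scheme(scheme)
    return None if base is None else KNOWN_SCHEMES[base]
```

With this table, git+ssh and git+ftp resolve to ssh and ftp, while http+unix stays unknown because "unix" is not a recognized scheme here. That matches the wording of the rule: only a recognized last segment triggers the guess.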

I think that about covers my suggestion :)

Member

annevk commented Feb 7, 2017

If you want to include authority and have it parsed automatically, just use //. I don't think we want to introduce further complexity into the URL parser.

Author

mahmoud commented Feb 9, 2017

@annevk I understand your desire not to complicate the parser even more. The most I can hope for is someone to make a small note of the plus-convention for delineating multiple protocols/schemes. But to be clear, you can't "just use //". Whether or not a URL can have a // is defined on a per-scheme basis. file:///x/y/z is valid, whereas mailto://[email protected] is not.

There are dozens of schemes registered already, and with the + convention, there are dozens more that will likely bypass registration. A sensible heuristic isn't the worst thing to document.

Member

annevk commented Feb 9, 2017

That's true, but then it sounds like something that wouldn't really influence the parser. Better for a library that takes a URL record and does some post-processing.
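A post-processing step of that kind might look like the sketch below, which leaves parsing to Python's urllib.parse.urlsplit and applies the last-segment guess afterwards. The DEFAULT_PORTS table and the effective_port name are assumptions made for illustration; nothing here is part of the URL Standard.

```python
from urllib.parse import urlsplit

# Hypothetical default-port table; a real library would use its own registry.
DEFAULT_PORTS = {"http": 80, "https": 443, "ssh": 22, "ftp": 21}


def effective_port(url):
    """Return the explicit port, else a default guessed from the scheme, else None."""
    parts = urlsplit(url)  # the parser itself is left untouched
    if parts.port is not None:
        return parts.port
    scheme = parts.scheme
    if scheme not in DEFAULT_PORTS and "+" in scheme:
        # Post-processing only: fall back to the last "+"-separated segment.
        scheme = scheme.rsplit("+", 1)[1]
    return DEFAULT_PORTS.get(scheme)


print(effective_port("git+ssh://git@github.com/mahmoud/boltons.git"))
```

Keeping the guess outside the parser means the parsed record is the same whether or not a caller opts into the heuristic.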

Author

mahmoud commented Feb 9, 2017 via email

annevk added the "clarification" (Standard could be clearer) label and removed the "non-normative" label on Apr 26, 2020

Member

annevk commented May 4, 2020

Closing this as it hasn't really come up much and it's also somewhat out-of-scope.


brainstorm commented May 11, 2022

Good read!

I suspect that to properly standardize this "+" scheme convention, you'd have to go through RFC 8615 and generate a ton of URI scheme drafts for IANA covering all possible combinations, which will most likely be rejected by "the experts" (as noted in the aforementioned RFC):

https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
https://en.wikipedia.org/wiki/List_of_URI_schemes

Pointing this out since I've been doing some research on this topic for work.

Please let me know if I'm wrong with my assumption(s).
