Define hosts' public suffix and registrable domain. #391
@@ -272,6 +272,97 @@ for further processing.
U+0020 SPACE, U+0023 (#), U+0025 (%), U+002F (/), U+003A (:), U+003F (?), U+0040 (@), U+005B ([),
U+005C (\), or U+005D (]).

<p>A <a for=/>host</a>'s <dfn for=host export id=concept-host-public-suffix>public suffix</dfn> is
the portion of a <a for=/>host</a> which is controlled by a registrar, public or otherwise. To
obtain <var>host</var>'s <a for=host>public suffix</a>, run the following steps:

<ol>
<li><p>Let <var>parsed</var> be the result of <a lt="host parser">host parsing</a> <var>host</var>.

Review comment: A host is already parsed (otherwise it wouldn't be a host). You also need to introduce the host variable in the paragraph before the algorithm.
Reply: Hrm. Yeah, I guess it's reasonable to assume that we'll only be using this algorithm on already-parsed hosts. Line 277 introduces […]
Reply: Ah, I missed that on line 277. That's fine, but your alternative here works too.

<li><p>If <var>parsed</var> is not a <a>domain</a>, return the empty string.

Review comment: This kind of implies that the public suffix is also a string. Perhaps it's cleaner to return null?
Reply: Is the public suffix a host? I guess it could be. I was assuming it was a string, but treating it as a host seems reasonable.

<li><p>Return the <a for=host>public suffix</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List. [[!PSL]].
</ol>
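
Illustrative note: the final step defers to the matching algorithm published with the Public Suffix List. A minimal sketch of that matching step follows, assuming the host is already a parsed, lowercase, A-label domain. The names `SAMPLE_RULES`, `_rule_matches`, and `public_suffix` are hypothetical, and the rule set is a tiny hand-picked sample rather than the real list.

```python
# Hypothetical, hand-picked rules for illustration only; the real list is
# published at https://publicsuffix.org/list/ and is far longer.
SAMPLE_RULES = ["com", "io", "github.io", "*.ck", "!www.ck"]


def _rule_matches(rule_labels, host_labels):
    """Compare labels right to left; "*" matches any single label."""
    if len(rule_labels) > len(host_labels):
        return False
    pairs = zip(reversed(rule_labels), reversed(host_labels))
    return all(rule == "*" or rule == label for rule, label in pairs)


def public_suffix(host, rules=SAMPLE_RULES):
    """Return the public suffix of an already-parsed, lowercase domain."""
    host_labels = host.split(".")
    prevailing = ["*"]  # default rule when nothing else matches
    for rule in rules:
        is_exception = rule.startswith("!")
        rule_labels = rule.lstrip("!").split(".")
        if not _rule_matches(rule_labels, host_labels):
            continue
        if is_exception:
            # An exception rule wins outright, minus its leftmost label.
            prevailing = rule_labels[1:]
            break
        if len(rule_labels) > len(prevailing):
            prevailing = rule_labels  # otherwise the rule with most labels wins
    return ".".join(host_labels[-len(prevailing):])
```

With this sample, `public_suffix("www.example.com")` yields `com` and `public_suffix("whatwg.github.io")` yields `github.io`. The sketch does not handle non-domain hosts (IP addresses, opaque hosts), which the steps above treat as a separate early return.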

<p>A <a for=/>host</a>'s <dfn for=host export id=concept-host-registrable-domain>registrable
domain</dfn> is a formally valid domain name that could be registered at a registry. To obtain
<var>host</var>'s <a for=host>registrable domain</a>, run the following steps:

<ol>
<li><p>Let <var>parsed</var> be the result of <a lt="host parser">host parsing</a> <var>host</var>.

Review comment: Same comment as above.

<li><p>If <var>parsed</var> is not a <a>domain</a>, return the empty string.

<li><p>If <var>parsed</var>'s <a for=host>public suffix</a> is <var>host</var>, return the empty
string.

<li><p>Return the <a for=host>registrable domain</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List. [[!PSL]].
</ol>
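
Illustrative note: in the Public Suffix List algorithm, the registrable domain is the public suffix plus one additional label. A minimal sketch follows, assuming the caller has already obtained the host's public suffix by some other means. The function name `registrable_domain` is hypothetical, and returning `None` (rather than the empty string used in the steps above) follows the reviewers' suggestion and is an assumption, not settled spec text.

```python
from typing import Optional


def registrable_domain(host: str, suffix: str) -> Optional[str]:
    """Return the public suffix plus one label, or None if there is none.

    `host` is assumed to be an already-parsed, lowercase domain and
    `suffix` its public suffix, computed elsewhere.
    """
    if host == suffix:
        # The host is itself a public suffix: nothing is registrable.
        return None
    if not host.endswith("." + suffix):
        raise ValueError("suffix is not a label-aligned suffix of host")
    # Everything to the left of the suffix, without the separating dot.
    remainder = host[: -(len(suffix) + 1)]
    # The label immediately to the left of the suffix.
    extra_label = remainder.rsplit(".", 1)[-1]
    return extra_label + "." + suffix
```

For instance, `registrable_domain("sub.www.example.com", "com")` gives `example.com`, while `registrable_domain("github.io", "github.io")` gives `None`.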

<div class=example id=example-host-psl>
<table>
<tr>
<th>Host
<th>Public Suffix
<th>Registrable Domain
<tr>
<td><code>com</code>
<td><code>com</code>
<td>

Review comment: We should change this to […]
Reply: I would suggest […]
Reply: Why? We don't use that convention anywhere.
Reply: Example tables like this are special. We omit the quotes, substitute strings for structs, and use other conventions meant for visual clarity and not consistency. I certainly don't think we should capitalize "null" here, and I think italicizing it so that it's clear it's not just a registrable domain named Shrug. Just a thought.

<tr>
<td><code>example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>sub.www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>EXAMPLE.COM</code>

Review comment: This is not a host, but input to the host parser.
Reply: I think it's helpful to point out that no matter how folks spell the URL, it's going to be normalized. Perhaps shifting this table to include a URL rather than a host would make that point, especially for the punycode bits?
Reply: I think it's fine to just list hosts, but we should label it "host input" or some such, to not confuse it with host as a concept, which is already parsed and normalized.

<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>github.io</code>
<td><code>github.io</code>
<td>
<tr>
<td><code>whatwg.github.io</code>
<td><code>github.io</code>
<td><code>whatwg.github.io</code>
<tr>

Review comment: Isn't this row duplicated? The previous one looks the same.

<td><code>whatwg.github.io</code>
<td><code>github.io</code>
<td><code>whatwg.github.io</code>
<tr>
<td><code>إختبار</code>

Review comment: Same as above. And also applies below.

<td><code>xn--kgbechtv</code>
<td>
<tr>
<td><code>example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>

Review comment: So one of the things is the PSL doesn't specify whether or not it returns U-Label or A-Label (that's left to the implementation). I'm curious about the documentation here for the A-Label - is this an expectation of the contract? That is, are you trying to show that either U-Label or A-Label can be returned regardless of U-Label or A-Label input, or are you trying to state that A-Labels should be the consistent return?
Reply: Currently we don't rely on this anywhere (assuming it's consistent to be one or the other, is that at least required?), but A-label seems preferable as that'd be consistent with how the platform exposes URLs and origins overall. I suspect this will only matter if we add an API, but it really depends on whether PSL dependencies keep getting added or not.

<tr>
<td><code>sub.example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>
</table>
</div>
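
Illustrative note: the review thread distinguishes host input (for example `EXAMPLE.COM` or a U-label spelling) from a parsed host, which is lowercase and uses A-labels. A rough sketch of that normalization follows, using Python's built-in `idna` codec. That codec implements IDNA 2003, so it only approximates the URL Standard's host parser; the helper name `to_ascii_host` is hypothetical.

```python
def to_ascii_host(host_input: str) -> str:
    """Approximate host normalization: lowercase, then convert to A-labels.

    Assumption: Python's "idna" codec (IDNA 2003) stands in for the
    host parser here; the two differ on edge cases.
    """
    # The codec leaves pure-ASCII labels untouched, so lowercase explicitly.
    return host_input.lower().encode("idna").decode("ascii")
```

Under these assumptions, `to_ascii_host("EXAMPLE.COM")` gives `example.com` and `to_ascii_host("example.إختبار")` gives `example.xn--kgbechtv`, matching the A-label form shown in the table.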

<p>Two <a for=/>hosts</a>, <var>A</var> and <var>B</var> are said to be
<dfn for=host export id=concept-host-same-site>same-site</dfn> with each other if either of the

Review comment: We have it as "same origin". Should this be "same site"?
Reply: Meh. I think I'd have spelled it "same-origin" if you hadn't already spelled it "same origin". :) I'm happy to follow suit with "same site"; I'm not dogmatic about hyphenation.

following statements are true:

<ul class=brief>
<li><p><var>A</var> is identical to <var>B</var>

Review comment: This should use concept-host-equals.

<li><p><var>A</var>'s <a for=host>registrable domain</a> is not the empty string, and is identical
to <var>B</var>'s <a for=host>registrable domain</a>.
</ul>
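
Illustrative note: the two conditions above can be sketched as a small predicate. This is a minimal sketch, assuming a lookup function that maps a parsed host to its registrable domain or `None`. The names `same_site`, `SAMPLE_REGISTRABLE_DOMAINS`, and `lookup` are hypothetical, and the sample data simply mirrors the example table.

```python
from typing import Callable, Optional

# Hypothetical sample data mirroring the example table; None means the
# host has no registrable domain (it is itself a public suffix).
SAMPLE_REGISTRABLE_DOMAINS = {
    "com": None,
    "example.com": "example.com",
    "www.example.com": "example.com",
    "sub.www.example.com": "example.com",
    "github.io": None,
    "whatwg.github.io": "whatwg.github.io",
}
lookup = SAMPLE_REGISTRABLE_DOMAINS.get


def same_site(a: str, b: str,
              registrable_domain: Callable[[str], Optional[str]]) -> bool:
    """Return True if hosts a and b are same-site per the two conditions."""
    if a == b:
        return True  # first condition: the hosts are identical
    domain_a = registrable_domain(a)
    # Second condition: a non-empty registrable domain shared by both hosts.
    return domain_a is not None and domain_a == registrable_domain(b)
```

Here `same_site("www.example.com", "sub.www.example.com", lookup)` holds, while `same_site("github.io", "whatwg.github.io", lookup)` does not, because `github.io` has no registrable domain.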

<h3 id=idna>IDNA</h3>

Review comment: pedantry: […] Isn't necessarily correct. This implies control over the DNS, which isn't always passed on (e.g. in the case of hosting or DNS providers), and an example like appspot.com, that domain isn't controlled by a registrar.
That was the intent of the PSL originally - reflecting ccTLDs registration policies - but that predates the advent of the PRIVATE section where it all began the descent into hell :)
publicsuffix.org doesn't list 'what' a public suffix is, other than the result of running the algorithm. Logically, it represents the separation of domain boundaries indicating a change in administrative or technical control or security policy (which is why IETF called it DBOUND), but that's a bit of a mouthful... :/

Reply: Aren't we calling Google the registrar in this example? Or GitHub the registrar of *.github.io?
Is there a term I could use that would be more accurate (and less than a sentence long :) )?

Reply: @mikewest Yeah, except neither GitHub nor Google is actually a registrar or acting as one. That was why it was sort of weird. They don't necessarily allow registration either (and may instead assign names, such as Amazon, based on project IDs).
For a given domain input, the PSL splits the labels on the first administrative boundary, with the registered domain being the set of labels that are operated according to a different set of domain policies than the public suffix (which itself may contain more domain splits).
Definitely a mouthful, and this is part of why we dance around it on publicsuffix.org, because we haven't found a pithy way of describing left/right except in their relationship to each other. :/
I was hoping your ability to condense these concepts would be better than mine.

Reply: This is definitely one of the longest-outstanding definitions that we eventually need to clarify on the PSL project as well, and it is connected to publicsuffix/publicsuffix.org#12
If we consider only the ICANN section (mistakenly named like that, as it should be IANA), then the definition is probably correct. If we also consider the PRIVATE section, and then the list as a whole, we must come up with a better definition of what effectively distinguishes a suffix from a host.
The "controlled" part is the key. In both cases, the common denominator is that an entity has control of a portion (a set of labels) of a host, and determines specific rules on how that portion of the name is operated. Everything beyond (on the left of) that label is basically not under direct control of that entity, and therefore each subzone should be considered independent from the others.
In the case of a registry, the controlled portion is for sure the TLD and perhaps extra lower levels (generally second, sometimes third). In that case, the "registerable" definition potentially applies, as there is a direct assumption that the registrar makes those domains available for registration. Again, this is actually a potentially incorrect assumption, as domains that belong to that zone may not be open for registration, but assigned explicitly.
While "registerable" may potentially fit the registrar use case, it definitely doesn't fit the PRIVATE use case, because the suffixes in this section may be there for a variety of reasons.
However, regardless of the use case, the common pattern is that the entity that controls the suffix declares that every subzone beyond that suffix should be considered an independent zone, potentially managed by different users.
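
Illustrative note: the distinction drawn above between the ICANN and PRIVATE sections changes where the boundary falls for the same host. A minimal sketch follows, assuming a simplified longest-match lookup with exact rules only (no wildcard or exception rules). The names `ICANN_RULES`, `PRIVATE_RULES`, and `suffix_for` are hypothetical, and the rule sets are tiny samples rather than the real list.

```python
# Hypothetical, tiny samples of the two sections of the Public Suffix List.
ICANN_RULES = {"com", "io"}
PRIVATE_RULES = {"github.io", "appspot.com"}


def suffix_for(host: str, include_private: bool = True) -> str:
    """Return the longest sample rule that is a suffix of host.

    Simplification: exact rules only; wildcard and exception rules of
    the real algorithm are deliberately left out.
    """
    rules = (ICANN_RULES | PRIVATE_RULES) if include_private else ICANN_RULES
    labels = host.split(".")
    # Try candidates from the most labels to the fewest.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rules:
            return candidate
    return labels[-1]  # default: the rightmost label
```

With the PRIVATE sample included, `suffix_for("whatwg.github.io")` gives `github.io`; with `include_private=False` it gives `io`, which would make `github.io` the registrable domain and group every `*.github.io` host under one site.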