Define hosts' public suffix and registrable domain. #391

Merged
merged 7 commits into from
Jun 7, 2018
Changes from 3 commits
88 changes: 88 additions & 0 deletions url.bs
@@ -272,6 +272,94 @@ for further processing.
U+0020 SPACE, U+0023 (#), U+0025 (%), U+002F (/), U+003A (:), U+003F (?), U+0040 (@), U+005B ([),
U+005C (\), or U+005D (]).

<p>A <a for=/>host</a>'s <dfn for=host export>public suffix</dfn> is the portion of a
<a for=/>host</a> which is controlled by a registrar, public or otherwise. To obtain
Member

Instead of "controlled by a registrar, public or otherwise" we could say "included on the Public Suffix List". This is boring, but factual and correct as I understand it.

I think that works (as boring as it is)

<var>host</var>'s <a for=host>public suffix</a>, run these steps:

<ol>
<li><p>If <var>host</var> is not a <a>domain</a>, then return null.

<li><p>Return the <a for=host>public suffix</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on
<var>host</var>. [[!PSL]]
</ol>
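Step 2 defers to the algorithm published with the Public Suffix List. As a reading aid, here is a minimal Python sketch of that algorithm. It assumes step 1 has already returned null for non-domains, that the domain is lowercase ASCII with no trailing dot, and that the rules are the tiny hypothetical `EXAMPLE_RULES` rather than the real list. It also treats all rules alike, ignoring the list's division into ICANN and private sections.

```python
# Hypothetical excerpt of PSL-style rules, for illustration only: "*.ck" is a
# wildcard rule and "!www.ck" is an exception rule.
EXAMPLE_RULES = frozenset({"com", "github.io", "xn--kgbechtv", "*.ck", "!www.ck"})


def rule_matches(rule_labels: list[str], domain_labels: list[str]) -> bool:
    """Compare labels right to left; a "*" rule label matches any one label."""
    if len(rule_labels) > len(domain_labels):
        return False
    for rule_label, domain_label in zip(reversed(rule_labels), reversed(domain_labels)):
        if rule_label != "*" and rule_label != domain_label:
            return False
    return True


def public_suffix(domain: str, rules: frozenset[str] = EXAMPLE_RULES) -> str:
    """Return the public suffix of a lowercase ASCII domain (no trailing dot)."""
    labels = domain.split(".")
    # Collect every rule that matches the domain.
    matching = [
        rule for rule in rules
        if rule_matches(rule.removeprefix("!").split("."), labels)
    ]
    exceptions = [rule for rule in matching if rule.startswith("!")]
    if exceptions:
        # A matching exception rule prevails; drop its leftmost label.
        prevailing = exceptions[0][1:].split(".")[1:]
    elif matching:
        # Otherwise the matching rule with the most labels prevails.
        prevailing = max((rule.split(".") for rule in matching), key=len)
    else:
        # No rule matches: the implicit default rule "*" prevails.
        prevailing = ["*"]
    # The public suffix is the domain's rightmost len(prevailing) labels.
    return ".".join(labels[len(labels) - len(prevailing):])
```

For example, `public_suffix("whatwg.github.io")` yields `"github.io"`, and `public_suffix("www.ck")` yields `"ck"` because the exception rule overrides the wildcard rule.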

<p>A <a for=/>host</a>'s <dfn for=host export>registrable domain</dfn> is a <a>domain</a> that could
be registered at a registry. To obtain <var>host</var>'s <a for=host>registrable domain</a>, run
Member

"is its public suffix including one domain label preceding its public suffix". Again, boring, but factual?

So a given host may have multiple public suffixes expressed within it.

Perhaps:
The domain formed by the most specific public suffix for host, along with the domain label immediately preceding it?

From a spec question, what do you want this definition to entail for the appspot.com case?

That is,

  • foo.bar.appspot.com is "obviously" going to return bar.appspot.com as the registrable domain (with appspot.com as the public suffix), and the same would be expected for bar.appspot.com itself.
  • What do you expect this machinery to return for appspot.com? appspot.com is on the PSL, so it is a public suffix, but appspot.com is also a registrable domain under the com entry of the PSL.

I seem to recall that different platform features interpret that differently (navigation vs cookies, for example).

Member

I wasn't really aware of this case. Do you know why they interpret it differently? I guess we want consistent answers with cookies, WebAuthn, etc. If by navigation you mean the address bar, it seems consistency with that would not matter that much.

Member

@mikewest wrote that the registrable domain would be null in such a case (we have github.io as example).

I’d need to re-audit the Chrome code to figure out which cases are web-visible. The results differ in this case based on whether or not you include private suffixes and whether you treat wildcards as implicit entries of the parent. Chrome and Firefox differ on the latter, and the former is specified by the caller.

these steps:

<ol>
<li><p>If <var>host</var>'s <a for=host>public suffix</a> is null or <var>host</var>'s
<a for=host>public suffix</a> <a lt=concept-host-equals>equals</a> <var>host</var>, then return
null.

<li><p>Return the <a for=host>registrable domain</a> obtained by executing the
<a href="https://publicsuffix.org/list/">algorithm</a> defined by the Public Suffix List on
<var>host</var>. [[!PSL]]
</ol>
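The registrable-domain steps can be sketched the same way. In this Python sketch, `public_suffix` is a hypothetical caller-supplied lookup (returning `None` for null), and `toy_public_suffix` is an illustrative stand-in that knows only `github.io` and `com`; unlike the real list, it has no default `*` rule. The last step keeps the one label preceding the public suffix, which is how the Public Suffix List defines the registrable domain.

```python
from typing import Callable, Optional


def registrable_domain(
    host: str, public_suffix: Callable[[str], Optional[str]]
) -> Optional[str]:
    """Return host's registrable domain, or None where the steps return null.

    `public_suffix` is a hypothetical lookup returning host's public suffix
    (assumed to be host itself or a dot-separated tail of it), or None.
    """
    suffix = public_suffix(host)
    # Null if there is no public suffix or host equals its public suffix.
    if suffix is None or suffix == host:
        return None
    # Keep the single label immediately preceding the public suffix.
    preceding = host[: -len(suffix) - 1].rsplit(".", 1)[-1]
    return f"{preceding}.{suffix}"


def toy_public_suffix(host: str) -> Optional[str]:
    """Illustrative stand-in: only "github.io" and "com" are public suffixes."""
    for suffix in ("github.io", "com"):  # longer suffix checked first
        if host == suffix or host.endswith("." + suffix):
            return suffix
    return None
```

With the toy lookup, `registrable_domain("sub.www.example.com", toy_public_suffix)` returns `"example.com"` and `registrable_domain("github.io", toy_public_suffix)` returns `None`, in line with the example table.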

<div class=example id=example-host-psl>
<table>
<tr>
<th>Host input
<th>Public suffix
<th>Registrable domain
<tr>
<td><code>com</code>
<td><code>com</code>
<td>
Member

@annevk May 25, 2018

We should change this to <td>Null I suppose. (Though we could maybe add a paragraph that says that null values are omitted. Not sure what's nicer.)

Member

I would suggest <td><i>null</i>

Member

Why? We don't use that convention anywhere.

Member

Example tables like this are special. We omit the quotes, substitute strings for structs, and use other conventions meant for visual clarity and not consistency. I certainly don't think we should capitalize "null" here, and I think italicizing it so that it's clear it's not just a registrable domain named null is helpful.

Shrug. Just a thought.

<tr>
<td><code>example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>sub.www.example.com</code>
<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>EXAMPLE.COM</code>
Member

This is not a host, but input to the host parser.

Member Author

I think it's helpful to point out that no matter how folks spell the URL, it's going to be normalized. Perhaps shifting this table to include a URL rather than a host would make that point, especially for the punycode bits?

Member

I think it's fine to just list hosts, but we should label it "host input" or some such, to not confuse it with host as a concept, which is already parsed and normalized.

<td><code>com</code>
<td><code>example.com</code>
<tr>
<td><code>github.io</code>
<td><code>github.io</code>
<td>
<tr>
<td><code>whatwg.github.io</code>
<td><code>github.io</code>
<td><code>whatwg.github.io</code>
<tr>
Isn't this row duplicated? The previous one looks the same.

<td><code>whatwg.github.io</code>
<td><code>github.io</code>
<td><code>whatwg.github.io</code>
<tr>
<td><code>إختبار</code>
Member

Same as above. And also applies below.

<td><code>xn--kgbechtv</code>
<td>
<tr>
<td><code>example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>
@sleevi Jun 4, 2018

So one of the things is the PSL doesn't specify whether or not it returns U-Label or A-Label (that's left to the implementation). I'm curious about the A-Label documented here: is this an expectation of the contract?

That is, are you trying to show that either U-Label or A-Label can be returned regardless of U-Label or A-Label input, or are you trying to state that A-Labels should be the consistent return?

Member

Currently we don't rely on this anywhere (assuming it's consistent to be one or the other, is that at least required?), but A-label seems preferable as that'd be consistent with how the platform exposes URLs and origins overall.

I suspect this will only matter if we add an API, but it really depends on whether PSL dependencies keep getting added or not.

<tr>
<td><code>sub.example.إختبار</code>
<td><code>xn--kgbechtv</code>
<td><code>example.xn--kgbechtv</code>
</table>
</div>
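The first column of the table above is input to the host parser rather than a host. For simple inputs like these, the ASCII forms in the other columns can be approximated in Python with the standard library's `idna` codec. This is a substitute, not the host parser: that codec implements IDNA 2003 rather than the UTS #46 processing the URL Standard uses, and it passes all-ASCII labels through unchanged, so the sketch lowercases the result itself.

```python
def to_ascii_host(host_input: str) -> str:
    """Approximate domain-to-ASCII for simple host inputs.

    Uses Python's built-in "idna" codec (IDNA 2003), which passes all-ASCII
    labels through unchanged, so ASCII case is folded afterwards. This is a
    stand-in for, not an implementation of, the URL Standard's host parser.
    """
    return host_input.encode("idna").decode("ascii").lower()
```

Here `to_ascii_host("EXAMPLE.COM")` gives `"example.com"` and `to_ascii_host("example.إختبار")` gives `"example.xn--kgbechtv"`, the A-label forms the table assumes.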

<p>Two <a for=/>hosts</a>, <var>A</var> and <var>B</var>, are said to be
<dfn for=host export>same site</dfn> with each other if either of the following statements is true:

<ul class=brief>
<li><p><var>A</var> <a lt=concept-host-equals>equals</a> <var>B</var>
<li><p><var>A</var>'s <a for=host>registrable domain</a> is <var>B</var>'s
<a for=host>registrable domain</a> and is not null.
</ul>
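The same-site check can be sketched in Python as follows. It assumes hosts are already parsed and serialized, so string equality stands in for host equality, and `toy_registrable_domain` is a hypothetical lookup that hard-codes rows of the example table. The `is not None` test is what keeps two different hosts that both lack a registrable domain, such as two different IP addresses, from being treated as same site.

```python
from typing import Callable, Optional


def same_site(
    a: str, b: str, registrable_domain: Callable[[str], Optional[str]]
) -> bool:
    """True if a equals b, or both share a registrable domain that is not null."""
    if a == b:
        return True
    domain_a = registrable_domain(a)
    # A null (None) registrable domain never makes two distinct hosts same site.
    return domain_a is not None and domain_a == registrable_domain(b)


def toy_registrable_domain(host: str) -> Optional[str]:
    """Hypothetical lookup hard-coding rows of the example table; others get None."""
    known = {
        "example.com": "example.com",
        "www.example.com": "example.com",
        "sub.www.example.com": "example.com",
        "whatwg.github.io": "whatwg.github.io",
    }
    return known.get(host)
```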


<h3 id=idna>IDNA</h3>
