Should the presence of an HTTPS record suppress any certificate warning bypass option? #87

Closed
davidben opened this issue Dec 5, 2019 · 42 comments · Fixed by #93 or #274

@davidben
Contributor

davidben commented Dec 5, 2019

Elsewhere, HTTPSSVC's redirect was referred to as HSTS, and it occurred to me we're missing one of HSTS's properties. HSTS redirects, and then it also directs the browser to suppress the certificate click-through button.

We could do something similar and say that HTTPS connections made off an HTTPSSVC record are assumed to have a competent TLS config and don't get a bypass button.
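
To make the proposal concrete, here is a minimal sketch of the client-side decision it implies. The function and parameter names are hypothetical and do not come from the draft or from any browser. The sketch assumes only that a known HSTS host already gets no bypass (RFC 6797 Section 12.1) and that the proposal would treat a connection made off an HTTPSSVC record the same way.

```python
def offer_cert_bypass(is_known_hsts_host: bool, used_httpssvc_record: bool) -> bool:
    """Return True if the certificate error page may show a click-through option.

    Hypothetical helper: both flags are assumptions about what a client
    tracks, not fields defined by the draft.
    """
    # HSTS already removes the bypass for known HSTS hosts.
    if is_known_hsts_host:
        return False
    # The proposal: a connection made off an HTTPSSVC record is treated the same way.
    if used_httpssvc_record:
        return False
    # Otherwise the client keeps whatever bypass behavior it has today.
    return True
```

Under this sketch, only a host with neither HSTS state nor an HTTPSSVC record would still get a bypass button.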

@davidben
Contributor Author

davidben commented Dec 5, 2019

@bemasc
Collaborator

bemasc commented Dec 13, 2019

@davidben There's a potential "footgun escalation" here, compared to HSTS. With HSTS, we at least know that the site had correctly configured HTTPS at some point in the past; otherwise HSTS could not have been enabled (except for HSTS preload...). With HTTPSSVC, there's no such guarantee. The server could have been misconfigured all along.

Personally, I think it would be perfectly reasonable for a client to adopt the behavior you're describing, but I don't see why it needs to be part of the standard. However, if you think it should be in the draft, we can certainly add it (e.g. "Clients MAY/SHOULD disable security bypass UI as per RFC 6797 Section 12.1.").

@davidben
Contributor Author

davidben commented Dec 14, 2019

It needs to be part of the standard because the standard talks about what HTTPSSVC means. Right now it means "you can assume my http URL is a redirect". Does it also mean "you can assume my https URL is sensible"?

The point of standards is to promote interoperability. That means we need to all agree on the meanings of things.

My inclination is that it should mean this. A promise (up to DNS TTL) to deploy certificates correctly is not that much of a tall order over the existing promise to do https, and we nudge the web a little more towards security.

davidben added a commit to davidben/dns-alt-svc that referenced this issue Dec 16, 2019
This is the other half of HSTS. Closes MikeBishop#87.
@estark37

@bemasc pointed me to this thread after I espoused the misconception that HTTPS RRs do not make certificate errors fatal, and I want to make an attempt to reverse this decision. :)

I know this is an unpopular opinion, but I think it was a mistake to have STS make cert errors fatal. Cert errors are overwhelmingly false positives, and many of them are caused by client or network misconfigurations that the server operator has no control over (misconfigured middleboxes, captive portals, bad client clocks, etc.). I suspect that the vast majority of server operators who enable STS don't know about the cert error behavior, and even if they do I suspect they don't know about the vast array of (mostly benign) conditions outside their control that can cause cert errors.

I think that the default configuration of HTTPS RRs should be to not make cert errors fatal, but maybe that can be a separate lever that site owners can configure. (Ideally I'd like STS to work this way too, but not sure if we can go back now.)

@ericorth

I don't think IETF should be encouraging domain owners to make HTTPS protections bypassable unless we have a strong indication that not providing that option will significantly reduce adoption of HTTPS records. We should be moving the internet to a world where connections are secure and validated, not one where we keep skipping those protections and allowing bad client/network misconfigurations to continue.

Does your knowledge on the subject lead you to believe such adoption reduction would be significant?

@estark37

I don't have data on whether it would impede adoption. However, I do know that false positive cert errors caused by non-server misconfigurations persist despite multiple person-years of effort to systematically drive them down (https://research.google/pubs/pub46359/). Also note that in modern browsers, cert errors are quite difficult to bypass, leading to low clickthrough rates that could probably be driven down even further with a bit more effort. For example, users adhere to ~80% of cert errors in Chrome on Windows.

@ericorth

Wouldn't low bypass clickthrough rates be a point against allowing bypass for new HTTPS technologies? Seems that in the vast majority of cases, users don't know how or don't want to bypass it, and that hasn't stopped HTTPS use on the internet. The users either take action to fix the client/network misconfiguration, or they stop using the website/network that they can't connect to.

The only usecase I see being supported by allowing website control of this is for websites that want to give their users directions on how to bypass the error page. I don't think that is a usecase IETF should be encouraging or further enabling as it seems terrible for user security. If those websites want to keep doing that, I think it would be best that they do so without access to the shiny new toys like HTTPS records.

@estark37

Another possible interpretation of the low clickthrough rate is that more technical people who are able to reason correctly about the risks click through errors, while others leave the site. The point of having a bypass is that the browser can't always know what's in the user's best interest. That doesn't mean we can add an option/setting for every single security decision, but we already have this one. We (where by "we" here I mean a browser, not IETF) have a budget for how much we can inconvenience users in the name of security, and I'd rather spend that budget elsewhere (e.g., putting stronger warnings on http:// pages).

> The only usecase I see being supported by allowing website control of this is for websites that want to give their users directions on how to bypass the error page

I don't think that's a relevant use case here; the user wouldn't be able to access the website for directions until they click through the cert error, at which point they wouldn't need the directions.

@davidben
Contributor Author

davidben commented Oct 26, 2020

@estark37 Hah, clearly I should have checked with you here before writing the PR. :-)

That's a good point that certificate warnings have better adherence now than when HSTS was designed. I probably haven't sufficiently updated my thoughts for this new world. Let me try thinking aloud for a bit and maybe you can tell me where we're not on the same page?

I've generally considered the certificate click-through a legacy mistake, but one we're now stuck with because it has made self-signed or otherwise invalid certs just barely feasible enough that sites will do it, with the expectation that users bypass the error. Thus any ratchet we can use to get out of the hole and discourage "intentional" certificate errors is worthwhile. (After all, we don't ask users to validate syntax errors in HTTP/2 frames.)

I like the framing of certificate errors from server- and non-server-related causes. That seems a good way to think about it, since it tells us whether to fix via client behavior (clickthrough or otherwise), or via server behavior (server gives us an opt-in and/or opt-out signal telling us they're good at such-and-such).

For non-server-related causes, it seems to me neither a server opt-in nor opt-out bit would make sense because the server doesn't have any information about the cause. Of the examples (misconfigured middleboxes, captive portals, bad client clocks, etc.), it seems a cert bypass isn't a good remedy for captive portals anyway. We don't actually want or need to give HTTPS-breaking privileges to the network, just separately get through the captive portal flow. So improving the captive portal flow seems preferable to me.

I'm not sure about the other two. In the bad clock case, I would have hoped the bad clock interstitial (which we should trigger independent of HSTS/HTTPSRR) would solve it, but I take it we still have gaps? Misconfigured middleboxes are interesting... my inclination would be that the enterprise or AV in question should be configuring this stuff rather than relying on users bypassing every single certificate error. Or is there another flow here?

For server-related causes, I guess it's a question of whether we think the ecosystem is ready enough for a particular commitment to be viable by default. A commitment to use a publicly trusted CA rather than relying on bypass seems pretty solid to me. I would like for sites to also be on the hook for resolving expired certificates (better yet, automate cert issuance!), but maybe we're not ready for that yet?

@davidben
Contributor Author

Something else that comes to mind: whatever server commitment we end up with, the text probably should be tweaked a bit. For something like the TLS 1.0/1.1 deprecation, we ended up with a temporarily bypassable error, as an intermediate step to removing the protocols entirely. I think HSTS/SVCB should not block those bypasses, yet the text doesn't account for it. Those bypasses are an artifact of the meaning of "I promise to do HTTPS" adjusting over time. While we're in the transition state, the prompt behavior should also be in the transition state.

(Then again, unlike HSTS, SVCB does currently say SHOULD instead of MUST, so you could argue this case falls under the SHOULD. HSTS's text was maybe a bit prescriptive though. Ah well.)

bemasc reopened this Oct 26, 2020
@ericorth

> (Then again, unlike HSTS, SVCB does currently say SHOULD instead of MUST, so you could argue this case falls under the SHOULD. HSTS's text was maybe a bit prescriptive though. Ah well.)

I don't think we should add temporary transition complications to the text. Being SHOULD instead of MUST is sufficient room for browsers to do any transition necessary.

And you now give me two additional chains of thought on this subject:

  1. If the bypass is potentially something we're stuck with mostly just because we've always had it, how much does that factor apply to HTTPS records? Is this enough of a transition that, by setting up HTTPS records, sites are reasonably transitioning into a new way of doing things, potentially making that a reasonable point to disable the old legacy behavior? Or does the website transitioning things not matter, because from the user's perspective, it's still just the same https:// websites and they're the ones (or at least the technical users among them that know how to bypass) being inconvenienced by any change?
  2. Is this something websites should have control over at all (through HSTS, HTTPS records, etc)? If most of these reasons to allow bypass have nothing to do with the specific website, should those websites have any control over inconveniencing the technical risk-knowledgeable users that want to bypass? Should a website's knowledge of things such as being a security-sensitive website override that? If there's no reason for websites to influence it, this almost seems something unnecessary for IETF to deal with in specs like this, leaving it up to browsers to either implement or not a bypass for all cert errors.

@estark37

estark37 commented Oct 26, 2020

> @estark37 Hah, clearly I should have checked with you here before writing the PR. :-)

> That's a good point that certificate warnings have better adherence now than when HSTS was designed. I probably haven't sufficiently updated my thoughts for this new world. Let me try thinking aloud for a bit and maybe you can tell me where we're not on the same page?

> I've generally considered the certificate click-through a legacy mistake, but one we're now stuck with because it has made self-signed or otherwise invalid certs just barely feasible enough that sites will do it, with the expectation that users bypass the error. Thus any ratchet we can use to get out of the hole and discourage "intentional" certificate errors is worthwhile. (After all, we don't ask users to validate syntax errors in HTTP/2 frames.)

Interesting, I don't think of cert clickthroughs as a legacy mistake. Definitely the IE6-style ones where OK was the default button, but not modern cert errors where the bypass exists but is quite buried. If we really thought that having a bypass at all was a legacy mistake, I don't actually think it would be that hard to get rid of it for all sites, and I'd rather go at it directly that way rather than add more knobs and edge cases to the decision tree of whether a cert error is bypassable.

I don't think we can exactly measure how many sites are using invalid certs expecting that users will click through, but my guess is that it's not common. In 2017, <=1% of all cert errors were self-signed certs. With modern cert errors making it so hard to click through, I doubt tons of sites are relying on users doing it.

I agree that we can't expose user settings for every security decision, but I do think settings make sense in some scenarios, and it's more of an art than a science deciding where it makes sense to have them. In my mind a big factor in the decision is the complexity of implementing/maintaining the setting, and as I mentioned above, if a goal is to get rid of the complexity of maintaining the setting, then I'd rather go about it directly for all cert errors -- which would also exert more influence over the server operators who knowingly deploy bad certs (since who knows how many of such server operators would use HTTPS RRs).

> I like the framing of certificate errors from server- and non-server-related causes. That seems a good way to think about it, since it tells us whether to fix via client behavior (clickthrough or otherwise), or via server behavior (server gives us an opt-in and/or opt-out signal telling us they're good at such-and-such).

To clarify my stance, clickthrough is definitely not the ideal fix for any situation, but it could be the right tradeoff for some users in some situations in today's reality, and I don't predict that HTTPS RRs would change today's reality much.

> For non-server-related causes, it seems to me neither a server opt-in nor opt-out bit would make sense because the server doesn't have any information about the cause. Of the examples (misconfigured middleboxes, captive portals, bad client clocks, etc.), it seems a cert bypass isn't a good remedy for captive portals anyway. We don't actually want or need to give HTTPS-breaking privileges to the network, just separately get through the captive portal flow. So improving the captive portal flow seems preferable to me.

Improving the captive portal flow would be great, but it's not really clear how to do it. Despite significant cycles spent to build OS- and browser-level captive portal detection, people still encounter cert errors while connecting to captive portals allllll the time, and often don't know what else to do but click through.

> I'm not sure about the other two. In the bad clock case, I would have hoped the bad clock interstitial (which we should trigger independent of HSTS/HTTPSRR) would solve it, but I take it we still have gaps? Misconfigured middleboxes are interesting... my inclination would be that the enterprise or AV in question should be configuring this stuff rather than relying on users bypassing every single certificate error. Or is there another flow here?

Bad clock interstitial works well in Chrome, but I don't think most browsers do that. And yeah, in the misconfigured middlebox case, as in every other misconfig case, it sure isn't the ideal outcome for users to bypass every cert error, but realistically that's what some people do now. If we want to go after improving the middlebox situation, for example, I think we'd be better off going after that situation directly rather than adding UX and implementation complexity to go after it indirectly by tying it to HTTPS RRs.

> For server-related causes, I guess it's a question of whether we think the ecosystem is ready enough for a particular commitment to be viable by default. A commitment to use a publicly trusted CA rather than relying on bypass seems pretty solid to me. I would like for sites to also be on the hook for resolving expired certificates (better yet, automate cert issuance!), but maybe we're not ready for that yet?

In a world where we knew that all cert errors were caused by server errors, then yeah I think removing the bypass would make a lot more sense. But right now many cert errors aren't caused by server errors, and with STS/HTTPS RRs, the server is making a decision on behalf of the user without actually knowing anything about what caused the error and what the user's situation is.

@estark37

> > (Then again, unlike HSTS, SVCB does currently say SHOULD instead of MUST, so you could argue this case falls under the SHOULD. HSTS's text was maybe a bit prescriptive though. Ah well.)

> I don't think we should add temporary transition complications to the text. Being SHOULD instead of MUST is sufficient room for browsers to do any transition necessary.

> And you now give me two additional chains of thought on this subject:

>   1. If the bypass is potentially something we're stuck with mostly just because we've always had it, how much does that factor apply to HTTPS records? Is this enough of a transition that, by setting up HTTPS records, sites are reasonably transitioning into a new way of doing things, potentially making that a reasonable point to disable the old legacy behavior? Or does the website transitioning things not matter, because from the user's perspective, it's still just the same https:// websites and they're the ones (or at least the technical users among them that know how to bypass) being inconvenienced by any change?
>   2. Is this something websites should have control over at all (through HSTS, HTTPS records, etc)? If most of these reasons to allow bypass have nothing to do with the specific website, should those websites have any control over inconveniencing the technical risk-knowledgeable users that want to bypass? Should a website's knowledge of things such as being a security-sensitive website override that? If there's no reason for websites to influence it, this almost seems something unnecessary for IETF to deal with in specs like this, leaving it up to browsers to either implement or not a bypass for all cert errors.

I think #2 is a very reasonable stance to take, in line with the priority of constituencies. A good chunk of cert errors have absolutely nothing to do with the server -- the server can't fix them, the server doesn't know what the right decision is for the user -- so why does the server get to have a say over how they're presented to the user? And if our position is that cert errors should never be bypassable, HTTPS RRs seems like an indirect way to go about accomplishing that.

@enygren
Collaborator

enygren commented Oct 26, 2020

What are the options here? Top contenders seem to be:

  1. Leave it as the current "SHOULD NOT" bypass cert errors
  2. Weaken this to a "MAY" but add in a new parameter that explicitly says MUST NOT bypass

@ericorth

I think the direction of this conversation might be leaning more into:

  1. Weaken to "MAY" or maybe even remove any mention of the topic, and do not add any parameter to allow even optional server control. Just leave it a matter for client implementors to decide how to communicate cert errors to users and whether or not the client UI will allow bypassing, same as any other HTTPS connections (except HSTS).

@moonshiner
Contributor

Thinking as a DNS person: in either case, we should have an example in the appendix where we show a setup where this would return cert errors. DNS people like to stand up broken DNS zones as examples.

Any user issue will hit a DNS person first, who should have guidance on head scratching. But I also admit I need to reread this whole thread. Thanks, @estark37!

@davidben
Contributor Author

davidben commented Oct 26, 2020

> Interesting, I don't think of cert clickthroughs as a legacy mistake. Definitely the IE6-style ones where OK was the default button, but not modern cert errors where the bypass exists but is quite buried. If we really thought that having a bypass at all was a legacy mistake, I don't actually think it would be that hard to get rid of it for all sites, and I'd rather go at it directly that way rather than add more knobs and edge cases to the decision tree of whether a cert error is bypassable.

> I don't think we can exactly measure how many sites are using invalid certs expecting that users will click through, but my guess is that it's not common. In 2017, <=1% of all cert errors were self-signed certs. With modern cert errors making it so hard to click through, I doubt tons of sites are relying on users doing it.

Anecdotally, we get quite a lot of networking bug reports where people expect to bypass certificate errors. Sometimes it's developers whose workflow involves bypassing a local cert error. Sometimes it's a weird networking product that's, distressingly, designed to have the user bypass some error. :-( Even if all non-server issues were fixed, I suspect we'd still have difficulties getting rid of this bypass. Thus, the aim to ratchet away the invalid server configurations.

Regarding it being legacy or not, I think my ideal state would be one where the bypass wasn't there. Browsers would never ask humans to evaluate X.509 certificates, and, through that, the expectation would be on technologies to work without expecting humans to do this. (A nice self-reinforcing cycle to avoid regression.) Maybe automated certificate issuance would be standard so expired certs aren't a problem. Maybe developer use cases would go through better tooling. Maybe this would incentivize captive portals to work correctly with detectors, rather than hide from them.

We are, of course, not in that state right now. This is extra annoying because the self-reinforcing cycle is against improvement rather than regression. And so the hope is that increasingly broad ratcheting strategies would help us get there and, in the meantime, carve off chunks of the web where the risks are mitigated.

But ratcheting works best when the obstacle is a property of the server. It sounds like it's no longer the primary bottleneck and maybe we need to address those before we can broaden the ratchet. :-/ On the plus side, the improved error dialogs also make the problem less severe. (Annoyingly, saying "HTTPS RRs are a commitment not to rely on a bypass for server-related reasons" is kinda pointless if the non-server reasons prevent us from enforcing it. Unenforced parts of protocols are mostly fiction.)

> Improving the captive portal flow would be great, but it's not really clear how to do it. Despite significant cycles spent to build OS- and browser-level captive portal detection, people still encounter cert errors while connecting to captive portals allllll the time, and often don't know what else to do but click through.

Hrm. I don't suppose we've considered something truly absurd like just including a link to the captive portal HTTP URL from the error page? Users clicking through certificate errors in hopes of hitting a captive portal seems the worst possible outcome. :-/ But I recognize that's proposing to build a thing that doesn't actually exist right now, so not very helpful in resolving this issue.

> Bad clock interstitial works well in Chrome, but I don't think most browsers do that.

That's fair. Although I would advocate those browsers do that. If bad clock interstitials fully mitigate this (do they?), I'm quite happy to declare that one solved and rejoice. :-)

> And yeah, in the misconfigured middlebox case, as in every other misconfig case, it sure isn't the ideal outcome for users to bypass every cert error, but realistically that's what some people do now. If we want to go after improving the middlebox situation, for example, I think we'd be better off going after that situation directly rather than adding UX and implementation complexity to go after it indirectly by tying it to HTTPS RRs.

I think I agree with everything you say here, except the conclusion. In my mind, the aim of tying something to HTTPS RRs is not to address middlebox misconfigurations, but to address server misconfigurations. The connection to non-server misconfigurations like middleboxes is, if they're too common, they make this strategy less viable.

@brian-peter-dickson

brian-peter-dickson commented Oct 26, 2020 via email

@davidben
Contributor Author

Sounds good. Being precise doesn't hurt.

To help with any existing ambiguity, the only relevant servers in this thread are HTTPS servers. They're what HTTPS records describe. The contents of the HTTPS record don't have much bearing on the other servers related to DNS. (Well, I suppose if your recursive is DoH and you use an HTTPS record to find it, this text applies there too. But I'm not aware of any client that allows users to bypass HTTPS errors when establishing a connection to the DoH resolver, so it's moot.)

@fl1ger

fl1ger commented Oct 27, 2020

Moin!

So this probably goes sideways, but as a DNS person, my main feature for HTTPS was to solve the CNAME-at-the-apex problem. As a test of the current clients out there, I set up a couple of DNS resource records to try that and a bunch of web servers to figure out where stuff ends up. As I was lazy, I used regular port 80 HTTP servers. And it worked with the current, mostly Apple-based clients (try bla42.de, https.bla42.de and hintdirect.bla42.de). Now, rereading the draft after this discussion, it shouldn't, but I see no harm that it does and agree that we should not demand STS outright, but make it configurable via an option.

So long
-Ralf

@ericorth

Note that I believe this issue is mostly regarding just the behavior of whether clients can allow bypassing certificate errors via user action on error pages, or if the HTTP->HTTPS upgrade feature of HTTPS records should follow the example of HSTS and require strictly terminating on cert errors without any option of bypass. Sounds like you're talking about making the upgrade functionality in general optional. That's a different discussion, and there's already a separate bug for it: #100

bemasc pushed a commit that referenced this issue Oct 28, 2020
This change adjusts the requirements to require termination without
recourse only on connections where the client can be reasonably sure
that it is talking to the actual server, not a middlebox.  This reflects
the view that the server can and should vouch for its own correct
configuration, but the client is better positioned to judge how to
handle middlebox-related failures.

Fixes #87
@bemasc
Collaborator

bemasc commented Oct 28, 2020

Hi all. I've proposed a middle-ground "no recourse" policy in #274; please take a look. That proposal is based on the notion that there are some errors that almost-certainly reflect a server misconfiguration, and we can reasonably ask servers using this new RR type to make sure not to be misconfigured in this way.

This is a compromise: it's a small ratchet step, but it doesn't do much for security on its own. It doesn't add knobs, but it does add edge cases.

Please comment ASAP. The draft deadline is Monday, and I'd like to get this settled in time to get a new draft out.

@enygren
Collaborator

enygren commented Oct 28, 2020

I'm not thrilled by PR #274. I'm not sure why expiry is a no-recourse error but totally invalid certs are not. What attacks does this guard against? (Either way, it may be worth separating out the "without recourse" from the "terminate the connection" to make it clear that the latter is a MUST but that providing a recourse to the user is something that the U-A may wish to consider based on current best practices.)

@bemasc
Collaborator

bemasc commented Oct 28, 2020

Expiry is a no-recourse error because it's definitely the server's fault, and the point of this section is that the server is promising to have working HTTPS. A middlebox can't cause an expiry error. Totally invalid certs could be a middlebox, so we leave the question of user recourse in that case entirely to the client.
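
As a rough illustration of the rule described here, the following sketch separates errors attributed to the server from errors that a middlebox could also cause. The error categories, names, and return strings are hypothetical and are not taken from #274. The only classification drawn from the comment above is that expiry is treated as the server's fault, while other invalid-certificate cases are left to the client.

```python
from enum import Enum


class CertError(Enum):
    # Hypothetical error categories, named here only for illustration.
    EXPIRED = "expired"
    UNTRUSTED_ISSUER = "untrusted_issuer"
    NAME_MISMATCH = "name_mismatch"


# Assumption from the comment above: only expiry is treated as definitely the server's fault.
SERVER_FAULT_ERRORS = {CertError.EXPIRED}


def recourse_policy(error: CertError, used_https_rr: bool) -> str:
    """Return "no_recourse" or "client_decides" for a failed connection."""
    # The stricter rule applies only when the connection came from an HTTPS RR.
    if used_https_rr and error in SERVER_FAULT_ERRORS:
        return "no_recourse"
    # Everything else, including errors a middlebox could cause, is left to the client.
    return "client_decides"
```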

This doesn't meaningfully guard against any attacks. It's not a security measure on its own. It's a "situation simplification" measure. If we can reduce the number of cases where an escape hatch is offered, we'll have fewer cases to tackle in the future. If we eventually close them all, then perhaps we can claim a security improvement.

I should say that I have no strong opinion on what the draft should say on this point. I'm just looking for a position that can command consensus.

@estark37

> Expiry is a no-recourse error because it's definitely the server's fault, and the point of this section is that the server is promising to have working HTTPS. A middlebox can't cause an expiry error.

You'd think so, but alas this isn't true. See Section 7.2 of https://storage.googleapis.com/pub-tools-public-publication-data/pdf/04822a2487f3cd27ff92dbfddf42d947acdc4257.pdf for an example where an AV product was installing and MITMing with an expired root. More generally, misconfigured client clocks can cause expired cert errors; we remediate this in Chrome with a custom error page, but not all browsers do that, and that wouldn't have helped with the case in Section 7.2 anyway.
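
A small sketch of the clock case: the same certificate validity check gives different answers depending on the client's clock, so an "expired" error does not by itself prove a server fault. The dates and the `is_expired` helper are made up for illustration. Real validation also checks the start of the validity period and the rest of the chain.

```python
from datetime import datetime, timedelta, timezone


def is_expired(not_after: datetime, client_now: datetime) -> bool:
    """Simplified expiry check: compare the client's clock to the cert's notAfter."""
    return client_now > not_after


# Hypothetical certificate that is still valid in real time.
not_after = datetime(2020, 12, 1, tzinfo=timezone.utc)
true_now = datetime(2020, 11, 6, tzinfo=timezone.utc)

# A client whose clock runs 60 days fast sees the same certificate as expired.
skewed_now = true_now + timedelta(days=60)

print(is_expired(not_after, true_now))    # False
print(is_expired(not_after, skewed_now))  # True
```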

I'd suggest that no-user-recourse should just be a MAY, without any SHOULDs attached to it. If the ecosystem subsequently changes such that ratcheting down on server-caused errors becomes palatable without causing too much user pain for the non-server-caused error situations (as @davidben is imagining, I think), then UAs could start removing user recourse at that point.

Repository owner deleted a comment from govhero Nov 5, 2020
@enygren
Collaborator

enygren commented Nov 6, 2020

What about just replacing "SHOULD" with "MAY" and dropping the reference to HSTS? (The question then becomes whether we should add more discussion. We could reference some academic studies like the one from @estark37 as informational references, or not.)

@tialaramex

At least estark37 and possibly others have suggested that the user knows best. This is at least arguably wrong. https://11foot8.com/ matches years of psychological studies. Once humans have set upon a plan (in that case, driving under the low bridge in a truck) merely warning them that it's a bad idea won't help because they will strongly resist deviating from the plan - even though they do not in fact have independent reason to know it will work. A giant sign with "Overheight Must Turn" and stop lights ought to be enough right? Nope. They smash their truck into the bridge anyway. A "bypass" - even the one Chrome includes for HSTS - will be abused not only by people who should know better but also by people who haven't a clue, since it seems to work and they imagine (wrongly) that they wouldn't be able to smash their metaphorical truck into a bridge. They're relying on developers to prevent that. We should do so.

Unlike Chrome, as far as I understand, Firefox does not have any bypass for HSTS. I didn't see any recognisable Mozilla people in this thread (if I missed you, sorry), but I think it would be worth asking them about this, because clearly they've had success (e.g. by shipping all public intermediates to sidestep AIA chasing/caches) focusing on solving HTTPS user experience issues without leaving users constantly one wrong click away from doom.

@bemasc
Collaborator

bemasc commented Nov 6, 2020

@tialaramex I think there may be a misunderstanding here. Chrome does not normally offer a bypass for HSTS, as you can verify here. The question at hand is whether to extend that lack of bypass to domains that publish the new "HTTPS" RR. (More precisely, the question is what recommendation the IETF should make regarding such a lack of bypass.)

@tialaramex

Visiting the link @bemasc provided in Chrome on Windows behaved as I anticipated: you can simply type 'thisisunsafe' and bypass the HSTS error page.

@estark37

estark37 commented Nov 6, 2020

@tialaramex The 'thisisunsafe' bypass isn't part of the UI and isn't documented, and is used very very rarely. Please see https://emilymstark.com/2020/07/14/debunking-the-users-always-click-yes-myth.html for security warning research on this topic.

@brian-peter-dickson

I'd like to leave a (hopefully both brief and succinct) comment on the larger issue:
IMNSHO, the only party who should have control over any sort of fallback is the domain owner, i.e. the party publishing the HTTPS record.

I'm not sure if it is currently part of the set of supported SVCB or HTTPS record values (which is the list of supported protocols), but IFF (if and only if) the owner publishes a supported non-HTTPS transport should it be at all possible to fall back.

If there is a fallback path (to non-HTTPS) published, then that would be how to handle cert failure(s).
If there is no fallback path published, meaning only the HTTPS variants are published, cert failures MUST be fatal.

Is this a fairly clear logical if/then statement, and is the reason for this also obvious?
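
Read literally, the rule proposed above amounts to the following minimal sketch. All names here (`ServiceRecord`, `CLEARTEXT_PROTOCOLS`, `on_cert_failure`, and the placeholder protocol string) are hypothetical illustrations and are not defined by the draft. The sketch also assumes that a non-HTTPS transport could be recognised from a record's protocol list, which the comment itself is unsure about.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FATAL = "fatal"        # no way to proceed after the cert failure
    FALLBACK = "fallback"  # retry over a published non-HTTPS transport


# Hypothetical placeholder: protocol IDs treated as "non-HTTPS" for illustration only.
CLEARTEXT_PROTOCOLS = frozenset({"example-cleartext"})


@dataclass(frozen=True)
class ServiceRecord:
    """Hypothetical stand-in for one published HTTPS/SVCB record."""
    protocols: frozenset


def on_cert_failure(records):
    """Fall back only if the domain owner published a non-HTTPS transport."""
    for record in records:
        if record.protocols & CLEARTEXT_PROTOCOLS:
            return Outcome.FALLBACK
    return Outcome.FATAL
```

Under this reading, a domain that publishes only HTTPS variants always yields `Outcome.FATAL`, so the decision rests entirely with the record publisher and never with the user.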

@estark37

estark37 commented Nov 6, 2020

@brian-peter-dickson please see the preceding discussion for why it is not an obvious conclusion that cert failures MUST be fatal when there is no fallback path. (#87 (comment) and the subsequent discussion with @davidben)

@bemasc
Collaborator

bemasc commented Nov 6, 2020

@brian-peter-dickson The word "fatal" here is somewhat ambiguous. Everyone agrees that cert failures always cause the page load to fail with a big scary warning. The question is whether the big scary warning can include some sort of "user recourse", i.e. escape hatch/bypass/override. (I wouldn't call it a "fallback".)

In my view, the cert failure is "fatal" when the big scary warning appears, whether or not it offers a "user recourse". Moreover, as @estark37 has highlighted, the presence or absence of "user recourse" is not really binary. Interaction designers can make a "user recourse" highly visible, nearly unreachable, or anything in between.

@ericorth

ericorth commented Nov 6, 2020

What if the draft clarifies that domains and addresses found through HTTPS records should only be connected to via HTTPS? If a domain publishes legacy A/AAAA records at the original domain that any client could have used to attempt non-HTTPS anyway, those are fair game for a client to attempt to use, depending on the logic and behavior of the client. If a domain only publishes HTTPS, the domain owner is clearly communicating that that domain is intended to only handle HTTPS connections and no alternatives are given (and ideally, the server should reject any non-HTTPS connections).

Yes, this would mean that HTTPS-only domains would only be usable by clients that recognize HTTPS records, but those are the only clients that would know to only connect via HTTPS anyway. The domain would also have to live with being unreliable if anything goes wrong with the network/client to make certs or HTTPS records unusable, but that's exactly why the recommendation should be to include A/AAAA records, to give clients more options and increase reliability.

@bemasc
Collaborator

bemasc commented Nov 6, 2020

@ericorth We're only talking about HTTPS here. Even the "user recourse" option still runs over TLS and retains the https:// origin.

Having HTTPS and A/AAAA records for the name will be the common case for the foreseeable future. I don't think there's any reason to sacrifice the HSTS behavior in that case. It doesn't particularly help to answer the "user recourse" question.

Also, we recommend using "TargetName=." to avoid extra indirection, but your logic here would require that indirection, in order to avoid putting AAAA records on the hostname.

@bemasc bemasc changed the title from "Should HTTPSSVC make certificate errors fatal?" to "Should the presence of an HTTPS record suppress any certificate warning bypass option?" Nov 6, 2020
@ericorth

ericorth commented Nov 6, 2020

Ah. For some reason I thought that the user recourse was fully retrying using normal http://. If that's not the case, never mind. My idea doesn't work.

@tialaramex

@estark37 The research doesn't seem to say anything about 'thisisunsafe' beyond the general observation (which I agree with) that opinionated bypass UI is more effective.

We should be clear that this feature is documented, but I presume you meant that the Chrome/Chromium teams don't provide that documentation. The result is that users learn about it from third parties, possibly without accompanying cautions about the consequences. For example, I actually don't know if it bypasses WebAuthn restrictions: if a remote HTTPS server has no proof it is mybank.example, can I send it working mybank.example credentials anyway, now that I've typed a sequence of letters in a language I don't understand, following the instructions in this phishing email? Or is that correctly locked out because My Bank has HSTS?

The main thing I took away from this research (and similar research I've seen previously) is that we can't hope to achieve universal or even near-universal Comprehension. That is, the user does not in fact know best. But anybody with a young child knows that while comprehension is desirable it's not strictly necessary to safety. A twelve month old does not understand why they shouldn't run towards the open fireplace, but fire guards work anyway. The child does not learn why fires are dangerous, but horrible accidents are prevented.

@bemasc Thank you for the clarifying title change.

@estark37

estark37 commented Nov 7, 2020

@tialaramex yes this research is not about the 'thisisunsafe' bypass specifically; rather, the point is that people do not ignore security warnings categorically, and to the contrary they heed them quite often when they are well-designed.

@davidben
Contributor Author

davidben commented Dec 3, 2020

Oops, lost track of this thread at some point. I'm fine switching it to MAY. This whole situation makes me sad, but so it goes.

If we do that, I'd suggest we also be clearer as to what it means for a server to send HTTPS DNS records. If the MAY is to allow clients to use fatal errors if they feel they've addressed the non-server-fault cases, we should support that with the server rules. (For the web, something like you must refresh your certs on time and don't intentionally deploy self-signed or otherwise invalid certs.)

It also occurs to me that, when HTTPS records are fetched over DoH, the non-server-fault cases are probably already a lost cause. If the cert error is due to bad client clock, captive portal, or misconfigured "trusted" MITM, the DoH connection itself will probably fail anyway. By the time a browser has managed to fetch anything over DoH, I think we can more-or-less assume remaining causes are server-fault.

HTTPS records over Do53 are another matter, of course. There the concerns around non-server-fault causes still apply. (Minus possibly the captive portal one, if the captive portal is hijacking DNS anyway.)
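
The DoH versus Do53 distinction above can be sketched as a small decision function. This is only an illustration of the argument under stated assumptions: `Transport`, `CertErrorContext`, and `may_remove_recourse` are hypothetical names, and nothing in this thread says any client implements this logic.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    DOH = "DoH"    # DNS over HTTPS
    DO53 = "Do53"  # classic DNS on port 53


@dataclass(frozen=True)
class CertErrorContext:
    """Hypothetical summary of what the client knows when a cert error occurs."""
    has_https_record: bool
    record_transport: Transport


def may_remove_recourse(ctx):
    """Return True if the error can plausibly be assumed to be server-fault."""
    if not ctx.has_https_record:
        return False
    # A successful DoH fetch suggests that a bad client clock, a captive portal,
    # or a misconfigured "trusted" MITM is probably not the cause, since the DoH
    # connection itself would likely have failed too.
    return ctx.record_transport is Transport.DOH
```

With records fetched over Do53, the function returns `False`, reflecting that the non-server-fault concerns still apply in that case.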

@ericorth

ericorth commented Dec 4, 2020

Feels to me like we've gone beyond the scope of DNSOP with this one. We're debating basic HTTPS/TLS behavior, which currently allows user recourse. Making suggestions and writing up security considerations is one thing, but SVCB shouldn't be making requirements (unless necessary for the SVCB interaction) for the HTTPS/TLS connections to behave differently from normal. If people don't like that TLS allows clients options to continue after cert failures, better to take that up in TLS-related WGs (who could later add a relevant SVCB parameter if they need to make it specifiable through DNS).

I don't think we're going to get consensus here or stay reasonable to DNSOP scope by doing anything other than making this a MAY and writing up a couple paragraphs in security considerations.

@bemasc
Collaborator

bemasc commented Dec 4, 2020

I've updated #274 to reflect the apparent consensus here. Please review.

@ericorth

ericorth commented Dec 4, 2020

Looks good to me. @estark37: How is this new language for you?

enygren pushed a commit that referenced this issue Dec 5, 2020
* Relax the "no recourse" policy for HTTPS-RR-HSTS

This change adjusts the requirement level for termination without
recourse (to MAY), while being clear that servers publishing HTTPS/SVCB
records MUST NOT rely on clients having a user-recourse option.
Fixes #87