Always send the Host header first #468

Open
JustAnotherArchivist opened this issue Sep 27, 2021 · 1 comment · May be fixed by #472

Comments

@JustAnotherArchivist
Contributor

Currently, the Host header is always sent last because it is added automatically in wpull.protocol.http.request.Request.prepare_for_send after the other headers have already been set. I propose changing this so that the Host header line is always sent first.
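To make the proposal concrete, here is a minimal sketch of the reordering. It is not wpull's actual code: `order_headers` and its arguments are hypothetical names used for illustration, and it assumes headers are available as a list of (name, value) pairs.

```python
def order_headers(headers, default_host):
    """Return a new header list with exactly one Host entry, placed first.

    `headers` is an iterable of (name, value) pairs. If the caller already
    supplied a Host header, its value is kept; otherwise `default_host`
    (hypothetically, the host taken from the request URL) is used.
    """
    host_value = default_host
    rest = []
    for name, value in headers:
        if name.lower() == "host":
            # Remember the explicit value but do not keep its position.
            host_value = value
        else:
            rest.append((name, value))
    # Host goes first; all other headers keep their relative order.
    return [("Host", host_value)] + rest


if __name__ == "__main__":
    for name, value in order_headers(
        [("User-Agent", "example-agent"), ("Accept", "*/*")], "bund.lkr.de"
    ):
        print(f"{name}: {value}")
```

Something along these lines could run at the point where the Host header is currently appended, so that the rest of the header handling would not need to change.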

Theoretically, this shouldn't matter. The order of header lines is not significant in HTTP. From RFC 7230 section 3.2.2:

The order in which header fields with differing field names are
received is not significant. However, it is good practice to send
header fields that contain control data first, such as Host on
requests and Date on responses, so that implementations can decide
when not to handle a message as early as possible.

Unfortunately, it appears that Cloudflare (perhaps only recently?) treats requests differently when the Host header doesn't come first.

Example of different header order producing different results on Cloudflare with curl
> curl -A 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0' https://bund.lkr.de/ -sv --http1.1
[snip]
> GET / HTTP/1.1
> Host: bund.lkr.de
> User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/1.1 307 Temporary Redirect
< Date: Mon, 27 Sep 2021 22:35:35 GMT
< Content-Type: text/html;charset=UTF-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< location: /start/
< CF-Cache-Status: DYNAMIC
< Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Report-To: [snip]
< NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
< Strict-Transport-Security: max-age=0; includeSubDomains; preload
< X-Content-Type-Options: nosniff
< Server: cloudflare
< CF-RAY: [snip]
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400
< 
* Connection #0 to host bund.lkr.de left intact

> curl -A 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0' -H 'Host:' -H 'Host: bund.lkr.de' https://bund.lkr.de/ -sv --http1.1
[snip]
> GET / HTTP/1.1
> User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
> Accept: */*
> Host: bund.lkr.de
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/1.1 503 Service Temporarily Unavailable
< Date: Mon, 27 Sep 2021 22:35:45 GMT
< Content-Type: text/html; charset=UTF-8
< Transfer-Encoding: chunked
< Connection: close
< X-Frame-Options: SAMEORIGIN
< Permissions-Policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
< Cache-Control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< Expires: Thu, 01 Jan 1970 00:00:01 GMT
< Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Report-To: [snip]
< NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
< Strict-Transport-Security: max-age=0; includeSubDomains; preload
< X-Content-Type-Options: nosniff
< Server: cloudflare
< CF-RAY: [snip]
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400
< 
<!DOCTYPE HTML>
<html lang="en-US">
<head>
  <meta charset="UTF-8" />
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
  <meta name="robots" content="noindex, nofollow" />
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <title>Just a moment...</title>
[snip]

-H 'Host:' -H 'Host: bund.lkr.de' first removes the header and then adds it again, forcing it to be at the end. The 307 is the expected response for this site; the 503 is the Cloudflare JS challenge.
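For anyone who wants to repeat the comparison without curl, the following is a rough sketch using only the Python standard library. The function names `build_request` and `fetch_status_line` are made up for this example, and it assumes the server speaks HTTP/1.1 when no ALPN protocol is offered. Cloudflare's behaviour may have changed since the output above was captured, so the status lines you get back may differ.

```python
import socket
import ssl


def build_request(host, path, user_agent, host_first=True):
    """Build a raw HTTP/1.1 GET request with Host either first or last."""
    headers = [("User-Agent", user_agent), ("Accept", "*/*")]
    host_header = ("Host", host)
    if host_first:
        headers = [host_header] + headers
    else:
        headers = headers + [host_header]
    lines = [f"GET {path} HTTP/1.1"] + [f"{n}: {v}" for n, v in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_status_line(host, raw_request, port=443, timeout=10):
    """Send the raw request over TLS and return only the status line."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(raw_request)
            data = b""
            # Read until the first CRLF, which terminates the status line.
            while b"\r\n" not in data:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                data += chunk
    return data.split(b"\r\n", 1)[0].decode("latin-1")


if __name__ == "__main__":
    ua = "Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0"
    for first in (True, False):
        raw = build_request("bund.lkr.de", "/", ua, host_first=first)
        print("Host first:" if first else "Host last:",
              fetch_status_line("bund.lkr.de", raw))
```

Building the request bytes by hand avoids any header reordering that a higher-level HTTP client might apply, which is the whole point of the test.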

@JustAnotherArchivist
Contributor Author

Another example we stumbled across in #archivebot today. Note that it only happens with HTTP/1.1. Buttflare's HTTP servers are very broken...

Example
> curl -sv -A 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0' -H 'Host:' -H 'Host: cop.unasiapacific.org' https://cop.unasiapacific.org/feed
[snip]
> GET /feed HTTP/2
> User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
> Accept: */*
> Host: cop.unasiapacific.org
> 
[snip]
< HTTP/2 200 
[snip]

> curl -sv -A 'Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0' -H 'Host:' -H 'Host: cop.unasiapacific.org' --http1.1 https://cop.unasiapacific.org/feed
[snip]
> GET /feed HTTP/1.1
> User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:60.0) Gecko/20100101 Firefox/60.0
> Accept: */*
> Host: cop.unasiapacific.org
> 
[snip]
< HTTP/1.1 403 Forbidden

@Pokechu22 linked a pull request on Oct 24, 2022 that will close this issue