-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GREASE incompatible with Cloudflare #206
Comments
The goal of grease is to prevent ossification of mechanisms that should allow for protocol extension. Cloudflare's implementation does try to support all kinds (that's where I learned of the kinds of grease from in the first place). It could be:
|
I tried it out and got the same error as you. In general there are 3 of grease types. One for settings-id, frame type and stream type. My guess was that the peer received a wrong body, because we got a http response (which says "400 Invlid query") an not a connection closure with a http3 error. I also cloned and modified the quiche http3 example server a bit to log the received body. When sending the post from your "minimal" reproducible example" i received the same body with and without grease. Also important to say, i got the error one time when sending no grease frame (with the setting to true but without this https://github.com/hyperium/h3/blob/dccb3cdae9d5a9d720fae5f774b53f0bd8a16019/h3/src/connection.rs#L845C15-L847C55 ) All of this and the fact that the google server worked lets me guess that there is a race condition which get maybe triggered by the way the cloudflare server responds. And which is more likely to happen when sending the grease frame. But these are just guesses. If I have the time i will look at it tomorrow again to see if i can find out more. |
Description
Patch
|
Update@Ruben2424 In some complex use cases, failures sometimes occur even if send_grease is false (HTTP/3.0 500 Internal Server Error, Worker threw exception, Exception GET, Request with a GET or HEAD method cannot have a body.) Good times and bad (Sometimes it works, sometimes it fails), maybe there are data races issues. Complex casesuse std::{net::{IpAddr, Ipv4Addr, SocketAddr}, path::PathBuf, sync::Arc, time::Duration};
use bytes::{Buf, Bytes, BytesMut};
use http::{header::HOST, uri::PathAndQuery, HeaderValue, Request, StatusCode, Uri};
use rustls::pki_types::{CertificateDer, PrivateKeyDer};
use structopt::StructOpt;
use tokio::{fs::File, io::AsyncReadExt, sync::Mutex, time::sleep};
use tracing::{error, info, trace_span, warn};
use h3::{error::ErrorLevel, quic::BidiStream, server::RequestStream};
use h3_quinn::quinn::{self, crypto::rustls::QuicServerConfig};
#[derive(StructOpt, Debug)]
#[structopt(name = "server")]
struct Opt {
#[structopt(
name = "dir",
short,
long,
help = "Root directory of the files to serve. \
If omitted, server will respond OK."
)]
pub root: Option<PathBuf>,
#[structopt(
short,
long,
default_value = "[::1]:4433",
help = "What address:port to listen for new connections"
)]
pub listen: SocketAddr,
#[structopt(flatten)]
pub certs: Certs,
#[structopt(
long,
default_value = "https://example.com",
help = "Uri"
)]
pub uri: String,
}
#[derive(StructOpt, Debug)]
pub struct Certs {
#[structopt(
long,
short,
default_value = "examples/server.cert",
help = "Certificate for TLS. If present, `--key` is mandatory."
)]
pub cert: PathBuf,
#[structopt(
long,
short,
default_value = "examples/server.key",
help = "Private key for the certificate."
)]
pub key: PathBuf,
}
static ALPN: &[u8] = b"h3";
async fn create_connection(uri: &str) -> Result<
(
h3::client::Connection<h3_quinn::Connection, bytes::Bytes>,
h3::client::SendRequest<h3_quinn::OpenStreams, bytes::Bytes>
),
Box<dyn std::error::Error>>
{
let base_uri = uri.parse::<http::Uri>()?;
if base_uri.scheme() != Some(&http::uri::Scheme::HTTPS) {
return Err("uri scheme must be 'https'".into());
}
let auth = base_uri.authority().ok_or("uri must have a host")?.clone();
let port = auth.port_u16().unwrap_or(443);
let addr = tokio::net::lookup_host((auth.host(), port))
.await?
.next()
.ok_or("dns found no addresses")?;
info!("DNS lookup for {:?}: {:?}", base_uri, addr);
// create quinn client endpoint
// load CA certificates stored in the system
let mut roots = rustls::RootCertStore::empty();
match rustls_native_certs::load_native_certs() {
Err(e) => error!("couldn't load any default trust roots: {}", e),
Ok(certs) => {
for cert in certs {
if let Err(e) = roots.add(cert) {
error!("failed to parse trust anchor: {}", e);
}
}
}
};
let mut tls_config = rustls::ClientConfig::builder()
.with_root_certificates(roots)
.with_no_client_auth();
tls_config.enable_early_data = true;
tls_config.alpn_protocols = vec![ALPN.into()];
let mut client_endpoint = h3_quinn::quinn::Endpoint::client("[::]:0".parse().unwrap())?;
let client_config = quinn::ClientConfig::new(Arc::new(
quinn::crypto::rustls::QuicClientConfig::try_from(tls_config)?,
));
client_endpoint.set_default_client_config(client_config);
let conn = client_endpoint.connect(addr, auth.host())?.await?;
info!("QUIC connection established");
// create h3 client
let quinn_conn = h3_quinn::Connection::new(conn);
let part_h3_client : Result<(
h3::client::Connection<h3_quinn::Connection, bytes::Bytes>,
h3::client::SendRequest<h3_quinn::OpenStreams, bytes::Bytes>
), h3::Error> = h3::client::builder()
.send_grease(false)
.build(quinn_conn).await;
match part_h3_client {
Ok(value) => Ok(value),
Err(error) => Err(Box::new(error)),
}
}
async fn create_endpoint(opt : &Opt) -> Result<h3_quinn::Endpoint, Box<dyn std::error::Error>>
{
let result_certificate_private_key_text = utils::get_certificate_private_key_text(
String::from("root_certificate.crt"),
String::from("root_private_key.pem")
);
let (root_certificate, root_private_key) = match result_certificate_private_key_text {
Ok((root_certificate, root_private_key)) => (root_certificate, root_private_key),
Err(error) => return Err(error.into()),
};
let result_rcgen_certificate_private_key = utils::get_rcgen_certificate_private_key(
&root_certificate,
&root_private_key
);
let (root_certificate, root_private_key) = match result_rcgen_certificate_private_key {
Ok((root_certificate, root_private_key)) => (root_certificate, root_private_key),
Err(error) => return Err(error.into()),
};
// create quinn server endpoint and bind UDP socket
// both cert and key must be DER-encoded
let Certs { cert, key } = &opt.certs;
let cert = CertificateDer::from(std::fs::read(cert)?);
let key = PrivateKeyDer::try_from(std::fs::read(key)?)?;
let mut tls_config = rustls::ServerConfig::builder()
.with_no_client_auth()
.with_single_cert(vec![cert], key)?;
tls_config.max_early_data_size = u32::MAX;
tls_config.alpn_protocols = vec![ALPN.into()];
let server_config = quinn::ServerConfig::with_crypto(
Arc::new(
QuicServerConfig::try_from(tls_config)?
)
);
let result_endpoint = quinn::Endpoint::server(
server_config,
opt.listen,
);
match result_endpoint {
Ok(value) => Ok(value),
Err(error) => Err(Box::new(error))
}
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
tracing_subscriber::fmt()
.with_env_filter(tracing_subscriber::EnvFilter::from_default_env())
.with_span_events(tracing_subscriber::fmt::format::FmtSpan::FULL)
.with_writer(std::io::stderr)
.with_max_level(tracing::Level::TRACE)
.with_line_number(true)
.with_level(true)
.init();
// process cli arguments
let opt = Opt::from_args();
let root = if let Some(root) = &opt.root {
if !root.is_dir() {
return Err(format!("{}: is not a readable directory", root.display()).into());
} else {
info!("serving {}", root.display());
Arc::new(Some(root.clone()))
}
} else {
Arc::new(None)
};
info!("listening on {}", opt.listen);
let endpoint = create_endpoint(&opt).await?;
// handle incoming connections and requests
while let Some(new_conn) = endpoint.accept().await {
let root = root.clone();
trace_span!("New connection being attempted");
let (driver, send_request) = create_connection(&opt.uri).await?; // <----------
let _driver = Arc::new(driver);
tokio::spawn(async move {
let conn = match new_conn.await {
Ok(value) => value,
Err(error) => {
error!("accepting connection failed: {:?}", error);
return
},
};
info!("new connection established");
let mut h3_conn = match h3::server::Connection::new(h3_quinn::Connection::new(conn)).await {
Ok(value) => value,
Err(error) => {
error!("{:?}", &error);
return
}
};
loop {
let (http_request, stream) = match h3_conn.accept().await {
Ok(Some((req, stream))) => (req, stream),
// indicating no more streams to be received
Ok(None) => {
break;
},
Err(err) => {
error!("error on accept {}", err);
match err.get_error_level() {
ErrorLevel::ConnectionError => break,
ErrorLevel::StreamError => continue,
}
},
};
let arc_driver = Arc::clone(&_driver);
let cloned_send_request = send_request.clone(); // <----------
let root = root.clone();
info!("new request: {:#?}", http_request);
tokio::spawn(async {
let result = handle_request(
arc_driver,
cloned_send_request, // <----------
http_request,
stream, root
);
if let Err(err) = result.await {
error!("handling request failed: {}", err);
}
});
}
});
}
// shut down gracefully
// wait for connections to be closed before exiting
endpoint.wait_idle().await;
Ok(())
}
async fn handle_request<T>(
arc_driver: Arc<h3::client::Connection<h3_quinn::Connection, bytes::Bytes>>,
mut send_request: h3::client::SendRequest<h3_quinn::OpenStreams, bytes::Bytes>,
mut http_request: Request<()>,
mut stream: RequestStream<T, Bytes>,
serve_root: Arc<Option<PathBuf>>,
) -> Result<(), Box<dyn std::error::Error>>
where
T: BidiStream<Bytes>,
{
let new_http_request = http::request::Request::builder()
// .method("GET")
.uri("https://xxx.worker.dev")
// .header("Accept", "*/*")
.body(())
.unwrap();
let mut _send_request = match send_request.send_request(new_http_request).await {
Ok(value) => value,
Err(error) => {
warn!("CLC: {:?}", error);
return Err(Box::new(error))
},
};
// sleep(Duration::from_secs(2));
match _send_request.finish().await {
Ok(value) => value,
Err(error) => {
warn!("CLA: {:?}", error);
return Err(Box::new(error))
},
};
// ============================== //
// ============================== //
// ============================== //
// ============================== //
// ============================== //
let response = match _send_request.recv_response().await {
Ok(value) => value,
Err(error) => {
warn!("CLT: {:?}", error);
return Err(Box::new(error))
},
};
if let Err(error) = stream.send_response(response).await {
warn!("CLT: {:?}", error);
return Err(Box::new(error))
};
loop {
match _send_request.recv_data().await {
Ok(None) => { break },
Ok(Some(mut value)) => {
warn!("YKYP: {:?}", value.remaining());
let payload = value.copy_to_bytes(value.remaining());
if let Err(error) = stream.send_data(payload).await {
error!("{:?}", &error);
return Err(Box::new(error))
}
},
Err(error) => {
error!("{:?}", &error);
return Err(Box::new(error))
}
}
}
match _send_request.recv_trailers().await {
Ok(None) => {},
Ok(Some(trailers)) => {
if let Err(error) = stream.send_trailers(trailers).await {
error!("{:?}", &error);
return Err(Box::new(error))
}
},
Err(error) => {
error!("{:?}", &error);
return Err(Box::new(error))
}
}
match stream.finish().await {
Ok(value) => value,
Err(error) => {
warn!("CLA: {:?}", error);
return Err(Box::new(error))
},
};
Ok(())
} |
Thanks @higlyrecommand for taking a look. What exactly is the "Complex cases" example code you provided? Did you manage to reproduce it with the h3 server? I hope to find time this weekend to take another look. |
Update@Ruben2424 Thanks for your reply, i can reproduce it, I thought this repository was no longer maintained. :) I am currently implementing an By the way SSL_KEY_LOG_file doesn't seem to work and Wireshark can't decrypt it, which prevents me from seeing the specific content.
I.e.Browser DNS <=> LocalDNS (E.g. google.com => 127.0.0.1) ...Currently, the After another day of studying the logs, I've finally made some progress. It seems that during the connection process for the 500 Error, the logs show two pieces of information:
For a 200 OK, there is only one piece of information:
The data begins with @Ruben2424 I apologize for my limited expertise; I've done my best. I thought this project was no longer maintained and was about to give up, but then I saw your reply :) (Currently, I am using Pseudo-code: info!("listening on {}", opt.listen);
let endpoint = create_endpoint(&opt).await?;
// handle incoming connections and requests
while Endpoint.accept {
tokio::spawn(async move {
h3_conn = match Incoming.await {
Ok(Connection) => h3::server::Connection::new(h3_quinn::Connection::new(value)),
Err => ...
}
// `sleep` or `yield_now` added before create_connection has no effect
let (endpoint, mut driver, send_request) = match create_connection("xx.com").await {
Err => ...
}
// sleep(Duration::from_nanos(1)).await; // Add sleep to avoid triggering HTTP 500 Error
// tokio::task::yield_now().await; // Or add this to avoid triggering HTTP 500 Error
tokio::spawn(async move {
tokio::task::yield_now().await; // Or add this to avoid triggering HTTP 500 Error
warn!("KAKAC: Request");
test_request(&mut send_request).await;
warn!("KAKAD: End");
});
let _ = driver.wait_idle().await;
client_endpoint.wait_idle().await;
})
}
// shut down gracefully
// wait for connections to be closed before exiting
endpoint.wait_idle().await;
Ok(())
// ==================================== //
async fn test_request(
send_request: &mut h3::client::SendRequest<h3_quinn::OpenStreams, bytes::Bytes>,
) -> Result<(), Box<dyn std::error::Error>>
{
let new_http_request = http::request::Request::builder()
// .method("GET")
.uri("https://xxx.com")
// .header("Accept", "*/*")
.header("Mirror-Host", "example.com")
.body(())
.unwrap();
let mut _send_request = match send_request.send_request(new_http_request).await {
Ok(value) => value,
Err(error) => {
warn!("CLC: {:?}", error);
return Err(Box::new(error))
},
};
match _send_request.finish().await {
Ok(value) => value,
Err(error) => {
warn!("CLA: {:?}", error);
return Err(Box::new(error))
},
};
if let Ok(response) = _send_request.recv_response().await {
println!("{:?}, {:?}", response.headers(), response.status())
};
Ok(())
} Tracing Log (I tried my best to reduce the difference)
|
Update@Ruben2424 part_error.log |
I tried the original example with doh3 again. I captured the traffic with wireshark and the data sent in the data frame is identical when success or failure. I lean more and more towards thinking that this might be a bug in Cloudflare. |
When I was still debugging this I tried Quiche as a client as well and was unable to reproduce the issue with it. I remember also trying it with cURL, which also used very different ways to establish a HTTP/3 connection (but still the same data). So my best guess until now is that Cloudflare's DNS server doesn't like the way we handle/create QUIC streams to send the message. If Cloudflare uses Quiche for their DNS server as well, we might be able to reproduce the issue locally. |
No matter how many times it is retried, the log always shows: @Ruben2424 But for my case, Emmm, I still think this is data races. |
What for me does not make sense is the fact that the errors are application http errors.
If this is a Problem on the quic or http level I would expect an error like this: Both errors indicate that there is a problem with the body.
My first thought was that there may be an issue that there is wrong data sent on the body triggered by a race condition.
This indicates the same behavior. In any case there are 49 bytes sent on the request stream. This includes headers, body data, and potential grease frames. The only difference that i can see here that the FIN bit is set in a different quic frame. As far as I know this is is legit and not an error. But to be 100% sure that in this case there was the same data sent in both cases, we need to see what exactly the 49 bytes are.
I honestly dont know.
I will try again to reproduce it localy with quiche. But in case I cannot reproduce it locally I have no idea hot to find the cause of the problem. |
@Ruben2424 Okay, got it, Thank you for your help. I created the repository (https://github.com/highlyrecommand/h3_test), There is no problem with AWS or Azure in Hong Kong and other countries (cannot reproduce). This problem is very strange. Thanks for everyone's answers and help, I tried to solve it myself and will update here if there is any new progress.
addEventListener(
'fetch', event => {
const worker_mirror_host = event.request.headers.get('Worker-Mirror-Host');
const url = new URL(event.request.url);
url.hostname = worker_mirror_host;
const modifiedRequest = new Request(url, {
body: event.request.body,
headers: event.request.headers,
method: event.request.method,
redirect: event.request.redirect
});
modifiedRequest.headers.delete('Worker-Mirror-Host');
modifiedRequest.headers.delete('CF-Connecting-IP');
modifiedRequest.headers.delete('X-Forwarded-For');
modifiedRequest.headers.delete('X-Real-IP');
const request = new Request(url, modifiedRequest);
event.respondWith(fetch(request));
}
) |
Update@Ruben2424 OK, I finally solved the problem with an acceptable solution (Cloudflare Workers side) (if ("GET" == request.method) || ("HEAD" == request.method);) Workers side hackexport default {
async fetch(request, env, ctx) {
const get_or_head = ("GET" == request.method) || ("HEAD" == request.method);
const worker_mirror_host = request.headers.get('Worker-Mirror-Host');
const url = new URL(request.url);
url.hostname = worker_mirror_host;
const headers = new Headers(request.headers);
headers.delete('Worker-Mirror-Host');
headers.delete('CF-Connecting-IP');
headers.delete('X-Forwarded-For');
headers.delete('X-Real-IP');
const modifiedRequest = new Request(url, {
method: request.method,
headers: headers,
body: (get_or_head ? null : request.body),
redirect: request.redirect,
});
return fetch(modifiedRequest);
},
}; My guess: maybe some strange reason makes cloudflare think there is a body in the request, and the body length is 0. But there is no need to do this in quiche curl, and 500 Error will never be triggered.
Finally or EndingI can accept this solution, thank you all for your help. |
I am still curious why cloudflare receives a body when there is none sent. Would you mind printing out the body or something like that when one is received. Maybe this will help finding the underlying problem. |
Reply@Ruben2424 No problem, I have recaptured two logs, one with 500 ERROR and one with 200 OK. The response Body is in the log.I replaced the china IP address and domain name with The execution order of function calls in the logs of 500ERROR and 200OK is very different, just because a This could be affecting the async scheduler?, the queue?, or some internal mechanism?, poll?, I only have a rudimentary understanding of asynchronous, so doing my best to figure out the problem. And I found the strangest thing, if I put and I cannot decrypt the traffic via SSL_KEY_LOG_FILE, if I can successfully decrypt it I will provide you the wireshark pcap file. (I don't know why, how do you do it?) By the wayBy the way, It seems that this and quinn is very different from quiche. quinn uses inner padding (the padding content is also encrypted), while quiche uses outer padding (00000000 zero data until 1200 Bytes), The Great Firewall of China can recognize/identify quinn data packet, but not quiche. ThanksThank you again for replying to me. 🙏 Attachments |
As per the logs it looks like http/500 contains a request header It would be interesting to see from the client logs if this header is part of the sent h3 request. |
I noticed in https://github.com/bluejekyll/trust-dns/pull/1987 that Cloudflare's HTTP/3 implementation seems to be incompatible (or there might be a bug?) with
h3
.I made a "minimal" reproducible example here: https://github.com/daxpedda/doh3-test.
But to summarize:
This sends a DoH3 query over POST for
www.example.com
to Cloudflare's 1.1.1.1 DNS server, which responds with400 Bad Request
and a body withInvalid query.
.Couple of observations I already made:
h3
on Wireshark, they look quite different, so it might not be an issue with GREASE, but some HTTP/3 incompatibility that is triggered by GREASE.I didn't dig deeper into this as I'm not familiar with the HTTP/3 protocol and this I "easily" solved for me by just deactivating the GREASE.
I'm happy to dig deeper into it with some guidance.
Related #205.
The text was updated successfully, but these errors were encountered: