-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(iroh-net): Implement the https probe #2903
Changes from all commits
790006d
500973c
4ef613b
f654d2c
9fd2035
feca9ef
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,14 +35,15 @@ use tokio::{ | |
}; | ||
use tokio_util::task::AbortOnDropHandle; | ||
use tracing::{debug, debug_span, error, info_span, trace, warn, Instrument, Span}; | ||
use url::Host; | ||
|
||
use super::NetcheckMetrics; | ||
use crate::{ | ||
defaults::DEFAULT_STUN_PORT, | ||
dns::{DnsResolver, ResolverExt}, | ||
netcheck::{self, Report}, | ||
ping::{PingError, Pinger}, | ||
relay::{RelayMap, RelayNode, RelayUrl}, | ||
relay::{http::RELAY_PROBE_PATH, RelayMap, RelayNode, RelayUrl}, | ||
stun, | ||
util::MaybeFuture, | ||
}; | ||
|
@@ -466,6 +467,7 @@ impl Actor { | |
.as_ref() | ||
.and_then(|l| l.preferred_relay.clone()); | ||
|
||
let dns_resolver = self.dns_resolver.clone(); | ||
let dm = self.relay_map.clone(); | ||
self.outstanding_tasks.captive_task = true; | ||
MaybeFuture { | ||
|
@@ -474,7 +476,7 @@ impl Actor { | |
debug!("Captive portal check started after {CAPTIVE_PORTAL_DELAY:?}"); | ||
let captive_portal_check = tokio::time::timeout( | ||
CAPTIVE_PORTAL_TIMEOUT, | ||
check_captive_portal(&dm, preferred_relay) | ||
check_captive_portal(&dns_resolver, &dm, preferred_relay) | ||
.instrument(debug_span!("captive-portal")), | ||
); | ||
match captive_portal_check.await { | ||
|
@@ -743,7 +745,7 @@ async fn run_probe( | |
} | ||
Probe::Https { ref node, .. } => { | ||
debug!("sending probe HTTPS"); | ||
match measure_https_latency(node).await { | ||
match measure_https_latency(&dns_resolver, node, None).await { | ||
Ok((latency, ip)) => { | ||
result.latency = Some(latency); | ||
// We set these IPv4 and IPv6 but they're not really used | ||
|
@@ -853,7 +855,11 @@ async fn run_stun_probe( | |
/// return a "204 No Content" response and checking if that's what we get. | ||
/// | ||
/// The boolean return is whether we think we have a captive portal. | ||
async fn check_captive_portal(dm: &RelayMap, preferred_relay: Option<RelayUrl>) -> Result<bool> { | ||
async fn check_captive_portal( | ||
dns_resolver: &DnsResolver, | ||
dm: &RelayMap, | ||
preferred_relay: Option<RelayUrl>, | ||
) -> Result<bool> { | ||
// If we have a preferred relay node and we can use it for non-STUN requests, try that; | ||
// otherwise, pick a random one suitable for non-STUN requests. | ||
let preferred_relay = preferred_relay.and_then(|url| match dm.get_node(&url) { | ||
|
@@ -881,9 +887,23 @@ async fn check_captive_portal(dm: &RelayMap, preferred_relay: Option<RelayUrl>) | |
} | ||
}; | ||
|
||
let client = reqwest::ClientBuilder::new() | ||
.redirect(reqwest::redirect::Policy::none()) | ||
.build()?; | ||
let mut builder = reqwest::ClientBuilder::new().redirect(reqwest::redirect::Policy::none()); | ||
if let Some(Host::Domain(domain)) = url.host() { | ||
// Use our own resolver rather than getaddrinfo | ||
// | ||
// For some reason reqwest wants SocketAddr rather than IpAddr but then proceeds to | ||
// ignore the port, extracting it from the URL instead. We supply a dummy port. | ||
// | ||
// Ideally we would try to resolve **both** IPv4 and IPv6 rather than purely race | ||
// them. But our resolver doesn't support that yet. | ||
let addrs: Vec<_> = dns_resolver | ||
.lookup_ipv4_ipv6_staggered(domain, DNS_TIMEOUT, DNS_STAGGERING_MS) | ||
.await? | ||
.map(|ipaddr| SocketAddr::new(ipaddr, 80)) | ||
.collect(); | ||
builder = builder.resolve_to_addrs(domain, &addrs); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another option is to implement reqwests' There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I really like the idea of being able to keep using Is it feasible to lean in to reqwest even more heavily? It seems like it supports HTTP upgrades and thus we could use it in even more places. If we manage to replace all usage of direct hyper with reqwest, we might be able to have one universal way of configuring custom DNS resolvers or HTTPS proxying. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. https://docs.rs/iroh-net/latest/iroh_net/endpoint/struct.Builder.html#method.dns_resolver is the current way to configure the DNS resolver for users. That's what this is picking up. I don't have a lot of opinion on reqwest use, I've seen people remove it in favour of more low-level control and to reduce the dependency count. We also need to take into account #2901. But if we can do everything with reqwest and there's benefits to it then I don't have anything against it. |
||
} | ||
let client = builder.build()?; | ||
|
||
// Note: the set of valid characters in a challenge and the total | ||
// length is limited; see is_challenge_char in bin/iroh-relay for more | ||
|
@@ -1023,33 +1043,72 @@ async fn run_icmp_probe( | |
Ok(report) | ||
} | ||
|
||
/// Executes an HTTPS probe. | ||
/// | ||
/// If `certs` is provided they will be added to the trusted root certificates, allowing the | ||
/// use of self-signed certificates for servers. Currently this is used for testing. | ||
#[allow(clippy::unused_async)] | ||
async fn measure_https_latency(_node: &RelayNode) -> Result<(Duration, IpAddr)> { | ||
bail!("not implemented"); | ||
// TODO: | ||
// - needs relayhttp::Client | ||
// - measurement hooks to measure server processing time | ||
|
||
// metricHTTPSend.Add(1) | ||
// let ctx, cancel := context.WithTimeout(httpstat.WithHTTPStat(ctx, &result), overallProbeTimeout); | ||
// let dc := relayhttp.NewNetcheckClient(c.logf); | ||
// let tlsConn, tcpConn, node := dc.DialRegionTLS(ctx, reg)?; | ||
// if ta, ok := tlsConn.RemoteAddr().(*net.TCPAddr); | ||
// req, err := http.NewRequestWithContext(ctx, "GET", "https://"+node.HostName+"/relay/latency-check", nil); | ||
// resp, err := hc.Do(req); | ||
|
||
// // relays should give us a nominal status code, so anything else is probably | ||
// // an access denied by a MITM proxy (or at the very least a signal not to | ||
// // trust this latency check). | ||
// if resp.StatusCode > 299 { | ||
// return 0, ip, fmt.Errorf("unexpected status code: %d (%s)", resp.StatusCode, resp.Status) | ||
// } | ||
// _, err = io.Copy(io.Discard, io.LimitReader(resp.Body, 8<<10)); | ||
// result.End(c.timeNow()) | ||
|
||
// // TODO: decide best timing heuristic here. | ||
// // Maybe the server should return the tcpinfo_rtt? | ||
// return result.ServerProcessing, ip, nil | ||
async fn measure_https_latency( | ||
dns_resolver: &DnsResolver, | ||
node: &RelayNode, | ||
certs: Option<Vec<rustls::pki_types::CertificateDer<'static>>>, | ||
) -> Result<(Duration, IpAddr)> { | ||
let url = node.url.join(RELAY_PROBE_PATH)?; | ||
|
||
// This should also use same connection establishment as relay client itself, which | ||
// needs to be more configurable so users can do more crazy things: | ||
// https://github.com/n0-computer/iroh/issues/2901 | ||
let mut builder = reqwest::ClientBuilder::new().redirect(reqwest::redirect::Policy::none()); | ||
if let Some(Host::Domain(domain)) = url.host() { | ||
// Use our own resolver rather than getaddrinfo | ||
// | ||
// For some reason reqwest wants SocketAddr rather than IpAddr but then proceeds to | ||
// ignore the port, extracting it from the URL instead. We supply a dummy port. | ||
// | ||
// The relay Client uses `.lookup_ipv4_ipv6` to connect, so use the same function | ||
// but staggered for reliability. Ideally this tries to resolve **both** IPv4 and | ||
// IPv6 though. But our resolver does not have a function for that yet. | ||
let addrs: Vec<_> = dns_resolver | ||
.lookup_ipv4_ipv6_staggered(domain, DNS_TIMEOUT, DNS_STAGGERING_MS) | ||
.await? | ||
.map(|ipaddr| SocketAddr::new(ipaddr, 80)) | ||
.collect(); | ||
builder = builder.resolve_to_addrs(domain, &addrs); | ||
} | ||
if let Some(certs) = certs { | ||
for cert in certs { | ||
let cert = reqwest::Certificate::from_der(&cert)?; | ||
builder = builder.add_root_certificate(cert); | ||
} | ||
} | ||
let client = builder.build()?; | ||
|
||
let start = Instant::now(); | ||
let mut response = client.request(reqwest::Method::GET, url).send().await?; | ||
let latency = start.elapsed(); | ||
if response.status().is_success() { | ||
// Drain the response body to be nice to the server, up to a limit. | ||
const MAX_BODY_SIZE: usize = 8 << 10; // 8 KiB | ||
let mut body_size = 0; | ||
while let Some(chunk) = response.chunk().await? { | ||
body_size += chunk.len(); | ||
if body_size >= MAX_BODY_SIZE { | ||
break; | ||
} | ||
} | ||
|
||
// Only `None` if a different hyper HttpConnector in the request. | ||
let remote_ip = response | ||
.remote_addr() | ||
.context("missing HttpInfo from HttpConnector")? | ||
.ip(); | ||
Ok((latency, remote_ip)) | ||
} else { | ||
Err(anyhow!( | ||
"Error response from server: '{}'", | ||
response.status().canonical_reason().unwrap_or_default() | ||
)) | ||
} | ||
} | ||
|
||
/// Updates a netcheck [`Report`] with a new [`ProbeReport`]. | ||
|
@@ -1118,8 +1177,13 @@ fn update_report(report: &mut Report, probe_report: ProbeReport) { | |
mod tests { | ||
use std::net::{Ipv4Addr, Ipv6Addr}; | ||
|
||
use testresult::TestResult; | ||
|
||
use super::*; | ||
use crate::defaults::staging::{default_eu_relay_node, default_na_relay_node}; | ||
use crate::{ | ||
defaults::staging::{default_eu_relay_node, default_na_relay_node}, | ||
test_utils, | ||
}; | ||
|
||
#[test] | ||
fn test_update_report_stun_working() { | ||
|
@@ -1368,4 +1432,28 @@ mod tests { | |
panic!("Ping error: {err:#}"); | ||
} | ||
} | ||
|
||
#[tokio::test] | ||
async fn test_measure_https_latency() -> TestResult { | ||
let _logging_guard = iroh_test::logging::setup(); | ||
let (_relay_map, relay_url, server) = test_utils::run_relay_server().await?; | ||
let dns_resolver = crate::dns::resolver(); | ||
warn!(?relay_url, "RELAY_URL"); | ||
let node = RelayNode { | ||
stun_only: false, | ||
stun_port: 0, | ||
url: relay_url.clone(), | ||
}; | ||
let (latency, ip) = | ||
measure_https_latency(dns_resolver, &node, server.certificates()).await?; | ||
|
||
assert!(latency > Duration::ZERO); | ||
|
||
let relay_url_ip = relay_url | ||
.host_str() | ||
.context("host")? | ||
.parse::<std::net::IpAddr>()?; | ||
assert_eq!(ip, relay_url_ip); | ||
Ok(()) | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -44,7 +44,9 @@ pub async fn run_relay_server() -> Result<(RelayMap, RelayUrl, Server)> { | |
pub async fn run_relay_server_with( | ||
stun: Option<StunConfig>, | ||
) -> Result<(RelayMap, RelayUrl, Server)> { | ||
let cert = rcgen::generate_simple_self_signed(vec!["localhost".to_string()]).unwrap(); | ||
let cert = | ||
rcgen::generate_simple_self_signed(vec!["localhost".to_string(), "127.0.0.1".to_string()]) | ||
.expect("valid"); | ||
let rustls_cert = rustls::pki_types::CertificateDer::from(cert.serialize_der().unwrap()); | ||
let private_key = | ||
rustls::pki_types::PrivatePkcs8KeyDer::from(cert.get_key_pair().serialize_der()); | ||
|
@@ -67,7 +69,7 @@ pub async fn run_relay_server_with( | |
metrics_addr: None, | ||
}; | ||
let server = Server::spawn(config).await.unwrap(); | ||
let url: RelayUrl = format!("https://localhost:{}", server.https_addr().unwrap().port()) | ||
let url: RelayUrl = format!("https://{}", server.https_addr().expect("configured")) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Notice that this changes the relay URL for tests to be the IP address. This is needed to avoid the DNS lookups for |
||
.parse() | ||
.unwrap(); | ||
let m = RelayMap::from_nodes([RelayNode { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@flub I think you should probably set these ports to 0, if I'm understanding this issue correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs have seemingly not been updated yet. I'll look into the source if they're actually unused or not now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Also, bad reqwest! #2906
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry... the change was in
hyper_util
not in reqwest so that is why the documentation stayed outdated. seanmonstar/reqwest#2473There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my defense, this change in
hyper_util
is really useful to leverage HTTPS records in custom resolvers. Even more work needed on Reqwest side to decide when to switch to HTTP2/3 according to the ALPN property in HTTPS records.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good. IMO it's more weird to pass in bogus info that's unused. This is what made me skeptical in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also ugh I didn't quite follow the data-flow in reqwest for how these domain overrides are used. It goes through some
Resolver
andtower
traits and just gets complicated quickly. So I'm glad to see someone who knows just update the docs.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, no worries. These things can be hard to catch, even the maintainer missed it. Thanks for following up on the docs!
And yes, that work you're doing there is certainly cool and useful!