Enable choice from multiple Unicode back ends #965

Merged on Oct 29, 2024 (11 commits)
4 changes: 4 additions & 0 deletions README.md
@@ -12,3 +12,7 @@ URL library for Rust, based on the [URL Standard](https://url.spec.whatwg.org/).
[Documentation](https://docs.rs/url)

Please see [UPGRADING.md](https://github.com/servo/rust-url/blob/main/UPGRADING.md) if you are upgrading from a previous version.

+## Alternative Unicode back ends
+
+`url` depends on the `idna` crate. By default, `idna` uses [ICU4X](https://github.com/unicode-org/icu4x/) as its Unicode back end. If you wish to opt for different tradeoffs between correctness, run-time performance, binary size, compile time, and MSRV, please see the [README of the latest version of the `idna_adapter` crate](https://docs.rs/crate/idna_adapter/latest) for how to opt into a different Unicode back end.
9 changes: 4 additions & 5 deletions idna/Cargo.toml
@@ -1,14 +1,14 @@
[package]
name = "idna"
-version = "1.0.2"
+version = "1.0.3"
authors = ["The rust-url developers"]
description = "IDNA (Internationalizing Domain Names in Applications) and Punycode."
keywords = ["no_std", "web", "http"]
repository = "https://github.com/servo/rust-url/"
license = "MIT OR Apache-2.0"
autotests = false
edition = "2018"
-rust-version = "1.67"
+rust-version = "1.57" # For panic in const context

[lib]
doctest = false
@@ -17,7 +17,7 @@ doctest = false
default = ["std", "compiled_data"]
std = ["alloc"]
alloc = []
-compiled_data = ["icu_normalizer/compiled_data", "icu_properties/compiled_data"]
+compiled_data = ["idna_adapter/compiled_data"]

[[test]]
name = "tests"
@@ -36,10 +36,9 @@ tester = "0.9"
serde_json = "1.0"

[dependencies]
-icu_normalizer = "1.4.3"
-icu_properties = "1.4.2"
utf8_iter = "1.0.4"
smallvec = { version = "1.13.1", features = ["const_generics"]}
+idna_adapter = "1"

[[bench]]
name = "all"
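
The `rust-version = "1.57"` comment in the `idna/Cargo.toml` hunks refers to panicking in a const context, which became available on stable Rust in 1.57. The following is a minimal sketch of what that language feature permits. The names `checked_len`, `TABLE_LEN`, and `TABLE` are hypothetical and are not taken from the `idna` sources.

```rust
/// Hypothetical helper for illustration only (not from the `idna` crate):
/// validates a table length while the program is being compiled.
const fn checked_len(len: usize) -> usize {
    // `assert!` (and `panic!`) with a literal message inside a `const fn`
    // is accepted on stable Rust from 1.57 onwards.
    assert!(len.is_power_of_two(), "table length must be a power of two");
    len
}

// Evaluated by the compiler. Passing a length that is not a power of two
// here would fail the build instead of panicking at run time.
const TABLE_LEN: usize = checked_len(64);

// The validated constant can then size a static array.
static TABLE: [u8; TABLE_LEN] = [0; TABLE_LEN];

fn main() {
    println!("table has {} entries", TABLE.len());
}
```

Calling `checked_len` from ordinary run-time code is also allowed; in that case a failed assertion is a normal panic.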
4 changes: 4 additions & 0 deletions idna/README.md
@@ -28,6 +28,10 @@ Apps that need to display host names to the user should use `uts46::Uts46::to_us
* `std` - Adds `impl std::error::Error for Errors {}` (and implies `alloc`).
* By default, all of the above are enabled.

+## Alternative Unicode back ends
+
+By default, `idna` uses [ICU4X](https://github.com/unicode-org/icu4x/) as its Unicode back end. If you wish to opt for different tradeoffs between correctness, run-time performance, binary size, compile time, and MSRV, please see the [README of the latest version of the `idna_adapter` crate](https://docs.rs/crate/idna_adapter/latest) for how to opt into a different Unicode back end.

## Breaking changes since 0.5.0

* Stricter IDNA 2008 restrictions are no longer supported. Attempting to enable them panics immediately. UTS 46 allows all the names that IDNA 2008 allows, and when transitional processing is disabled, they resolve the same way. There are additional names that IDNA 2008 disallows but UTS 46 maps to names that IDNA 2008 allows (notably, input is mapped to fold-case output). UTS 46 also allows symbols that were allowed in IDNA 2003 as well as newer symbols that are allowed according to the same principle. (Earlier versions of this crate allowed rejecting such symbols. Rejecting characters that UTS 46 maps to IDNA 2008-permitted characters wasn't supported in earlier versions, either.)
6 changes: 6 additions & 0 deletions idna/benches/all.rs
@@ -54,6 +54,11 @@ fn to_ascii_cow_plain(bench: &mut Bencher) {
    bench.iter(|| idna::domain_to_ascii_cow(black_box(encoded), idna::AsciiDenyList::URL));
}

+fn to_ascii_cow_hyphen(bench: &mut Bencher) {
+    let encoded = "hyphenated-example.com".as_bytes();
+    bench.iter(|| idna::domain_to_ascii_cow(black_box(encoded), idna::AsciiDenyList::URL));
+}
+
fn to_ascii_cow_leading_digit(bench: &mut Bencher) {
    let encoded = "1test.example".as_bytes();
    bench.iter(|| idna::domain_to_ascii_cow(black_box(encoded), idna::AsciiDenyList::URL));
@@ -99,6 +104,7 @@ benchmark_group!(
    to_ascii_simple,
    to_ascii_merged,
    to_ascii_cow_plain,
+    to_ascii_cow_hyphen,
    to_ascii_cow_leading_digit,
    to_ascii_cow_unicode_mixed,
    to_ascii_cow_punycode_mixed,
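
The benchmarks in `idna/benches/all.rs` wrap their input in `black_box` and call a `Cow`-returning conversion in a loop. The sketch below shows the same measuring pattern using only the standard library, so it runs without a benchmark harness. `to_lower_ascii_cow` and `time_it` are hypothetical stand-ins: the former only lowercases ASCII and rejects everything else, and it is not the algorithm behind `idna::domain_to_ascii_cow`.

```rust
use std::borrow::Cow;
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a `Cow`-returning conversion. It borrows the
/// input when nothing has to change and allocates only when it must lowercase.
/// Assumption: non-ASCII or invalid UTF-8 input is simply rejected here.
fn to_lower_ascii_cow(input: &[u8]) -> Option<Cow<'_, str>> {
    let s = std::str::from_utf8(input).ok()?;
    if !s.is_ascii() {
        return None;
    }
    if s.bytes().any(|b| b.is_ascii_uppercase()) {
        Some(Cow::Owned(s.to_ascii_lowercase()))
    } else {
        Some(Cow::Borrowed(s))
    }
}

/// Times `iterations` calls and prints the elapsed wall-clock time.
fn time_it(label: &str, input: &[u8], iterations: u32) {
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` on the input and the result discourages the optimizer
        // from hoisting the call out of the loop or deleting it.
        black_box(to_lower_ascii_cow(black_box(input)));
    }
    println!("{label}: {:?}", start.elapsed());
}

fn main() {
    // Already-lowercase input with a hyphen: the borrowed path.
    time_it("hyphen", b"hyphenated-example.com", 100_000);
    // Mixed-case input: the allocating path.
    time_it("mixed case", b"Hyphenated-Example.com", 100_000);
}
```

A hand-rolled timing loop like this is far noisier than a real benchmark harness; it is only meant to show why a dedicated case for hyphenated, already-ASCII input can be worth measuring separately from inputs that force an allocation.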