Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable choice from multiple Unicode back ends #965

Merged
merged 11 commits into from
Oct 29, 2024

Conversation

hsivonen
Copy link
Collaborator

(This PR is intentionally to the idna-v1x branch as an intermediate step and not directly to main.)

This PR is a draft due to the dependency declaration using path.

Build time charts of reqwest without application code

Logically, the review here involves reviewing also:

  • The main branch of idna_mapping.
    • TODO before release:
      • Change version to 1.0.0
      • Optional: Move the repo from github.com/hsivonen/ to github.com/servo/
    • The unicode-joining-type dependency is behind this crate to hide its upcoming semver break from idna_adapter and to keep open the option to provide the data directly in idna_mapping if needed.
  • idna_adapter branches:
    • Optional TODO before release: Move the repo from github.com/hsivonen/ to github.com/servo/
    • main branch (ICU4X 1.x from icu_normalizer 1.4.3 and icu_properties 1.4.2 onwards but not 2.x. Currently in practice 1.5.)
      • TODO before release: Change version to 1.2.0
    • unicode-rs branch (idna_mapping, unicode-normalization, unicode-bidi)
      • TODO before release: Change version to 1.1.0
    • no-unicode branch (No Unicode back end; rejects Unicode-form input and does not check Punycode input for rules that require Unicode data)

idna_adapter also has a branch called icu4x-trunk. It has a path dependency to an adjacent icu4x checkout. Once ICU4X 2.0 has been released, the expectation is to merge this code to the main branch of idna_adapter as version 1.2.1.

I have omitted previously-prototyped ICU4X-1.x-relevant performance hacks from idna when the performance issues will be remedied by ICU4X 2.0.

On AMD Ryzen Threadripper PRO 5975WX, the fastest path that does not call into idna_adapter at all apart from its constructor is a nanosecond slower with ICU4X 1.x or 2.x compared with unicode-rs or no-unicode. In the ICU4X 2.0 case, it would be possible to put the adapter in a static, which logically is supposed to avoid running the constructor in the fastest-path case, but that makes it another nanosecond slower, which generally makes no sense. I didn't investigate this further, but from this, it's reasonable to guess that there are likely some issues with the code layout interacting with the instruction cache or somesuch in an unlucky way, and it's not worthwhile to try to optimize for such effects. (When there's any Unicode involved at all, ICU4X performs better than unicode-rs.)

Technically, idna_adapter and idna_mapping could be published to crates.io with the repos being under my GitHub account. My understanding of Servo policies is that moving the repos to under github.com/servo/ requires an OK from the TSC as a matter of formality, since the licenses are not MPL2. I have posted a request to get such an OK when the TSC meets on September 23rd.

@hsivonen hsivonen requested a review from valenting September 16, 2024 08:09
@hsivonen
Copy link
Collaborator Author

The CI failures are due to the path dependencies, which can go away once idna_mapping and idna_adapter are on crates.io.

@hsivonen
Copy link
Collaborator Author

And the intent, of course, is to make idna depend on idna_adapter version 1 once replacing the path dependency.

@valenting
Copy link
Collaborator

Looks great so far!

@hsivonen
Copy link
Collaborator Author

Per Zulip regarding Servo TSC discussion about the Unicode-3.0 part here: "everyone was fine, and it uses an open source license so no issues with that"

@hsivonen
Copy link
Collaborator Author

Per today's Servo TSC meeting, I'll publish idna_adapter and idna_mapping to crates.io with the repos remaining under my GitHub account at least for now.

Copy link

codecov bot commented Oct 29, 2024

Codecov Report

Attention: Patch coverage is 73.52941% with 9 lines in your changes missing coverage. Please review.

Project coverage is 80.48%. Comparing base (f700ca8) to head (662970f).
Report is 15 commits behind head on idna-v1x.

Files with missing lines Patch % Lines
idna/src/uts46.rs 73.52% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           idna-v1x     #965      +/-   ##
============================================
+ Coverage     79.60%   80.48%   +0.88%     
============================================
  Files            23       24       +1     
  Lines          4221     4253      +32     
============================================
+ Hits           3360     3423      +63     
+ Misses          861      830      -31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hsivonen
Copy link
Collaborator Author

Looks like the coverage check is bad at dealing with expressions formatted over multiple lines.

@hsivonen hsivonen merged commit 59c7ea3 into servo:idna-v1x Oct 29, 2024
11 checks passed
@hsivonen
Copy link
Collaborator Author

hsivonen commented Oct 29, 2024

Looks great so far!

Merged per out-of-band check to interpret this comment as review.

@hsivonen hsivonen deleted the adapter branch October 29, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants