-
-
Notifications
You must be signed in to change notification settings - Fork 745
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ICU-22838 Add WebAssembly/WASI cross-compilation support #3067
base: main
Are you sure you want to change the base?
ICU-22838 Add WebAssembly/WASI cross-compilation support #3067
Conversation
134d917
to
c079574
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
c079574
to
f398520
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
The original patch is still under review in the upstream ICU project, but it is needed to unblock the swift-foundation build on WebAssembly. See unicode-org/icu#3067
The original patch is still under review in the upstream ICU project, but it is needed to unblock the swift-foundation build on WebAssembly. See unicode-org/icu#3067
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support" The original patch is still under review in the upstream ICU project, but it is needed to unblock the swift-foundation build on WebAssembly. See unicode-org/icu#3067 * [Build] Update compile definitions for WASI target `U_TIMEZONE` must not be defined and dynamic loading features must be disabled for WASI target.
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support" The original patch is still under review in the upstream ICU project, but it is needed to unblock the swift-foundation build on WebAssembly. See unicode-org/icu#3067 * [Build] Update compile definitions for WASI target `U_TIMEZONE` must not be defined and dynamic loading features must be disabled for WASI target.
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support" The original patch is still under review in the upstream ICU project, but it is needed to unblock the swift-foundation build on WebAssembly. See unicode-org/icu#3067 * [Build] Update compile definitions for WASI target `U_TIMEZONE` must not be defined and dynamic loading features must be disabled for WASI target. Co-authored-by: Yuta Saito <[email protected]>
Please rebase on current main and resolve conflicts. Note that the autoconf scripts have been updated in another PR. |
f398520
to
6b36650
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
@markusicu Thanks for updating autoconf stuff. I rebased the branch on the latest main. |
* Add config/mh-wasi and update configure script. WebAssembly/WASI does not support threads and dynamic linking, so the new mh-wasi file disables those features in ICU. * Teach putilimp.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI WASI does not define timezone-related functionalities, and wasi-libc does not provide tzset, timezone, and tzname. This change teaches putilimp.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI. * Add `U_HAVE_ATOMICS` and support platforms without atomics like WASI This change adds a new macro `U_HAVE_ATOMICS` to ICU. This macro is always defined to 1 except for platforms that do not support atomic operations like WASI. Due to the lack of threading support in WASI, the wasi-sdk does not provide `std::atomic`, `std::mutex`, and related types, so the new macro is used to disable the use of these types.
6b36650
to
9d05a26
Compare
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
delete fields->atomicParser.exchange(nullptr); | ||
delete fields->atomicCurrencyParser.exchange(nullptr); | ||
#else | ||
delete fields->atomicParser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably need to assign these two parser fields to be nullptr. The owning object is not being deleted, leaving live invalid pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just implement the exchange
function in your shim.
@kateinoigakukun please don't squash any more until this PR is approved, to help GitHub track comments. |
@aheninger Thank you for your feedback. They totally make sense to me. Addressed them in 008b01c @markusicu Oops, sorry. I'll keep separate fixup commits for further changes. |
* icu4c/source/i18n/decimfmt.cpp: Explicitly set the pointers to nullptr after deleting them. * icu4c/source/i18n/number_mapper.h: Use `= {}` to initialize pointers to nullptr instead of `= nullptr` to keep project-wide consistency.
051260b
to
008b01c
Compare
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
Are you aware of ICU4X, which we've built from the ground up to work well with WebAssembly? It is no_std so doesn't even need WASI in order to build. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of U_HAVE_ATOMICS poking so many holes in the ICU4C code base. Synchronization code is among the hardest and most-error-prone code, and this just makes it more so.
It looks like wasi-threads is still stage 1; is that why you can't use it?
A better solution would be to compile only a subset of ICU4C that doesn't use atomics (I know of clients already doing this), and anything that needs atomics just doesn't get compiled.
Alternatively, see if ICU4X serves your needs.
#ifdef U_HAVE_ATOMICS | ||
/* Use the predefined value. */ | ||
#elif defined(__wasi__) && !defined(_REENTRANT) | ||
/* WASI does not support atomics when wasi-threads feature is not enabled */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: why not use wasi-threads, then? It seems error-prone to add a flag that skips mutexes.
delete fields->atomicParser.exchange(nullptr); | ||
delete fields->atomicCurrencyParser.exchange(nullptr); | ||
#else | ||
delete fields->atomicParser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just implement the exchange
function in your shim.
auto* ptr = fields->atomicParser.load(); | ||
#else | ||
auto* ptr = fields->atomicParser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please just implement the load function in your shim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean we should add something like icu::atomic<T>
to shim std::atomic<T>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style-wise, it seems cleaner if your new cfg would just polyfill std::atomic
with something that works in WebAssembly. I don't think it's useful for code now and into the future to have to figure out how and where to put your atomic ifdefs.
sffc wrote:
Me neither. I was assuming getting ICU running in this environment was considered worth the added complexity. |
Thanks @sffc for your feedback
Yes, I'm aware of the ground-up work. For our needs, because the Foundation project heavily depends on ICU4C, the migration might not be trivial and take a long time to get all things stable (e.g. API migrations especially for APIs without no drop-in replacement like IDNA, build system integration with Rust toolchain, and so on). At least, it's hard to reason the migration decision just for WASI platform support.
First, wasi-threads is a proposal to provide thread creation APIs and not to provide synchronization primitives. And also the proposal is considered legacy and to be replaced with another proposal (but no implementation for the future proposal yet) The synchronization primitives are provided by a core spec proposal WebAssembly/threads. Under the circumstances, WASI-SDK currently supports two platform variations:
Given that wasi-threads is today consiered legacy, both of them don't satisfy the needs here. So what we want here is synchrnoization primitives but not thread creation (WebAssembly/threads enabled but wasi-threads disabled). However, both WASI-SDK and even libcxx do not support such platform, "synchronization primitives available but no thread creation" for now. I understand your concern about poking holes, and it's a typical pain point when porting existing software to WebAssembly. To fill the gap, there is an ongoing effort to provide stubs for synchronization primitive APIs without actual synchronization operations (WebAssembly/wasi-libc#518) in wasi-libc and WASI-SDK side. However, it requires some more effort to bring libcxx's
Yes, we considered that direction, however the Foundation projects depends on many ICU features including formatters, locales, and timezones and they all use |
@@ -103,6 +103,8 @@ typedef size_t uintptr_t; | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: please build this in GitHub Actions, because otherwise it could break again at any time.
* \def U_HAVE_ATOMICS | ||
* Defines whether the platform supports atomic operations. | ||
* @internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: maybe name it U_ATOMICS_UNSAFE_POLYFILL
or something like that to indicate that this is not disabling atomic code but is instead replacing it with a non-atomic alternative.
From browsing the project, it looks like most calls of ICU go through here: The bulk of the functions in that list have ICU4X equivalents, including calendrical calculations, datetime formatting, custom datetime patterns and symbols, number formatting, relative datetime formatting (experimental in ICU4X), and list formatting. The only function in that list that sticks out where there's not yet an equivalent is number parsing. You mentioned IDNA; I don't see IDNA in that list, but ICU4X aims to export at least the primitives for IDNA (unicode-org/icu4x#2614). I can't comment on your concern about Rust toolchains except to observe that other projects with large amounts of legacy C/C++ code and complex build systems have adopted it in the last two years (e.g. Chromium). Having worked on ICU4C for 5 years before working on ICU4X for 3 years, I feel much more confident in Rust's ability to deliver fast, reliable code in the i18n space.
I don't think a migration would be too difficult, though it would be most successful with a Foundation contributor driving the process in collaboration with the ICU4X team who could fill in any potential gaps through the process. My most recent talk on ICU4X, at UTW 2023, goes over some of the metrics, showing how the team has delivered better code size, memory usage, and performance in most components, especially formatters, which seems to be one of the main components you use in Foundation. I hope to be giving off a healthy sense of enthusiasm in being a point of contact if you decide to go this route.
I see; so it sounds like there is likely room to continue making progress in this space in WASI itself. Do you think once WASI lands support for Also, it seems like there should already exist polyfills for |
fwiw, if WebAssembly/wasi-libc#518 manages to achieve consensus and get merged, the only thing that has to be done to enable |
With this patch, we can cross-compile ICU4C with wasi-sdk as follows:
Sanity check
Patch overview
Add config/mh-wasi and update configure script.
WebAssembly/WASI does not support threads and dynamic linking, so
the new mh-wasi file disables those features in ICU.
The new config file
mh-wasi
is used when the target matches*-wasi*
,which includes
wasm32-unknown-wasi
,wasm32-unknown-wasip1
, and etc.Teach platform.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI
WASI does not define timezone-related functionalities, and wasi-libc
does not provide tzset, timezone, and tzname. This change teaches
platform.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI.
Add
U_HAVE_ATOMICS
and support platforms without atomics like WASIThis change adds a new macro
U_HAVE_ATOMICS
to ICU. This macro isalways defined to 1 except for platforms that do not support atomic
operations like WASI. Due to the lack of threading support in WASI, the
wasi-sdk does not provide
std::atomic
,std::mutex
, and relatedtypes, so the new macro is used to disable the use of these types.
Checklist