Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICU-22838 Add WebAssembly/WASI cross-compilation support #3067

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kateinoigakukun
Copy link
Contributor

@kateinoigakukun kateinoigakukun commented Jul 25, 2024

With this patch, we can cross-compile ICU4C with wasi-sdk as follows:

$ mkdir tools
$ mkdir cross
$ cd tools
$ ../../icu/icu4c/source/configure && make
$ cd ../cross
$ ../../icu/icu4c/source/configure --host=wasm32-unknown-wasi \
            --with-cross-build="$PWD/../tools" \
            --enable-static --disable-shared \
            --disable-tools --disable-tests \
            --disable-samples --disable-extras \
            --with-data-packaging=static \
            --prefix / \
            CC="$WASI_SDK_PATH/bin/clang" CXX="$WASI_SDK_PATH/bin/clang++" \
            AR="$WASI_SDK_PATH/bin/llvm-ar" RANLIB="$WASI_SDK_PATH/bin/llvm-ranlib" \
            CFLAGS="--sysroot $WASI_SDK_PATH/share/wasi-sysroot -D_WASI_EMULATED_SIGNAL" \
            CXXFLAGS="-fno-exceptions --sysroot $WASI_SDK_PATH/share/wasi-sysroot -D_WASI_EMULATED_SIGNAL"
$ make PKGDATA_OPTS="--without-assembly -O $PWD/data/icupkg.inc" -j16
Sanity check
$ cat sample.c
#include <stdio.h>
#include <unicode/ucnv.h>
#include <unicode/ustring.h>

int main() {
    UErrorCode status = U_ZERO_ERROR;
    UConverter *cnv = ucnv_open("UTF-8", &status);
    if (U_FAILURE(status)) {
        fprintf(stderr, "ucnv_open failed: %s\n", u_errorName(status));
        return 1;
    }

    char buffer[1024];
    int length = fread(buffer, 1, sizeof(buffer), stdin);
    if (length < 0) {
        fprintf(stderr, "fread failed\n");
        return 1;
    }

    UChar uchars[1024];
    int32_t uchars_length = ucnv_toUChars(cnv, uchars, sizeof(uchars) / sizeof(UChar), buffer, length, &status);
    if (U_FAILURE(status)) {
        fprintf(stderr, "ucnv_toUChars failed: %s\n", u_errorName(status));
        return 1;
    }

    int32_t char_count = u_countChar32(uchars, uchars_length);
    printf("%d\n", char_count);

    return 0;
}

$ $WASI_SDK_PATH/bin/clang sample.c -I ../install/include/ -L ../install/lib/ -licuuc -licudata
$ echo "Pokémon" | wasmtime ./a.out
8

Patch overview

  • Add config/mh-wasi and update configure script.

    WebAssembly/WASI does not support threads and dynamic linking, so
    the new mh-wasi file disables those features in ICU.
    The new config file mh-wasi is used when the target matches *-wasi*,
    which includes wasm32-unknown-wasi, wasm32-unknown-wasip1, and etc.

  • Teach platform.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI

    WASI does not define timezone-related functionalities, and wasi-libc
    does not provide tzset, timezone, and tzname. This change teaches
    platform.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI.

  • Add U_HAVE_ATOMICS and support platforms without atomics like WASI

    This change adds a new macro U_HAVE_ATOMICS to ICU. This macro is
    always defined to 1 except for platforms that do not support atomic
    operations like WASI. Due to the lack of threading support in WASI, the
    wasi-sdk does not provide std::atomic, std::mutex, and related
    types, so the new macro is used to disable the use of these types.

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22838
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/putilimp.h is different
  • icu4c/source/common/unicode/platform.h is now changed in the branch
  • icu4c/source/i18n/unicode/numberrangeformatter.h is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/number_mapper.h is different
  • icu4c/source/i18n/numrange_fluent.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

kateinoigakukun added a commit to kateinoigakukun/swift-foundation-icu that referenced this pull request Jul 26, 2024
The original patch is still under review in the upstream ICU project, but
it is needed to unblock the swift-foundation build on WebAssembly.

See unicode-org/icu#3067
@markusicu
Copy link
Member

@srl295 @sffc could you help review these changes?

kateinoigakukun added a commit to kateinoigakukun/swift-foundation-icu that referenced this pull request Aug 2, 2024
The original patch is still under review in the upstream ICU project, but
it is needed to unblock the swift-foundation build on WebAssembly.

See unicode-org/icu#3067
iCharlesHu pushed a commit to swiftlang/swift-foundation-icu that referenced this pull request Aug 6, 2024
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support"

The original patch is still under review in the upstream ICU project, but
it is needed to unblock the swift-foundation build on WebAssembly.

See unicode-org/icu#3067

* [Build] Update compile definitions for WASI target

`U_TIMEZONE` must not be defined and dynamic loading features must be
disabled for WASI target.
iCharlesHu pushed a commit to iCharlesHu/swift-foundation-icu that referenced this pull request Aug 6, 2024
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support"

The original patch is still under review in the upstream ICU project, but
it is needed to unblock the swift-foundation build on WebAssembly.

See unicode-org/icu#3067

* [Build] Update compile definitions for WASI target

`U_TIMEZONE` must not be defined and dynamic loading features must be
disabled for WASI target.
iCharlesHu added a commit to swiftlang/swift-foundation-icu that referenced this pull request Aug 7, 2024
* Cherry-pick "ICU-22838 Add WebAssembly/WASI cross-compilation support"

The original patch is still under review in the upstream ICU project, but
it is needed to unblock the swift-foundation build on WebAssembly.

See unicode-org/icu#3067

* [Build] Update compile definitions for WASI target

`U_TIMEZONE` must not be defined and dynamic loading features must be
disabled for WASI target.

Co-authored-by: Yuta Saito <[email protected]>
@markusicu
Copy link
Member

Please rebase on current main and resolve conflicts. Note that the autoconf scripts have been updated in another PR.
I will try to be better about nudging pull requests along...

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/unicode/platform.h is different
  • icu4c/source/common/unifiedcache.cpp is different
  • icu4c/source/configure is different
  • icu4c/source/i18n/numrange_fluent.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@kateinoigakukun
Copy link
Contributor Author

@markusicu Thanks for updating autoconf stuff. I rebased the branch on the latest main.

icu4c/source/common/umutex.cpp Outdated Show resolved Hide resolved
icu4c/source/common/umutex.h Outdated Show resolved Hide resolved
icu4c/source/common/umutex.h Outdated Show resolved Hide resolved
icu4c/source/common/unifiedcache.cpp Outdated Show resolved Hide resolved
* Add config/mh-wasi and update configure script.

  WebAssembly/WASI does not support threads and dynamic linking, so
  the new mh-wasi file disables those features in ICU.

* Teach putilimp.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI

  WASI does not define timezone-related functionalities, and wasi-libc
  does not provide tzset, timezone, and tzname. This change teaches
  putilimp.h not to define U_TZSET, U_TIMEZONE, and U_TZNAME for WASI.

* Add `U_HAVE_ATOMICS` and support platforms without atomics like WASI

  This change adds a new macro `U_HAVE_ATOMICS` to ICU. This macro is
  always defined to 1 except for platforms that do not support atomic
  operations like WASI. Due to the lack of threading support in WASI, the
  wasi-sdk does not provide `std::atomic`, `std::mutex`, and related
  types, so the new macro is used to disable the use of these types.
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/common/umutex.cpp is different
  • icu4c/source/common/umutex.h is different
  • icu4c/source/common/unifiedcache.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

delete fields->atomicParser.exchange(nullptr);
delete fields->atomicCurrencyParser.exchange(nullptr);
#else
delete fields->atomicParser;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to assign these two parser fields to be nullptr. The owning object is not being deleted, leaving live invalid pointers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just implement the exchange function in your shim.

icu4c/source/i18n/number_mapper.h Outdated Show resolved Hide resolved
@markusicu
Copy link
Member

@kateinoigakukun please don't squash any more until this PR is approved, to help GitHub track comments.

@kateinoigakukun
Copy link
Contributor Author

kateinoigakukun commented Sep 23, 2024

@aheninger Thank you for your feedback. They totally make sense to me. Addressed them in 008b01c

@markusicu Oops, sorry. I'll keep separate fixup commits for further changes.

* icu4c/source/i18n/decimfmt.cpp: Explicitly set the pointers to nullptr
  after deleting them.
* icu4c/source/i18n/number_mapper.h: Use `= {}` to initialize pointers
  to nullptr instead of `= nullptr` to keep project-wide consistency.
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@sffc
Copy link
Member

sffc commented Sep 24, 2024

Are you aware of ICU4X, which we've built from the ground up to work well with WebAssembly? It is no_std so doesn't even need WASI in order to build.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of U_HAVE_ATOMICS poking so many holes in the ICU4C code base. Synchronization code is among the hardest and most-error-prone code, and this just makes it more so.

It looks like wasi-threads is still stage 1; is that why you can't use it?

A better solution would be to compile only a subset of ICU4C that doesn't use atomics (I know of clients already doing this), and anything that needs atomics just doesn't get compiled.

Alternatively, see if ICU4X serves your needs.

#ifdef U_HAVE_ATOMICS
/* Use the predefined value. */
#elif defined(__wasi__) && !defined(_REENTRANT)
/* WASI does not support atomics when wasi-threads feature is not enabled */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: why not use wasi-threads, then? It seems error-prone to add a flag that skips mutexes.

delete fields->atomicParser.exchange(nullptr);
delete fields->atomicCurrencyParser.exchange(nullptr);
#else
delete fields->atomicParser;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just implement the exchange function in your shim.

auto* ptr = fields->atomicParser.load();
#else
auto* ptr = fields->atomicParser;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please just implement the load function in your shim.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should add something like icu::atomic<T> to shim std::atomic<T>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style-wise, it seems cleaner if your new cfg would just polyfill std::atomic with something that works in WebAssembly. I don't think it's useful for code now and into the future to have to figure out how and where to put your atomic ifdefs.

@aheninger
Copy link
Contributor

sffc wrote:

I'm not a fan of U_HAVE_ATOMICS poking so many holes in the ICU4C code base.

Me neither. I was assuming getting ICU running in this environment was considered worth the added complexity.

@kateinoigakukun
Copy link
Contributor Author

kateinoigakukun commented Sep 25, 2024

Thanks @sffc for your feedback

Are you aware of ICU4X, which we've built from the ground up to work well with WebAssembly? It is no_std so doesn't even need WASI in order to build.

Yes, I'm aware of the ground-up work. For our needs, because the Foundation project heavily depends on ICU4C, the migration might not be trivial and take a long time to get all things stable (e.g. API migrations especially for APIs without no drop-in replacement like IDNA, build system integration with Rust toolchain, and so on). At least, it's hard to reason the migration decision just for WASI platform support.
But I personally agree it might be a direction we should take in the long term for various other advantages of ICU4X.

It looks like wasi-threads is still stage 1; is that why you can't use it?

First, wasi-threads is a proposal to provide thread creation APIs and not to provide synchronization primitives. And also the proposal is considered legacy and to be replaced with another proposal (but no implementation for the future proposal yet)

The synchronization primitives are provided by a core spec proposal WebAssembly/threads.

Under the circumstances, WASI-SDK currently supports two platform variations:

  1. Only WASI Preview1
    • Pre-compiled libc and libcxx are compiled without WebAssembly/threads feature, no synchroniztion primitive APIs
  2. WASI Preview1 + wasi-threads extension.
    • Pre-compiled libc and libcxx provide synchronization primive APIs

Given that wasi-threads is today consiered legacy, both of them don't satisfy the needs here.

So what we want here is synchrnoization primitives but not thread creation (WebAssembly/threads enabled but wasi-threads disabled). However, both WASI-SDK and even libcxx do not support such platform, "synchronization primitives available but no thread creation" for now.

I understand your concern about poking holes, and it's a typical pain point when porting existing software to WebAssembly. To fill the gap, there is an ongoing effort to provide stubs for synchronization primitive APIs without actual synchronization operations (WebAssembly/wasi-libc#518) in wasi-libc and WASI-SDK side. However, it requires some more effort to bring libcxx's std::mutex and std::atomic, so we don't have a fixed timeline in this direction.

A better solution would be to compile only a subset of ICU4C that doesn't use atomics (I know of clients already doing this), and anything that needs atomics just doesn't get compiled.

Yes, we considered that direction, however the Foundation projects depends on many ICU features including formatters, locales, and timezones and they all use umutex.h, so it's not a realistic approach for us unfortunately.

@@ -103,6 +103,8 @@ typedef size_t uintptr_t;
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: please build this in GitHub Actions, because otherwise it could break again at any time.

Comment on lines +317 to +319
* \def U_HAVE_ATOMICS
* Defines whether the platform supports atomic operations.
* @internal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: maybe name it U_ATOMICS_UNSAFE_POLYFILL or something like that to indicate that this is not disabling atomic code but is instead replacing it with a non-atomic alternative.

@sffc
Copy link
Member

sffc commented Sep 25, 2024

Thanks @sffc for your feedback

Are you aware of ICU4X, which we've built from the ground up to work well with WebAssembly? It is no_std so doesn't even need WASI in order to build.

Yes, I'm aware of the ground-up work. For our needs, because the Foundation project heavily depends on ICU4C, the migration might not be trivial and take a long time to get all things stable (e.g. API migrations especially for APIs without no drop-in replacement like IDNA, build system integration with Rust toolchain, and so on).

From browsing the project, it looks like most calls of ICU go through here:

https://github.com/swiftlang/swift-corelibs-foundation/blob/main/Sources/CoreFoundation/internalInclude/CFICULogging.h

The bulk of the functions in that list have ICU4X equivalents, including calendrical calculations, datetime formatting, custom datetime patterns and symbols, number formatting, relative datetime formatting (experimental in ICU4X), and list formatting. The only function in that list that sticks out where there's not yet an equivalent is number parsing. You mentioned IDNA; I don't see IDNA in that list, but ICU4X aims to export at least the primitives for IDNA (unicode-org/icu4x#2614).

I can't comment on your concern about Rust toolchains except to observe that other projects with large amounts of legacy C/C++ code and complex build systems have adopted it in the last two years (e.g. Chromium). Having worked on ICU4C for 5 years before working on ICU4X for 3 years, I feel much more confident in Rust's ability to deliver fast, reliable code in the i18n space.

At least, it's hard to reason the migration decision just for WASI platform support. But I personally agree it might be a direction we should take in the long term for various other advantages of ICU4X.

I don't think a migration would be too difficult, though it would be most successful with a Foundation contributor driving the process in collaboration with the ICU4X team who could fill in any potential gaps through the process.

My most recent talk on ICU4X, at UTW 2023, goes over some of the metrics, showing how the team has delivered better code size, memory usage, and performance in most components, especially formatters, which seems to be one of the main components you use in Foundation.

I hope to be giving off a healthy sense of enthusiasm in being a point of contact if you decide to go this route.

I understand your concern about poking holes, and it's a typical pain point when porting existing software to WebAssembly. To fill the gap, there is an ongoing effort to provide stubs for synchronization primitive APIs without actual synchronization operations (WebAssembly/wasi-libc#518) in wasi-libc and WASI-SDK side. However, it requires some more effort to bring libcxx's std::mutex and std::atomic, so we don't have a fixed timeline in this direction.

I see; so it sounds like there is likely room to continue making progress in this space in WASI itself. Do you think once WASI lands support for std::mutex and std::atomic, you might be able to revert this workaround in ICU4C?

Also, it seems like there should already exist polyfills for std::mutex and std::atomic. Can we land a copy of those in ICU4C (assuming compatible licenses) to avoid reinventing the wheel?

@ArcaneNibble
Copy link

However, it requires some more effort to bring libcxx's std::mutex and std::atomic, so we don't have a fixed timeline in this direction.

fwiw, if WebAssembly/wasi-libc#518 manages to achieve consensus and get merged, the only thing that has to be done to enable std::mutex and std::atomic is changing a few lines of cmake to ON in WASI-SDK (achieving consensus on that may or may not require yet another discussion, but it's not a difficult technical problem)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants