Tracking issues for CI spurious network failures with example.com #10369

Open
alexcrichton opened this issue Mar 11, 2025 · 5 comments
@alexcrichton
Member

CI just failed spuriously in a test that hits example.com as a live testing service. We knew this was probably going to be flaky, so I wanted an issue here to track it so we know what's going on. I've also opened WebAssembly/wasi-tls#9 to ask for more error information.

cc @jsturtevant and @badeend

Currently the failure looks like this:

running 1 test

thread 'main' panicked at crates/test-programs/src/bin/tls_sample_application.rs:33:14:
test tls_sample_application ... FAILED
called `Result::unwrap()` on an `Err` value: ()

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
error: test failed, to rerun pass `-p wasmtime-wasi-tls --test main`

---- tls_sample_application stdout ----
Error: error while executing at wasm backtrace:
    0: 0x1cd96 - tls_sample_application-3d73890cb2b55afd.wasm!__rust_start_panic
    1: 0x1cce8 - tls_sample_application-3d73890cb2b55afd.wasm!rust_panic
    2: 0x1ccbb - tls_sample_application-3d73890cb2b55afd.wasm!std::panicking::rust_panic_with_hook::he1beb51ce54153b5
    3: 0x1bf2e - tls_sample_application-3d73890cb2b55afd.wasm!std::panicking::begin_panic_handler::{{closure}}::hdbff82bd9d56f6f9
    4: 0x1be9a - tls_sample_application-3d73890cb2b55afd.wasm!std::sys::backtrace::__rust_end_short_backtrace::h5bb26fcae04a79f5
    5: 0x1c64f - tls_sample_application-3d73890cb2b55afd.wasm!rust_begin_unwind
    6: 0x21b2b - tls_sample_application-3d73890cb2b55afd.wasm!core::panicking::panic_fmt::h2781e55e5f70e742
    7: 0x2339b - tls_sample_application-3d73890cb2b55afd.wasm!core::result::unwrap_failed::he956e17bd675892a
    8: 0x5fea - tls_sample_application-3d73890cb2b55afd.wasm!tls_sample_application::test_tls_sample_application::h85c84c9b597e7424
    9: 0x6903 - tls_sample_application-3d73890cb2b55afd.wasm!tls_sample_application::main::hd80f60ac75378593
   10: 0x21c8 - tls_sample_application-3d73890cb2b55afd.wasm!core::ops::function::FnOnce::call_once::hadd62004787a8069
   11: 0x2167 - tls_sample_application-3d73890cb2b55afd.wasm!std::sys::backtrace::__rust_begin_short_backtrace::h7e8098332588ec78
   12: 0x2107 - tls_sample_application-3d73890cb2b55afd.wasm!std::rt::lang_start::{{closure}}::h9add34816d3fff1f
   13: 0x1a60a - tls_sample_application-3d73890cb2b55afd.wasm!std::rt::lang_start_internal::hb9a72f8093679cde
   14: 0x20a4 - tls_sample_application-3d73890cb2b55afd.wasm!std::rt::lang_start::hf7b99a0ab1f3109d
   15: 0x6927 - tls_sample_application-3d73890cb2b55afd.wasm!__main_void
   16: 0x1fdd - tls_sample_application-3d73890cb2b55afd.wasm!_start
   17: 0x50bb69 - wit-component:adapter:wasi_snapshot_preview1!wasi:cli/[email protected]#run
note: using the `WASMTIME_BACKTRACE_DETAILS=1` environment variable may show more debugging information

Caused by:
    wasm trap: wasm `unreachable` instruction executed


failures:
    tls_sample_application

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.34s
@cfallin
Member

cfallin commented Mar 11, 2025

Idea: if we need a live testing endpoint, would it be possible to use github.com or api.github.com as the target instead? That way, as long as CI is running, we know the endpoint is up (put another way: the one service we know is working if we're running on GitHub CI is GitHub).

@jsturtevant
Contributor

> Idea: if we need a live testing endpoint, would it be possible to use github.com or api.github.com as the target instead? That way, as long as CI is running, we know the endpoint is up (put another way: the one service we know is working if we're running on GitHub CI is GitHub).

We could try this. The intent behind a live endpoint was to test the system cert store. If it becomes a larger issue, we could add retries. I would like to add a local-only test as well.

I'll look into returning an error as well.
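
The retry idea could look roughly like the sketch below. It is a minimal illustration, not the actual wasmtime test code: `retry`, its parameters, and the simulated flaky operation in `main` are hypothetical names, and the closure stands in for whatever connect-and-handshake step the real test performs.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `attempts` times, sleeping `delay` between failed tries.
/// Returns the first success, or the error from the final attempt.
/// The closure receives the 1-based attempt number.
fn retry<T, E, F>(attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error instead of discarding it.
            Err(err) if attempt >= attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated flaky operation (an assumption for illustration):
    // fails on the first two attempts, then succeeds.
    let result: Result<u32, String> = retry(3, Duration::from_millis(10), |n| {
        if n < 3 {
            Err(format!("attempt {n}: connection failed"))
        } else {
            Ok(n)
        }
    });
    println!("{result:?}");
}
```

Returning the last error rather than `()` also keeps the failure reason visible in the CI log once the underlying API can report one.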

@alexcrichton
Member Author

One idea might be to test N different domains and return success if any succeed. That way we're resilient to some domains going down as well. I suspect github.com isn't 100% tied to GHA, but it's probably more closely tied to GHA than example.com is. But why not both?
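
A rough sketch of the "succeed if any domain succeeds" approach, under stated assumptions: `any_succeeds` and the stub probe are hypothetical, and the probe closure stands in for the real TLS connection the test would make against each host. The example deliberately avoids real network calls so it runs anywhere.

```rust
/// Try each host in order and return the first one whose probe succeeds.
/// If every host fails, return all (host, error) pairs so the log shows
/// why each one failed.
fn any_succeeds<F>(hosts: &[&str], mut probe: F) -> Result<String, Vec<(String, String)>>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut failures = Vec::new();
    for &host in hosts {
        match probe(host) {
            Ok(()) => return Ok(host.to_string()),
            Err(err) => failures.push((host.to_string(), err)),
        }
    }
    Err(failures)
}

fn main() {
    let hosts = ["example.com", "github.com", "api.github.com"];
    // Stub probe for illustration only: pretend example.com is unreachable.
    let result = any_succeeds(&hosts, |host| {
        if host == "example.com" {
            Err("connection failed".to_string())
        } else {
            Ok(())
        }
    });
    match result {
        Ok(host) => println!("succeeded against {host}"),
        Err(failures) => panic!("all hosts failed: {failures:?}"),
    }
}
```

With this shape the test only fails when every listed domain is unreachable, which is much more likely to indicate a real problem than a single flaky endpoint.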

@alexcrichton
Member Author

Another failure

@jsturtevant
Contributor

For a few minutes I was able to reproduce this locally by running the test in a tight loop, but on recent tries it's not failing.

It's failing on connection; we will need to surface those errors with changes to the spec, as suggested in WebAssembly/wasi-tls#9.

In the meantime, I've started working on the "test N different domains and return success if any succeed" idea to remove the test flakes while we work on the TLS changes.
