Describe the bug
The Thallium hello-world RPC example does not work in a heterogeneous environment (macOS + Linux). See hello-world. I modified the source to use the 'sockets' provider instead of TCP. I am posting this here because the error messages come from Mercury, and possibly from libfabric.
Run the server on macOS:
~/hello-thallium $ ./server
Server running at address ofi+sockets://10.50.58.248:39517
# [80739.928023] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:2431
 # na_ofi_addr_map_insert(): fi_av_insert() failed, inserted: 0
# [80739.928109] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:2320
 # na_ofi_addr_key_lookup(): Could not insert new address
# [80739.928120] mercury->addr: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:4756
 # na_ofi_cq_process_recv_unexpected_event(): Could not lookup address
# [80739.928128] mercury->msg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/na/na_ofi.c:4680
 # na_ofi_cq_process_event(): Could not process unexpected recv event
# [80739.928156] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3917
 # hg_core_progress_na(): Could not make progress on NA (NA_PROTOCOL_ERROR)
# [80739.928167] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3809
 # hg_core_poll_wait(): hg_core_progress_na() failed
# [80739.928173] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:3708
 # hg_core_progress(): Could not make blocking progress on context
# [80739.928180] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury_core.c:5077
 # HG_Core_progress(): Could not make progress
# [80739.928208] mercury->hg: [error] /var/folders/9p/hrppv1m97xs53_jcddyysq4x6t32tv/T/jaswant.panchumarti/spack-stage/spack-stage-mercury-master-nocsov6z3xrlrmlvlisfupujclikc2hu/spack-src/src/mercury.c:2074
 # HG_Progress(): Could not make progress on context (HG_PROTOCOL_ERROR)
[critical] unexpected return code (12: HG_PROTOCOL_ERROR) from HG_Progress()
Assertion failed: (0), function __margo_hg_progress_fn, file margo-core.c, line 1659.
zsh: abort ./server
Then run the client on Linux:
$ ./client ofi+sockets://10.50.58.248:39517
I get the same output with the client on macOS and the server on Linux.
To Reproduce
Steps to reproduce the behavior:
On macOS, the argobots release that spack installs crashes the server with a segmentation fault, so use argobots@main on both Linux and macOS with this command.
Compile
Platforms:
macOS: Monterey 12.5.1 on Apple M1 with clang-13.1.6
Linux: Ubuntu 22.04 with GCC 11.2.0
Here is the output of spack spec mochi-thallium on each platform.