Realm barrier assert in generation on PowerPC #1820

Open
elliottslaughter opened this issue Jan 11, 2025 · 5 comments

@elliottslaughter
Contributor

This is a fairly old runtime, so this may not be actionable. But I wanted to document this, and we can close the issue if we conclude that it's been resolved in a newer version.

This is S3D on Lassen at 32 nodes. It's non-deterministic, with a failure probability below 20% per run.

s3d.x: /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/event_impl.cc:2759: static void Realm::BarrierSubscribeMessage::handle_message(Realm::NodeID, const Realm::BarrierSubscribeMessage&, const void*, size_t): Assertion `it->second > impl->generation.load()' failed.

Backtrace:

[22] #10 0x000020000083fcb0 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
[22] #11 0x000020000084200c in __GI_abort () at abort.c:90
[22] #12 0x00002000008357d4 in __assert_fail_base (fmt=0x20000099b7d0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x200006815c70 "it->second > impl->generation.load()", file=0x200006812eb8 "/g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/event_impl.cc", line=<optimized out>, function=<optimized out>) at assert.c:92
[22] #13 0x00002000008358c4 in __GI___assert_fail (assertion=0x200006815c70 "it->second > impl->generation.load()", file=0x200006812eb8 "/g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/event_impl.cc", line=<optimized out>, function=0x200006815bb8 "static void Realm::BarrierSubscribeMessage::handle_message(Realm::NodeID, const Realm::BarrierSubscribeMessage&, const void*, size_t)") at assert.c:101
[22] #14 0x000020000539627c in Realm::BarrierSubscribeMessage::handle_message (sender=<optimized out>, args=..., data=<optimized out>, datalen=<optimized out>) at /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/event_impl.cc:2104
[22] #15 0x0000200005599a78 in Realm::IncomingMessageManager::do_work (this=0x164ce3f0, work_until=<error reading variable: DWARF-2 expression error: DW_OP_GNU_uninit must always be the very last op.>) at /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/activemsg.cc:770
[22] #16 0x0000200005377024 in Realm::BackgroundWorkManager::Worker::do_work (this=this@entry=0x200066f3eb30, max_time_in_ns=max_time_in_ns@entry=-1, interrupt_flag=interrupt_flag@entry=0x0) at /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/timers.inl:288
[22] #17 0x0000200005377b40 in Realm::BackgroundWorkThread::main_loop (this=0x16728ed0) at /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/bgwork.cc:103
[22] #18 0x00002000054e1740 in Realm::KernelThread::pthread_entry (data=0x1ccdb6e0) at /g/g19/eslaught/s3d-subranks-2025-01-08-sep2024-legion/legion/runtime/realm/threads.cc:854
[22] #19 0x00002000007c8cd4 in start_thread (arg=0x200066f3f8b0) at pthread_create.c:309
[22] #20 0x0000200000927f14 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104

Legion version is a branch based on a September 2024 commit, plus one fix: https://gitlab.com/StanfordLegion/legion/-/commit/d4392d6f01762cdde198a8f44da8c05643c9ea34

@elliottslaughter changed the title from "Realm barrier assert in generation" to "Realm barrier assert in generation on PowerPC" on Jan 12, 2025

@lightsighter
Contributor

@elliottslaughter Which commit from September 2024 is this based on?

@elliottslaughter
Contributor Author

It's based on b8316fb.

@apryakhin self-assigned this on Jan 13, 2025

@apryakhin
Contributor

If there is a bug, that will be on me.

@lightsighter
Contributor

@apryakhin I think this particular assertion predates the changes you made to the barriers, which is why I wanted to see the version of the code that matters.

@elliottslaughter That commit hash does not align with the backtrace that you gave above. Here's the assertion from the backtrace, but you can see that the line numbers don't align (and I don't think the difference can be explained by the one other commit that you merged).
https://gitlab.com/StanfordLegion/legion/-/blob/b8316fb258ee06ec40b80848d4af550455d73703/runtime/realm/event_impl.cc#L2759

@elliottslaughter
Contributor Author

Hi all, I have not been able to reproduce this one on master yet. Until then, please put this on the back burner.
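For a sense of what "not reproduced yet" means given the under-20% failure rate reported above, here is a rough sketch of how many consecutive clean runs are needed before concluding, at a given confidence, that the bug no longer fires at that rate. It assumes runs are independent and fail with a fixed per-run probability `p_fail`; both are simplifying assumptions, and `runs_needed` is a hypothetical helper, not part of Legion or Realm.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Smallest N such that N consecutive clean runs would happen with
    probability at most (1 - confidence) if the bug still fired with
    per-run probability p_fail. Assumes independent, identical runs."""
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # P(all N runs clean) = (1 - p_fail) ** N <= 1 - confidence
    # => N >= log(1 - confidence) / log(1 - p_fail)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.20, 0.10, 0.05):
        print(f"p={p:.2f}: {runs_needed(p)} clean runs")
    # p=0.20: 14 clean runs
    # p=0.10: 29 clean runs
    # p=0.05: 59 clean runs
```

Because the reported rate is only an upper bound, the 14 runs at p=0.20 is a minimum. A rarer failure needs more runs, roughly 3/p for small p at 95% confidence.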
