RMA WG 06 27 2019
- Review put-with-signal proposal updates (Naveen)
- Performance variables (Wasi)
- Memory model (Anshuman)
- Manju, Wasi, Naveen, Dave, Jim, Pasha, Nick, (Anshuman?), (Others?)
- Put-with-signal proposal updates (Naveen):
- Added the `shmem_signal_wait_until` API, which returns the signal value.
- Then added `shmem_signal_fetch`.
- Wants to ensure the `sig_addr` update text captures the atomicity guarantees correctly:
- Overall, it looks good.
- "`shmem_signal_fetch` fetches on the remote PE": the phrase "on the remote PE" is a little confusing, since `shmem_signal_fetch` is a local operation; try rewording for clarity.
- "As if performed atomically" could be removed or reworded.
- "TYPE" should be `uint64_t`.
- The second sentence may be unnecessary; could instead leverage the existing atomicity guarantees text (Section 3.1).
- "single/multi-threaded" may not be needed (they're not distinguished in any way here).
- The reference to other sections (9.8.1, 9.8.2) is helpful.
- Is a `uint64_t` really a "signal data object"? Is that term defined? It might be useful terminology for the `shmem_signal_*` routines.
- Define under Section 9.8, "Signaling Operations".
- Put the atomicity section in 9.8 as well.
- API summary sentence could say "fetches the value"
- Should put_with_signal be added to both quiet and fence descriptions?
- Does it work to treat put_with_signal as a single operation?
- Quiet is easy (both pieces are completed by shmem_quiet).
- "Fence" has 2 types of ordering here (internal fence w.r.t. put/signal, or a fence that is external to other operations).
- `shmem_signal_wait_until` and `shmem_signal_fetch` are local (load-like) operations; should they be affected by fence?
- Conclusion: won't add put_with_signal to the quiet/fence descriptions, but we'll review the text later.
- Performance variables (Wasi): overview slides
- A workshop paper describes an API for gathering lightweight performance counters.
- These counters are associated with SHMEM contexts (or with a PE, in one case).
- The counters are monotonically increasing counts of pending/completed put, get, and AMO operations.
- The query APIs are implemented in SOS (Sandia OpenSHMEM).
- Q: Are these APIs relevant to non-blocking operations only?
A: Not necessarily. Issued and completed values may differ for blocking operations with small messages (which are buffered for puts and non-fetching AMOs), and likewise when bounce buffering is enabled. If the issued counter value always equals the completed counter value, then no overlap is happening.
- Q: What are the motivating examples?
A:
- Understanding performance issues
- Detecting performance bugs
- Associating profiling data with runtime status for better insight
- Q: Is it really a "performance counter", or is it more like the MPI message queue dumping interface?
A: Unlike the MPI message queue interface, these counters give current (real-time) information about communication status.
- Q: Does supporting performance counters degrade application performance?
A:
- This proposal places no requirement to support specific counters.
- The paper's results suggest that the overhead of the pending put/get counters is negligible.
- Summarized ISx findings from these counters:
- Detected unexpected non-random communication behavior.
- The counter values show that the non-blocking API achieves overlap.
- Proposed performance variable classes
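The issued/completed reasoning in the Q&A above can be illustrated with a small C sketch. The counter struct, the event letters, and the `model_*` functions are hypothetical and are not the SOS query API; the sketch only shows how a profiler sampling monotonically increasing issued and completed counts could tell whether communication overlapped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-context counters (illustrative only; not the SOS query
 * API). Both counts only ever increase. */
typedef struct {
    uint64_t put_issued;     /* puts handed to the library */
    uint64_t put_completed;  /* puts known to be complete */
} model_pcntr;

/* Puts that have been issued but have not yet completed. */
static uint64_t model_pending_puts(const model_pcntr *c) {
    assert(c->put_completed <= c->put_issued);
    return c->put_issued - c->put_completed;
}

/* Replay a hypothetical call trace and sample the counters after each call,
 * as a profiler might. Events:
 *   'B'  a put that is complete by the time the call returns
 *   'N'  a put that returns before it completes (non-blocking, or a
 *        buffered small blocking put)
 *   'Q'  a quiet: every issued put is now complete
 * Returns the largest pending count seen at any sample point; 0 means the
 * issued and completed counts always matched, i.e. no overlap was observed. */
static uint64_t model_peak_pending(const char *trace) {
    model_pcntr c = {0, 0};
    uint64_t peak = 0;
    for (const char *p = trace; *p != '\0'; p++) {
        switch (*p) {
        case 'B': c.put_issued++; c.put_completed++; break;
        case 'N': c.put_issued++; break;
        case 'Q': c.put_completed = c.put_issued; break;
        default: assert(!"unknown trace event"); break;
        }
        uint64_t pending = model_pending_puts(&c);
        if (pending > peak) peak = pending;
    }
    return peak;
}
```

For example, `model_peak_pending("BBB")` returns 0 (no overlap observed), while `model_peak_pending("NNNQ")` returns 3 because three puts were outstanding before the quiet.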