
RMA WG 06 27 2019

David Ozog edited this page Jun 27, 2019 · 2 revisions

Agenda

  1. Review put-with-signal proposal updates (Naveen)
  2. Performance variables (Wasi)
  3. Memory model (Anshuman)

Attendees

  • Manju, Wasi, Naveen, Dave, Jim, Pasha, Nick, (Anshuman?), (Others?)

Notes

Naveen - shmem_signal_fetch overview

  • Added the shmem_signal_wait_until API, which returns the signal value.

  • Then added shmem_signal_fetch.

  • Wants to ensure that the text describing the sig_addr update captures the atomicity guarantees correctly.

Feedback on atomicity text:

  • Overall, it looks good.
  • "shmem_signal_fetch fetches on the remote PE"... Suggestion: "on the remote PE" is a little confusing, since shmem_signal_fetch is a local operation; try rewording for clarity.
  • "as if performed atomically" could be removed/reworded.

shmem_signal_fetch - informal reading

Feedback:

  • "TYPE" should be uint64_t.
  • The 2nd sentence may be unnecessary; could instead leverage the existing atomicity guarantees text (Section 3.1).
    • "single/multi-threaded" may not be needed (they're not distinguished in any way here).
    • The reference to other sections (9.8.1, 9.8.2) is helpful.
  • Is a uint64_t really a "signal data object"? Is that term defined?
    • might be useful terminology for the shmem_signal.
    • Define under Section 9.8, "Signaling Operations".
    • put atomicity section in 9.8 as well.
  • API summary sentence could say "fetches the value"
  • Should put_with_signal be added to both quiet and fence descriptions?
    • Does it work to treat put_with_signal as a single operation?
    • Quiet is easy (both pieces are completed by shmem_quiet).
    • "Fence" has 2 types of ordering here (internal fence w.r.t. put/signal, or a fence that is external to other operations).
  • shmem_signal_wait_until and shmem_signal_fetch are local (load-like) operations; should they be affected by fence?
    • Conclusion: won't add them to the quiet/fence descriptions, but we'll review the text later.
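
The signal semantics discussed above can be illustrated with a minimal single-process model in C11 atomics. This is a sketch under stated assumptions, not the proposed OpenSHMEM API: model_pe_t, model_put_with_signal, model_signal_fetch, model_signal_wait_until, and the MODEL_CMP_* constants are hypothetical names, a plain struct stands in for a PE's symmetric memory, and no remote communication takes place. It shows the signal as a uint64_t that is only ever accessed atomically, and signal_fetch/signal_wait_until as local, load-like reads of that word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical comparison operators for the wait; illustrative only. */
typedef enum { MODEL_CMP_EQ, MODEL_CMP_NE, MODEL_CMP_GE } model_cmp_t;

/* Hypothetical stand-in for one PE's symmetric memory: a payload buffer
 * plus a uint64_t signal word that is only accessed atomically. */
typedef struct {
    int data[8];
    _Atomic uint64_t sig;
} model_pe_t;

/* Model of put-with-signal: copy the payload, then publish the signal
 * with release ordering so that a reader whose acquire load observes
 * the new signal value also observes the payload. */
void model_put_with_signal(model_pe_t *target, const int *src,
                           size_t nelems, uint64_t signal) {
    assert(nelems <= sizeof target->data / sizeof target->data[0]);
    memcpy(target->data, src, nelems * sizeof *src);
    atomic_store_explicit(&target->sig, signal, memory_order_release);
}

/* Model of signal_fetch: a local atomic load of the signal word. */
uint64_t model_signal_fetch(model_pe_t *self) {
    return atomic_load_explicit(&self->sig, memory_order_acquire);
}

/* Evaluate the hypothetical comparison. */
int model_cmp(uint64_t v, model_cmp_t cmp, uint64_t cmp_value) {
    switch (cmp) {
    case MODEL_CMP_EQ: return v == cmp_value;
    case MODEL_CMP_NE: return v != cmp_value;
    case MODEL_CMP_GE: return v >= cmp_value;
    }
    return 0;
}

/* Model of signal_wait_until: spin on local atomic loads until the
 * comparison holds, then return the signal value that satisfied it. */
uint64_t model_signal_wait_until(model_pe_t *self, model_cmp_t cmp,
                                 uint64_t cmp_value) {
    uint64_t v;
    while (!model_cmp(v = model_signal_fetch(self), cmp, cmp_value)) {
        /* busy-wait; in this single-process model another thread
         * would have to perform the put for the loop to exit */
    }
    return v;
}
```

The release/acquire pairing is a modeling choice made for illustration; the ordering actually guaranteed is whatever the final specification text for sig_addr updates says.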

Wasi - Performance Counters and APIs

  • Overview slides
    • Workshop paper describes an API for gathering lightweight performance counters.

    • These counters are associated with SHMEM contexts (or with a PE in one case).

    • Counters are monotonically increasing counts of pending/completed put/get/AMO operations.

    • Query APIs are implemented in SOS (Sandia OpenSHMEM).

    • Q: Are these APIs relevant to non-blocking operations only?
      A: Not necessarily. Issued vs. completed values may also differ for blocking operations with small messages (which are buffered for puts and non-fetching AMOs). The same applies when bounce buffering is enabled.

      If the issued counter value always equals the completed counter value, then no overlap is happening.

    • Q: What are the motivating examples?

      A:

      • understanding performance issues
      • detecting performance bugs
      • profiling data can be associated with runtime status - better insights
    • Q: Is it really a "performance counter" - or is it more like the MPI message queue dumping interface?

      A: Unlike MPI message queues, these counters give current (realtime) info about communication status.

    • Q: Does supporting performance counters degrade app performance?

      A:

      • This proposal places no requirement to support specific counters.
      • Paper results suggest the pending put/get counter overhead is negligible.
    • Summarized ISx findings with these counters:

      • Detected unexpected non-random communication behavior
      • Counter values show that the non-blocking API achieves overlap
    • Performance Variable Classes proposed
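
The issued-vs-completed reasoning from the discussion above can be sketched as follows. The struct and function names are hypothetical (they are not the API from the workshop paper or SOS), only put counters are modeled, and the snapshots are assumed to have already been collected by some query call. Pending operations are issued minus completed; any snapshot with a nonzero pending count means communication was still in flight at that point, so overlap was possible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context counter snapshot; both counters only grow. */
typedef struct {
    uint64_t puts_issued;
    uint64_t puts_completed;
} ctx_counters_t;

/* Pending puts in one snapshot: issued minus completed. */
uint64_t pending_puts(const ctx_counters_t *c) {
    assert(c->puts_completed <= c->puts_issued);
    return c->puts_issued - c->puts_completed;
}

/* Check the monotonicity property across consecutive snapshots. */
bool counters_monotonic(const ctx_counters_t *s, size_t n) {
    for (size_t i = 1; i < n; i++) {
        if (s[i].puts_issued < s[i - 1].puts_issued ||
            s[i].puts_completed < s[i - 1].puts_completed) {
            return false;
        }
    }
    return true;
}

/* True if any snapshot saw puts still in flight. If issued == completed
 * in every snapshot, no overlap was observed at the sample points. */
bool overlap_observed(const ctx_counters_t *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (pending_puts(&s[i]) > 0) {
            return true;
        }
    }
    return false;
}
```

Because this only inspects sampled snapshots, a result of false means no overlap was seen at the sample points, not that none occurred between them.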
