Releases: ovis-hpc/ldms
Release OVIS-4.3.5
This is the OVIS-4.3.5 G/A Release.

This release includes the following features and fixes:
* Compatibility with OVIS-4.3.3 and OVIS-4.3.4
* Support for the Maestro load balancer
* Allow root user to access ldmsd configuration objects regardless of euid/egid of the process
* Zap socket performance improvements
* Zap fabric performance and resiliency improvements
* Zap RDMA support for OmniPath
* Zap uGNI resiliency improvements
* Fix LDMS Streams Service data loss on process exit
* Metric set permission handling improvements
* Fixes for memory leaks and uninitialized data found by static analysis tools
* Numerous build and packaging improvements
Release OVIS-4.3.4
This is the OVIS-4.3.4 G/A Release.

Significant testing has been done on the socket, RDMA, and uGNI transports, with socket and uGNI scaling to three levels of aggregation and 30,000 sets in the aggregate. The RDMA transport has been tested to a few thousand sets. The fabric transport should be considered Alpha: it is suitable for development, but not for deployment at this time.

This release includes the following new features:
* LDMS Transport performance statistics (ldmsd_controller xprt_stats command)
* Zap Thread utilization tracking (ldmsd_controller thread_stats command)
* uGNI resiliency improvements to aid with resource error handling
* Packaging updates and github automation to help with tarball generation and release tagging
* A reference counting service has been implemented that supports 'named references'. In debug mode (when REF_TRACK is defined), references are tracked (function name and line number) when they are taken and when they are released, and individual reference counts are kept for each name. This makes it easier to debug reference tracking during development.
* The new ref_t reference counting mechanism has been added to struct ldms_set and struct ldms_rbuf_desc in support of a robust set-delete capability
* An "end-to-end" protocol has been added for deleting metric sets. When an ldmsd deletes a set, each peer that has a memory handle on the set is notified. The set resources are not freed until all peers acknowledge that they have received the delete notification.
* A service (zap_zerr2errno) has been added to consistently map Zap errors to Unix errno
* Updates to the lustre2_client sampler to support newer versions of Lustre
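The 'named references' idea in the notes above can be illustrated with a small model. This is only a sketch in Python, not the LDMS C implementation: the class `NamedRef`, its `get`/`put` methods, and the reference names `create` and `lookup` are hypothetical, and the `track` flag merely stands in for the REF_TRACK debug mode.

```python
import inspect
from collections import defaultdict


class NamedRef:
    """Toy reference counter that keeps one count per name.

    Hypothetical illustration only; this is not the LDMS ref_t API.
    """

    def __init__(self, initial_name, on_free, track=False):
        self.counts = defaultdict(int)  # per-name reference counts
        self.total = 0                  # sum of all named counts
        self.on_free = on_free          # called when total drops to zero
        self.track = track              # stands in for REF_TRACK debug mode
        self.history = []               # (op, name, function, line) records
        self.get(initial_name)

    def _record(self, op, name):
        if self.track:
            caller = inspect.stack()[2]  # frame that called get() or put()
            self.history.append((op, name, caller.function, caller.lineno))

    def get(self, name):
        """Take a reference under the given name."""
        self._record("get", name)
        self.counts[name] += 1
        self.total += 1

    def put(self, name):
        """Release a reference; an unbalanced put is reported by name."""
        if self.counts[name] <= 0:
            raise RuntimeError(f"put of '{name}' without a matching get")
        self._record("put", name)
        self.counts[name] -= 1
        self.total -= 1
        if self.total == 0:
            self.on_free()


freed = []
ref = NamedRef("create", on_free=lambda: freed.append(True), track=True)
ref.get("lookup")
ref.put("lookup")
ref.put("create")  # last reference released, so on_free runs
```

Keeping a count per name means an unbalanced release can be attributed to a specific code path instead of showing up only as a wrong total.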
Release OVIS-4.3.4-beta.1
This is the OVIS-4.3.4 release tag
Release OVIS-4.3.4-alpha.1
This release includes the following updates and fixes:
* Packaging updates and github automation to help with tarball generation and release tagging
* Fixes for issues found by static analysis tools
* Fix a memory leak in the JSON parser that, on the socket transport, could leak as much as 1 MB per message
* A service (zap_zerr2errno) has been added to consistently map Zap errors to Unix errno
* A reference counting service has been implemented that supports 'named references'. In debug mode (when REF_TRACK is defined), references are tracked (function name and line number) when they are taken and when they are released, and individual reference counts are kept for each name. This makes it easier to debug reference tracking during development.
* The new ref_t reference counting mechanism has been added to struct ldms_set and struct ldms_rbuf_desc in support of a robust set-delete capability
* An "end-to-end" protocol has been added for deleting metric sets. When an ldmsd deletes a set, each peer that has a memory handle on the set is notified. The set resources are not freed until all peers acknowledge that they have received the delete notification.
* LDMS transport 'telemetry' data has been added that tracks statistics on the primary transport operations DIR, LOOKUP, UPDATE, SEND, and RECV. The intent is to determine when an ldmsd becomes overloaded, underutilized, etc.
* Zap uGNI Transport fixes
  * Ensure socket is closed in uGNI transport
  * Destroy the Cdm in the uGNI transport
  * Refactor Zap uGNI disconnect path
  * Aggressively flush incomplete RdmaPost descriptors
  * Add more detailed error handling in Zap uGNI
  * Added a thread to subscribe to and report errors on the uGNI transport
  * Make certain that GNI_EpUnbind does not fail. This ensures that NTT resources held by the endpoint are released.
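The "end-to-end" set-delete protocol listed in these notes can be modeled in a few lines. The sketch below is an assumption-laden illustration in Python, not LDMS code: the class `SetOwner`, its methods, the set name `node1/meminfo`, and the peer names are all hypothetical. It shows only the rule stated in the notes, that resources are freed once every peer holding a handle has acknowledged the delete notification.

```python
class SetOwner:
    """Toy model of a daemon that owns a metric set (hypothetical names)."""

    def __init__(self, set_name, peers):
        self.set_name = set_name
        self.pending = set(peers)  # peers that hold a handle on the set
        self.deleting = False
        self.freed = False

    def delete(self):
        """Start deletion and return the peers that must be notified."""
        self.deleting = True
        self._maybe_free()  # frees immediately when no peer holds a handle
        return sorted(self.pending)

    def ack(self, peer):
        """Record that a peer acknowledged the delete notification."""
        self.pending.discard(peer)
        self._maybe_free()

    def _maybe_free(self):
        # Resources are released only after deletion has started
        # and every notified peer has acknowledged.
        if self.deleting and not self.pending:
            self.freed = True


owner = SetOwner("node1/meminfo", ["agg1", "agg2"])
to_notify = owner.delete()
owner.ack("agg1")
freed_after_first_ack = owner.freed
owner.ack("agg2")
freed_after_all_acks = owner.freed
```

Deferring the free until the last acknowledgement is what keeps a peer from touching set memory that has already been released.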
OVIS-4.3.3 G/A
Fix compilation warnings for `-O3 -Wall -Werror`
OVIS 4.3.3 Release Candidate 1
Add area to ldms_set_hdr for compatible updates (#103)

Reserve an area in the set header to accommodate changes that may affect this structure while still supporting backward compatibility.
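Why a reserved area helps can be seen with a toy binary header. The layout below is entirely hypothetical (it is not the real ldms_set_hdr, whose fields are not described here); the sketch uses Python's `struct` module to show that a field added later can be carved out of the reserved bytes without changing the header size, so old and new readers can still parse each other's headers.

```python
import struct

# Hypothetical header layouts, little-endian with no implicit padding.
# V1: version, flags, then 24 reserved (zero) bytes.
V1 = struct.Struct("<II24x")
# V2: same leading fields, plus a new 8-byte field taken from the reserve.
V2 = struct.Struct("<IIQ16x")

same_size = V1.size == V2.size  # both layouts occupy 32 bytes

# A header written by old code is readable by new code:
# the new field simply reads as zero.
old_hdr = V1.pack(1, 0)
version, flags, new_field = V2.unpack(old_hdr)

# A header written by new code is readable by old code:
# the new field falls inside the bytes old code skips.
new_hdr = V2.pack(2, 0, 99)
old_view = V1.unpack(new_hdr)
```

The design choice is to pay a few unused bytes up front so that later additions do not shift the offsets that existing peers depend on.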
OVIS-4.3.3-beta
This is a release that tracks OVIS-4.3.3-beta
OVIS-4.3.2
OVIS-4.3.2
OVIS-4.3.1
OVIS Version 4.3.1
LDMS v4.3_beta release schedule and high level overview of new features
LDMS features
- Metric sets are now removed by ldms_set_delete
- ldms_xprt_dir now conveys set metadata, including size and set_info information
- libfabrics LDMS transport plugin
LDMSD features
- ldmsd stream service:
  - A publish/subscribe service in ldmsd that allows external programs to send data (events) over an LDMS Transport to ldmsd plugins
- Improvements to prdcr performance
- ldms_ls provides summary set size information as an aid to ldmsd aggregator memory configuration
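The publish/subscribe pattern behind the ldmsd stream service can be sketched generically. The Python below is a minimal model under stated assumptions, not the ldmsd_stream API: `StreamBroker`, the stream name `slurm`, and the event fields are hypothetical, and delivery happens in-process rather than over an LDMS Transport.

```python
from collections import defaultdict


class StreamBroker:
    """Toy publish/subscribe hub keyed by stream name (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, stream, callback):
        """Register a callback, standing in for a subscribing plugin."""
        self.subscribers[stream].append(callback)

    def publish(self, stream, event):
        """Deliver the event to every subscriber; return how many got it."""
        callbacks = self.subscribers.get(stream, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


broker = StreamBroker()
received = []
broker.subscribe("slurm", received.append)

# An external program publishes a job event; the payload shape is made up.
delivered = broker.publish("slurm", {"event": "job_start", "job_id": 1234})
dropped = broker.publish("unused", {"event": "ignored"})
```

Publishers and subscribers share only a stream name, which is how one notifier can feed several independent consumers.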
New sampler plugins
- SPANK slurm_notifier: a Slurm SPANK plugin that uses ldmsd_stream to notify subscribers (plugins) of job events (e.g. start/stop).
  - Used by slurm_sampler, papi_sampler, and syspapi_sampler
- slurm_sampler:
  - Multi-tenant capable slurm job information sampler
- PAPI Job Sampler (papi_sampler):
  - Collects hardware event counters per-process for all processes of a job
  - Receives configuration from a job's environment via the slurm stream
- PAPI System Sampler (syspapi_sampler):
  - Collects hardware event counters per-core, system wide
  - Uses libpfm for sampling and libpapi for event name to event-mask mapping
  - Allows consistent configuration to be used between syspapi and papi samplers.
  - Samples hardware performance counters on a per-core/uncore basis
- IBM OCC sampler (ibm_occ)
New store plugins
- slurm_store:
  - SOS store plugin that converts multi-tenant job information into a form more suitable for analysis
- papi_store:
  - SOS store plugin that converts PAPI job information into a form more suitable for analysis
OVIS-4.2.3
This update fixes the following problem in 4.2.2:
The outstanding-update condition was being tested before the set-matching condition. As a result, sets that did not match the regex but were being updated as set group members were incorrectly marked as "outstanding update". With the 4.2.2 release this can produce spurious warnings that pollute the log file; the collected data is not affected.
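The ordering problem described above can be reproduced with two small functions. This is a simplified Python model, not the ldmsd source: the function names, return strings, set name, and regex are hypothetical, and only the order of the two tests reflects the description of the bug and its fix.

```python
import re


def check_buggy(set_name, pattern, update_outstanding):
    """4.2.2-style ordering: the outstanding test runs first."""
    if update_outstanding:
        return "outstanding update"  # also flags sets that do not match
    if not re.search(pattern, set_name):
        return "skip"
    return "ok"


def check_fixed(set_name, pattern, update_outstanding):
    """4.2.3-style ordering: non-matching sets are filtered out first."""
    if not re.search(pattern, set_name):
        return "skip"
    if update_outstanding:
        return "outstanding update"
    return "ok"


# A set that does not match the regex but is mid-update as a group member.
buggy_result = check_buggy("node1/vmstat", r"meminfo$", True)
fixed_result = check_fixed("node1/vmstat", r"meminfo$", True)
```

With the tests swapped, the non-matching set is skipped quietly instead of generating a warning.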