Nidhogg
Pre-release
Dragon 0.10 Release Summary
This release adds support for InfiniBand networks, adds checkpointing to the distributed dictionary, and provides initial support for telemetry monitoring via a Grafana frontend. Other features include:
- Complete overhaul of ProcessGroup to improve reliability and user debugging
- Performance improvements in HSTA
- Better exception handling and logging
- Better support for specifying placement of processes
- Addition of a `dragon-activate` script to make it easier to set up the runtime environment
- Improved stability for the Distributed Dictionary and runtime overall
Two sets of packages are below. The ones with "HSN" in the name include the RDMA-based transport feature for both Slingshot and InfiniBand networks. The other packages use the TCP-based transport and will work on generic clusters, single nodes, and laptops. Note that the TCP-based transport package may not scale for some use cases above 16 nodes.