Skip to content

Nidhogg

Pre-release
Pre-release
Compare
Choose a tag to compare
@mendygral mendygral released this 24 Oct 14:10
· 2 commits to main since this release
e853233

Dragon 0.10 Release Summary

This release adds support for infiniband networks, checkpointing to the distributed dictionary, and provides initial support for telemetry monitoring via a Grafana frontend. Other features include:

  • Complete overhaul of ProcessGroup to improve reliability and user debugging
  • Performance improvements in HSTA
  • Better exception handling and logging
  • Better support for specifying placement of processes
  • Addition of a dragon-activate script to make it easier to set-up runtime environment
  • Improved stability for the Distributed Dictionary and runtime overall

Two sets of packages are below. The ones with "HSN" in the name include the RDMA-based transport feature for both slingshot and infiniband networks. The other packages use the TCP-based transport and will work on generic clusters and single node/laptops/etc. Note that the TCP-based transport package may not scale for some use cases above 16 nodes.