Merlin: HugeCTR V3.1 Beta

@zehuanw zehuanw released this 20 May 04:15
· 2 commits to v3.1_preview since this release

Release Notes

Larger models and large-scale training are core requirements for recommendation systems. v3.1 introduces a set of new optimizations for better scalability, listed below; they are all available in this beta release.

  • Distributed Hybrid embedding - Model/data parallel split of embeddings based on statistical access frequency to minimize embedding exchange traffic.
  • Optimized communication collectives - Hierarchical multi-node all-to-all for NVLink aggregation and a one-shot algorithm for all-reduce.
  • Optimized data reader - Async I/O based data reader that maximizes I/O utilization and minimizes interference with collectives, plus eval caching.
  • MLP fusions - Fused GEMM + ReLU + bias fprop and GEMM + dReLU + bgrad bprop.
  • Compute-communication overlap - Generalized embedding and bottom MLP overlap.
  • Holistic CUDA graph - Full iteration graph capture to reduce launch latencies and jitter.
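
To make the frequency-based split behind the hybrid embedding more concrete, here is a minimal Python sketch. It is not HugeCTR code and does not use the HugeCTR API: the function names `split_by_frequency` and `exchange_traffic`, the `hot_fraction` parameter, and the greedy coverage rule are all assumptions made for illustration. The idea it shows is that categories accessed most often can be treated as data parallel (replicated locally, so lookups need no exchange), while the long tail stays model parallel and is the only part that generates embedding exchange traffic.

```python
from collections import Counter


def split_by_frequency(category_ids, hot_fraction):
    """Split categories into a frequent and an infrequent set.

    Hypothetical helper: greedily marks the most-accessed categories as
    frequent until they cover at least `hot_fraction` of all lookups.
    """
    counts = Counter(category_ids)
    total = len(category_ids)
    # Rank by descending access count; break ties by id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    frequent = set()
    covered = 0
    for category, n in ranked:
        if covered >= hot_fraction * total:
            break
        frequent.add(category)
        covered += n

    infrequent = set(counts) - frequent
    return frequent, infrequent


def exchange_traffic(category_ids, frequent):
    """Count lookups that would still require an embedding exchange,
    i.e. lookups of categories not replicated as frequent."""
    return sum(1 for c in category_ids if c not in frequent)


# Toy access trace with a skewed distribution (assumed data).
sample = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 3, 4]
frequent, infrequent = split_by_frequency(sample, hot_fraction=0.7)
traffic_hybrid = exchange_traffic(sample, frequent)
traffic_model_parallel = exchange_traffic(sample, set())

print("frequent (data parallel):", sorted(frequent))
print("infrequent (model parallel):", sorted(infrequent))
print("lookups needing exchange:", traffic_hybrid, "of", traffic_model_parallel)
```

In this toy trace, replicating only two categories removes most of the exchanged lookups, which is the effect a skewed access distribution makes possible. The trade-off, in this simplified view, is that replicated categories need their gradients reduced across workers instead.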