Merlin: HugeCTR V3.1 Beta
Release Notes
Larger models and large-scale training are persistent requirements in recommender systems. v3.1 introduces a set of new scalability optimizations, available now in this beta version:
- Distributed hybrid embedding - Model-parallel/data-parallel split of embeddings based on statistical access frequency, minimizing embedding exchange traffic.
- Optimized communication collectives - Hierarchical multi-node all-to-all for NVLink aggregation and a one-shot algorithm for all-reduce.
- Optimized data reader - Async-I/O-based data reader that maximizes I/O utilization and minimizes interference with collectives, plus evaluation-data caching.
- MLP fusions - Fused GEMM + bias + ReLU fprop and GEMM + dReLU + bgrad bprop.
- Compute-communication overlap - Generalized embedding and bottom MLP overlap.
- Holistic CUDA graph - Full iteration graph capture to reduce launch latencies and jitter.
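To make the MLP-fusion item concrete, the sketch below shows the math being fused in each direction: the forward pass combines GEMM, bias add, and ReLU into one step, and the backward pass combines dReLU, the bias gradient (bgrad), and the GEMMs. This is a minimal NumPy illustration of the computation only; the function names are hypothetical, and HugeCTR performs these steps in fused CUDA kernels rather than separate NumPy calls.

```python
import numpy as np

def mlp_layer_fprop(x, W, b):
    # Fused fprop: GEMM + bias + ReLU in one conceptual step.
    z = x @ W + b               # GEMM + bias
    y = np.maximum(z, 0.0)      # ReLU
    return y, z                 # z is kept for the backward pass

def mlp_layer_bprop(dy, z, x, W):
    # Fused bprop: dReLU + bgrad + GEMMs in one conceptual step.
    dz = dy * (z > 0)           # dReLU: gate upstream gradient
    db = dz.sum(axis=0)         # bgrad: bias gradient
    dW = x.T @ dz               # weight-gradient GEMM
    dx = dz @ W.T               # data-gradient GEMM
    return dx, dW, db

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4, 8 input features
W = rng.standard_normal((8, 16))   # layer weights
b = rng.standard_normal(16)        # layer bias

y, z = mlp_layer_fprop(x, W, b)
dx, dW, db = mlp_layer_bprop(np.ones_like(y), z, x, W)
```

Fusing these steps avoids writing the intermediate activations out to global memory between kernel launches, which is where the speedup comes from.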