In low-latency mode, rdma_recv_buff is large enough to hold the capped number of tokens from every rank to every expert, so buffer backpressure is unnecessary. The sender-role kernel therefore exits immediately after completing its one-sided transmission. The remaining receiver-role kernel runs only when the hook function is invoked, and it finalizes the copy operation. So they likely mean that during the computation (MoE kernel) phases the receiver-role kernel does not need to poll continuously, which reduces SM occupancy. You can invoke the receiver-role kernel lazily, after the MoE kernel has finished.
This is my personal understanding, not from the DeepSeek team.
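To make the send-then-hook pattern above concrete, here is a toy sketch in plain Python. It is not DeepEP code: `make_recv_buffer`, `dispatch`, `recv_hook`, and the constants are all hypothetical names, and ordinary lists stand in for RDMA memory and GPU kernels. It only illustrates why a receive buffer sized for the worst case lets the send phase return at once and the packing step run later.

```python
from typing import Callable, List, Tuple

# Hypothetical sizes, chosen only for this illustration.
NUM_RANKS = 2
NUM_LOCAL_EXPERTS = 2
MAX_TOKENS_PER_RANK = 4  # cap on tokens one rank may send to one expert

Buffer = List[List[List[int]]]


def make_recv_buffer() -> Buffer:
    # One region per (local expert, source rank). Each region is treated as
    # pre-sized for the worst case, so a sender never waits for free space.
    return [[[] for _ in range(NUM_RANKS)] for _ in range(NUM_LOCAL_EXPERTS)]


def dispatch(
    recv_buffer: Buffer,
    src_rank: int,
    tokens_with_expert: List[Tuple[int, int]],
) -> Callable[[], List[List[int]]]:
    # "Send phase": one-sided writes into the receiver's buffer, then return
    # immediately without waiting for the receiver.
    for token, expert in tokens_with_expert:
        region = recv_buffer[expert][src_rank]
        assert len(region) < MAX_TOKENS_PER_RANK, "cap exceeded"
        region.append(token)

    def recv_hook() -> List[List[int]]:
        # "Receive phase": pack the per-rank regions of each expert into one
        # contiguous list. Nothing runs here until the caller invokes it.
        return [
            [t for region in recv_buffer[e] for t in region]
            for e in range(NUM_LOCAL_EXPERTS)
        ]

    return recv_hook


buf = make_recv_buffer()
hook = dispatch(buf, 0, [(10, 0), (11, 1)])  # rank 0 sends two tokens
dispatch(buf, 1, [(20, 1)])                  # rank 1 sends one token
other_work = sum(range(1000))  # stands in for an MoE kernel running meanwhile
packed = hook()                # lazy receive, after the compute is done
```

In this sketch nothing polls between `dispatch` and `hook()`, which is the analogue of the receiver-role kernel not occupying SMs during the compute phase.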
The IBGDA dispatch kernels are running in the background. Why don't they take up compute SM resources?
Thanks.