[BUG] The implementations of softmax_f32_kernel and softmax_f32x4_kernel may be incorrect #251

Open
xiaoxiaosuaxuan opened this issue Feb 14, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@xiaoxiaosuaxuan

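Both kernels rely on __threadfence() to make the contributions that all other blocks add to the global sum (total, via atomicAdd of each block's exp_sum) visible to the current block. But __threadfence() is only a memory fence: it orders the calling thread's own memory accesses and does not synchronize blocks with each other, so a block can read *total before the other blocks have finished their atomicAdd. With a somewhat larger input (for example 256 x 1024), the computed results are wrong.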

  if (tid == 0) atomicAdd(total, exp_sum);
  __threadfence(); // grid level memory fence
  // e^x_i/sum(e^x_0,...,e^x_n-1) 
  // printf("N: %d, idx: %d, bid: %d, tid: %d, exp_val: %f, exp_sum: %f, total: %f\n", 
  //         N,     idx, blockIdx.x,  tid,     exp_val,     exp_sum,     *total);
  if (idx < N) y[idx] = exp_val / (*total); 
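
One possible fix, as a minimal sketch and not necessarily how the repository should resolve it: split the work into two kernel launches, so that the boundary between launches on the same stream acts as the grid-wide synchronization that __threadfence() cannot provide. The names exp_sum_kernel, normalize_kernel, softmax_two_pass and kThreads are hypothetical and do not come from the repository. Like the snippet above, the sketch does not subtract the maximum before exponentiating, so it is not numerically safe for large inputs.

```cuda
#include <cassert>
#include <cmath>
#include <cuda_runtime.h>

// Hypothetical block size; the shared-memory reduction below assumes a power of two.
constexpr int kThreads = 256;

// Pass 1: each block reduces its exp values in shared memory, then thread 0
// adds the block's partial sum to the global accumulator *total.
__global__ void exp_sum_kernel(const float* x, float* total, int N) {
  __shared__ float smem[kThreads];
  const int tid = threadIdx.x;
  const int idx = blockIdx.x * blockDim.x + tid;
  smem[tid] = (idx < N) ? expf(x[idx]) : 0.0f;
  __syncthreads();
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (tid < stride) smem[tid] += smem[tid + stride];
    __syncthreads();
  }
  if (tid == 0) atomicAdd(total, smem[0]);
}

// Pass 2: runs only after pass 1 has finished on the same stream,
// so every block's atomicAdd is already reflected in *total.
__global__ void normalize_kernel(const float* x, float* y,
                                 const float* total, int N) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < N) y[idx] = expf(x[idx]) / (*total);
}

// Hypothetical host wrapper; d_x, d_y and d_total are device pointers.
void softmax_two_pass(const float* d_x, float* d_y, float* d_total, int N) {
  assert(N > 0);
  const int blocks = (N + kThreads - 1) / kThreads;
  cudaMemset(d_total, 0, sizeof(float));  // reset the accumulator
  exp_sum_kernel<<<blocks, kThreads>>>(d_x, d_total, N);
  normalize_kernel<<<blocks, kThreads>>>(d_x, d_y, d_total, N);
}
```

Keeping everything in a single kernel would need a real grid-wide barrier, for example cooperative_groups::this_grid().sync() with a cooperative launch, which limits the grid to the number of blocks that can be resident at once. The two-launch version avoids that constraint at the cost of recomputing expf in the second pass.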
@DefTruth DefTruth self-assigned this Feb 19, 2025
@DefTruth DefTruth added the bug Something isn't working label Feb 19, 2025