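Both kernels use __threadfence() to make sure that the additions to exp_sum from all other blocks are visible to the current block. However, __threadfence() cannot synchronize between blocks: it only orders the calling thread's own memory accesses and does not wait for other blocks to finish their atomicAdd. With a longer input (for example 256 x 1024), the computed result is wrong.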
    if (tid == 0) atomicAdd(total, exp_sum);
    __threadfence(); // grid level memory fence
    // e^x_i/sum(e^x_0,...,e^x_n-1)
    // printf("N: %d, idx: %d, bid: %d, tid: %d, exp_val: %f, exp_sum: %f, total: %f\n",
    //        N, idx, blockIdx.x, tid, exp_val, exp_sum, *total);
    if (idx < N) y[idx] = exp_val / (*total);
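One possible workaround, sketched here rather than taken from the repository, is to split the work into two kernel launches so that the kernel boundary acts as the grid-wide barrier: the first kernel accumulates the sum of exponentials, and the second divides by it only after every block has contributed. The kernel names, the `softmax_two_pass` host helper, and the block size of 256 are all hypothetical. Like the snippet above, the sketch does not subtract the maximum before calling `expf`.

```cuda
#include <cuda_runtime.h>

// Pass 1 (hypothetical name): each block reduces its exp(x_i) values in shared
// memory and adds the block's partial sum to *total with a single atomicAdd.
// Assumes blockDim.x == NUM_THREADS and that NUM_THREADS is a power of two.
template <int NUM_THREADS = 256>
__global__ void softmax_exp_sum_kernel(const float* x, float* total, int N) {
  __shared__ float smem[NUM_THREADS];
  const int tid = threadIdx.x;
  const int idx = blockIdx.x * blockDim.x + tid;
  smem[tid] = (idx < N) ? expf(x[idx]) : 0.0f;
  __syncthreads();
  // Tree reduction within the block.
  for (int stride = NUM_THREADS / 2; stride > 0; stride >>= 1) {
    if (tid < stride) smem[tid] += smem[tid + stride];
    __syncthreads();
  }
  if (tid == 0) atomicAdd(total, smem[0]);
}

// Pass 2 (hypothetical name): launched after pass 1 on the same stream, so
// *total already contains the contribution of every block when it is read.
__global__ void softmax_normalize_kernel(const float* x, float* y,
                                         const float* total, int N) {
  const int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < N) y[idx] = expf(x[idx]) / (*total);
}

// Host helper (hypothetical): d_x, d_y and d_total are device pointers.
void softmax_two_pass(const float* d_x, float* d_y, float* d_total, int N) {
  constexpr int kThreads = 256;
  const int blocks = (N + kThreads - 1) / kThreads;
  cudaMemset(d_total, 0, sizeof(float));  // reset the accumulator
  softmax_exp_sum_kernel<kThreads><<<blocks, kThreads>>>(d_x, d_total, N);
  softmax_normalize_kernel<<<blocks, kThreads>>>(d_x, d_y, d_total, N);
}
```

The second launch recomputes `expf(x[idx])` instead of storing it, which trades a little arithmetic for not needing an extra buffer. A single-kernel alternative would be a cooperative launch with a grid-wide barrier from cooperative groups, but that limits the grid to the number of blocks that can be resident on the device at once.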
DefTruth