This is a PyTorch implementation of Cut Cross-Entropy (CCE). It includes:
- GEMM
- LSE-style CCE Forward
- LSE CCE Backward
- Linear-Cross-Entropy backward
Here is a blog post explaining CCE: zhihu
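
To make the LSE-style forward concrete, here is a minimal sketch of the idea in plain PyTorch. It is not this repository's code: the function name `chunked_lse_cross_entropy`, the `chunk_size` parameter, and the tensor shapes are assumptions chosen for illustration. The loss is computed as `logsumexp(logits) - logit[target]`, with the log-sum-exp accumulated one vocabulary chunk at a time so the full `(N, V)` logit matrix is never built in the forward pass. The backward pass here simply relies on autograd, which still stores each chunk's logits; a real CCE backward would recompute them instead.

```python
import torch
import torch.nn.functional as F


def chunked_lse_cross_entropy(e, c, targets, chunk_size=4):
    """Mean cross-entropy of logits = e @ c.T, computed chunk by chunk.

    e: (N, D) embeddings, c: (V, D) classifier weights, targets: (N,) class ids.
    Hypothetical helper for illustration only.
    """
    lse = None  # running log-sum-exp over the vocabulary, shape (N,)
    for start in range(0, c.shape[0], chunk_size):
        # GEMM on one vocabulary chunk: (N, D) @ (D, chunk) -> (N, chunk)
        logits_chunk = e @ c[start:start + chunk_size].T
        chunk_lse = torch.logsumexp(logits_chunk, dim=1)
        lse = chunk_lse if lse is None else torch.logaddexp(lse, chunk_lse)
    # Logit of the correct class as a row-wise dot product.
    target_logit = (e * c[targets]).sum(dim=1)
    return (lse - target_logit).mean()


torch.manual_seed(0)
e = torch.randn(6, 8, dtype=torch.float64, requires_grad=True)
c = torch.randn(10, 8, dtype=torch.float64, requires_grad=True)
targets = torch.randint(0, 10, (6,))

# Compare against the standard full-logit cross-entropy.
loss = chunked_lse_cross_entropy(e, c, targets)
reference = F.cross_entropy(e @ c.T, targets)

# Gradients w.r.t. embeddings and classifier should match as well.
grad_e, grad_c = torch.autograd.grad(loss, (e, c))
ref_grad_e, ref_grad_c = torch.autograd.grad(reference, (e, c))
```

Both the loss and the gradients should agree with `F.cross_entropy` up to floating-point error, which is a convenient sanity check when testing a fused or hand-written version.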