v0.0.2
What's Changed
- Refactor fused modules by @casper-hansen in #18
- fuse_layers bug fix by @qwopqwop200 in #21
- Support speedtest to benchmark FP16 model by @wanzhenchn in #25
- Implement batch size for speed test by @casper-hansen in #26
- [BUG] Fix illegal memory access + Quantized Multi-GPU support by @casper-hansen in #28
- YaRN support for LLaMa models by @casper-hansen in #23
New Contributors
- @wanzhenchn made their first contribution in #25
Full Changelog: v0.0.1...v0.0.2