From 2d1024fa3090fce3274183095be3c71633e5a84e Mon Sep 17 00:00:00 2001
From: Duyi-Wang
Date: Thu, 21 Dec 2023 15:52:47 +0800
Subject: [PATCH] [Version] v1.2.0. (#148)

---
 CHANGELOG.md | 17 +++++++++++++++++
 VERSION      |  2 +-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8ab6171a..3641ace9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,22 @@
 # CHANGELOG
 
+# [Version v1.2.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.2.0)
+v1.2.0 - Qwen models and many more data types supported.
+## Models
+- Introduced support for Qwen models and added the convert tool for Qwen models.
+- ChatGLM3 model is verified and its API is supported.
+
+## Performance Optimizations
+- Update xDNN to version 1.4.2 to improve performance and support more data types.
+- Accelerate first token's generation with BF16-gemm Multi-Head Attention.
+
+## Functionality
+- Introduce support for more data types, including `W8A8`, `INT4`, and `NF4`. Hybrid data types combining these new data types are also supported.
+- Add accuracy evaluation script to assess the impact of different precisions on the text generation performance of the model.
+- Introduce the `XFT_VERBOSE` macro to help profile the performance of each gemm. Set it to `1` to enable information output; the default is `0`.
+- Decouple oneCCL and MPI dependencies into a communication helper library. The oneCCL environment is no longer needed when running in single-rank mode.
+
+
 # [Version 1.1.0](https://github.com/intel/xFasterTransformer/releases/tag/v1.1.0)
 v1.1.0 - Baichuan models supported.
 
diff --git a/VERSION b/VERSION
index 1cc5f657..867e5243 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.1.0
\ No newline at end of file
+1.2.0
\ No newline at end of file