Releases · intel/xFasterTransformer
v1.3.1
BUG fix
- Fix an issue where the oneCCL environment was still required when running in single-rank mode.
What's Changed
- [demo] Add qwen demo deps. by @marvin-Yu in #193
- Add base_initial to store the original base passed from config by @a3213105 in #196
- [Comm] Check mpirun and env before load helper. by @Duyi-Wang in #197
- [Version] v1.3.1. by @Duyi-Wang in #198
Full Changelog: v1.3.0...v1.3.1
v1.3.0 - Enhanced Qwen model support and added support for the SecLLM (YaRN-Llama) model.
Models
- Introduce SecLLM (YaRN-Llama) model support.
- Integrate the Qwen web demo, enhance Qwen model support, and fix known issues in the Qwen convert tool.
Functionality
- Introduce new generation config options `repetition_penalty` and `stop_words_ids` (see the sketch after this list).
- Rotary embedding supports the BF16 data type now.
- Introduce attention interfaces similar to paged attention.
- Add a whitelist so that only selected timeline events are collected.
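For illustration, here is a minimal sketch of how the new generation options might be passed through the Python API; the `AutoModel.from_pretrained`/`generate` interface and the exact keyword names are assumptions, so consult the repository's examples for the authoritative usage.

```python
# Hypothetical sketch: interface and keyword names are assumed from these notes.
import xfastertransformer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/Qwen-7B-Chat", trust_remote_code=True)
model = xfastertransformer.AutoModel.from_pretrained("/data/Qwen-7B-Chat-xft", dtype="bf16")

input_ids = tokenizer("Hello, xFasterTransformer!", return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_length=128,
    repetition_penalty=1.1,                     # values > 1.0 discourage repeated tokens
    stop_words_ids=[[tokenizer.eos_token_id]],  # stop once any of these id sequences is generated
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```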
BUG fix
- Fix an issue where `libxft_comm_helper.so` couldn't be found in multi-rank mode.
- Fix an assert error in MLP when the CAT_MLP optimization is enabled.
- Fix a W8A8 crash caused by an insufficient buffer size.
- Correct the required GCC version for the AVX512_BF16 instruction set.
- Fix an int32 overflow issue for large sizes.
What's Changed
- [Fix] fix assert error in MLP when enable CAT_MLP opt. by @abenmao in #151
- clang-format: Remove duplicated IndentPPDirectives entry by @huaqiangwang in #159
- [Demo] fix mpi stick the stdout/err in web_demo. by @marvin-Yu in #162
- [Fix] comm_helper can't found issue. by @Duyi-Wang in #161
- [Demo] use average next-token latency as the info to user. by @huaqiangwang in #160
- [Search] Add repetition penalty. by @Duyi-Wang in #163
- Add attention interface (like page attention) by @pujiang2018 in #143
- [Fix] index ID lowerbound check in repetition penalty. by @Duyi-Wang in #167
- Add a white list to collect timeline events on a filtered events by @huaqiangwang in #156
- [Fix] Fix build issue for the TIMELINE filter feature by @huaqiangwang in #169
- [ChatGLM2] Remove unused code. by @a3213105 in #168
- [Benchmark] fix Benchmark performance issue. by @marvin-Yu in #170
- [Framework] Attention/LayerNorm/RmsNorm refactor/enhance to better support BF16 inference. by @pujiang2018 in #171
- [kernel] Fix w8a8 crash issue due to buffer size not big enough by @xiangzez in #158
- [Benchmark] Avoid float in core pre numa calculation. by @Duyi-Wang in #164
- [Model][SecLLM] Add SecLLM(YaRN-Llama) model support by @abenmao in #172
- [LOG] Default disable fake loading log print. by @Duyi-Wang in #173
- [Layer] BF16 support for rotary embedding by @pujiang2018 in #176
- [examples] add qwen & chatglm3 model config. by @marvin-Yu in #177
- [convert] Fix qwen convert with no eos id. by @marvin-Yu in #181
- [demo] Fix chatGLM3 webdemo. by @marvin-Yu in #184
- [Generation] Add stop_words_ids generation config. by @Duyi-Wang in #183
- [Page attention]Prefill add kv cache by @aurora327 in #178
- [common/utils] Fix bug of int32 overflow for larger size by @abenmao in #187
- [Generate] Sync stop words ids in multi-rank mode. by @Duyi-Wang in #190
- [demo] Add qwen demo. by @marvin-Yu in #180
- [Kernel] Add Qwen rotary_embedding ntk support. by @changqi1 in #189
- [Version] v1.3.0. by @Duyi-Wang in #191
New Contributors
- @huaqiangwang made their first contribution in #159
Full Changelog: v1.2.0...v1.3.0
v1.2.0 - Qwen models and many more data types supported.
Models
- Introduced support for Qwen models and added the convert tool for Qwen models (see the sketch after this list).
- The ChatGLM3 model is verified and supported through the API.
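As a rough illustration of the convert flow, the sketch below assumes the convert module exposes a Qwen converter class alongside the existing ones; the class name is an assumption, so check the Python API or the tools directory for the exact entry point.

```python
# Hypothetical sketch: the QwenConvert class name is assumed.
import xfastertransformer as xft

# Convert a Hugging Face Qwen checkpoint into xFasterTransformer's model format.
xft.QwenConvert().convert("/data/Qwen-7B-Chat", "/data/Qwen-7B-Chat-xft")
```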
Performance Optimizations
- Update xDNN to version 1.4.2 to improve performance and support more data types.
- Accelerate first-token generation with BF16 GEMM multi-head attention.
Functionality
- Introduce support for more data types, including `W8A8`, `INT4`, and `NF4`. Hybrid combinations of these new data types are also supported.
- Add an accuracy evaluation script to assess the impact of different precisions on the model's text generation performance.
- Introduce the `XFT_VERBOSE` macro to help profile the performance of each GEMM. Set it to `1` to enable the output; the default is `0` (see the sketch after this list).
- Decouple the oneCCL and MPI dependencies into a communication helper library. The oneCCL environment is no longer needed when running in single-rank mode.
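For illustration, here is a minimal sketch of loading a model with one of the new low-precision data types and enabling the per-GEMM profiling output; the `dtype` strings and the `AutoModel` interface are assumptions based on the names in these notes.

```python
# Hypothetical sketch: dtype strings and the AutoModel interface are assumed.
import os
import xfastertransformer

# XFT_VERBOSE=1 prints performance information for each GEMM; the default 0 keeps it silent.
# Setting it before the model is loaded and run is the safe choice.
os.environ["XFT_VERBOSE"] = "1"

# e.g. "w8a8", "int4", "nf4", or a hybrid combination of the supported data types.
model = xfastertransformer.AutoModel.from_pretrained("/data/llama-2-7b-xft", dtype="int4")
```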
v1.1.0 - Baichuan models supported.
Models
- Introduced support for Baichuan models and added the convert tool for Baichuan models.
Performance Optimizations
- Update xDNN to version 1.2.1 to improve performance of BF16 data type with AMX instruction on 4th generation Intel Xeon Scalable processors.
- Improved performance of BF16 data type inference by adding matMul bf16bf16bf16 primitives and optimizing kernel selection strategy.
- Improved performance of the model with unbalanced split allocation.
Functionality
- Introduced prefix sharing feature.
- Add sampling strategies for token search, supporting the temperature, top-k, and top-p parameters (see the sketch after this list).
- Introduce a convert module in the xfastertransformer Python API.
- Introduced grouped-query attention support for Llama2.
- Auto-detect oneCCL environment and enter single-rank mode if oneCCL does not exist.
- Auto-detect oneCCL environment in compilation. If not detected, oneCCL will be built from source.
- Add a C++ exit function for multi-rank mode.
- Remove mklml 3rd party dependency.
- Export normalization and position embedding C++ API, including alibi embedding and rotary embedding.
- Introduced the `XFT_DEBUG_DIR` environment variable to specify the debug file directory.
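A minimal sketch of the new sampling controls, assuming they are exposed as `generate()` keyword arguments; the names mirror the feature description above and the real signature may differ.

```python
# Hypothetical sketch: sampling keyword names are assumed from these notes.
import xfastertransformer
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/data/Baichuan-13B-Chat", trust_remote_code=True)
model = xfastertransformer.AutoModel.from_pretrained("/data/Baichuan-13B-Chat-xft", dtype="fp16")
input_ids = tokenizer("Tell me a short story.", return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids,
    max_length=256,
    do_sample=True,    # switch from greedy search to sampling
    temperature=0.8,   # soften or sharpen the token distribution
    top_k=50,          # keep only the 50 most likely tokens
    top_p=0.9,         # nucleus-sampling probability mass
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```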
BUG fix
- Fix a runtime issue in the oneCCL shared memory mode.
- Fix a path concatenation issue in the convert tools.
source-publish
Sources used in the xFasterTransformer Docker release image that have a license requiring publication: GPL, LGPL, MPL.
Each of the tar files is a component in xFasterTransformer that has a license
that requires the publication of the sources. This includes GPL, LGPL, and MPL. We
are publishing the original sources including any patches. Each component has
its own license, so we do not provide a license for this release.
If you need the sources for a component that is not included here, please
contact [email protected]
v1.0.0
This is the 1st official release of xFasterTransformer.🎇🎇🎇
Supported models
- ChatGLM-6B
- ChatGLM2-6B
- Llama 1: 7B, 33B, and 65B
- Llama 2: 7B, 13B, and 70B
- OPT models larger than 1.3B
Features
- Support Python and C++ API to integrate xFasterTransformer into the user's own solutions. Example codes are provided to demonstrate the usage.
- Support hybrid data types such as BF16+FP16 and BF16+INT8 to accelerate the generation of the 1st token, in addition to supporting single data types like FP16, BF16, and INT8.
- Support multiple instances to accelerate model inference, both locally and through the network.
- Support Intel AMX instruction on 4th generation Intel Xeon Scalable processors.
- Support 4th generation Intel Xeon Scalable processors with HBM, which provides higher memory bandwidth and delivers much better performance on LLMs.
- Provide web demo scripts for users to show the performance of LLM models optimized by xFasterTransformer.
- Support multiple distribution methods, both PyPI and docker images.
xDNN v1.5.2
- Add AMX_FP16 support for small gemm.
- Built with GCC 13.2.1
xDNN v1.5.1
- Add xdnn_hgemm_f32f16f32_packb_block.
xDNN v1.5.0
- Add hgemm w/ fp32 bias.
xDNN v1.4.6
- Add alpha and beta param to small_sgemm_f32f16bf16.
xDNN v1.4.5
- Add post-op GELU activation.
xDNN v1.4.4
- Fix AMX illegal instruction issue.
xDNN v1.4.3
- Add small_sgemm_f32bf16bf16.
- Add small_sgemm_f32f16bf16.
xDNN v1.4.2
- Support amx_gemm_bf16bf16bf16 kernel w/ any shapes.
xDNN v1.4.1
- Add sgemm_bf16bf16f32 and sgemm_f32bf16bf16 kernels.
- Add softmax kernels.
xDNN v1.4.0
- Add hgemm_f32u4f32 kernels.
- Add sgemm_f32nf4f32 kernels.
xDNN v1.3.1
- Fix sgemm_f32u4f32 kernels parallel bug.
xDNN v1.3.0
- Add sgemm_f32u4f32 kernels
xDNN v1.2.1
- Add xdnn_small_amx_sgemm_bf16bf16bf16_packb implementation with transposed weight.
xDNN v1.2
- Add xdnn_small_amx_sgemm_bf16bf16bf16_packb implementation.
- Add xdnn_small_amx_sgemm_bf16bf16bf16_compute implementation.
xDNN v1.1
- Add bgemm_f32bf16f32_packb weight format BA16a64b2a.
- Add intrinsic extension api.
xDNN v1.0
- Add sgemm kernels
- Add sgemm_f32f16f32 kernels
- Add sgemm_f32i8f32 kernels
- Add hgemm_f32f16f32 kernels
- Add hgemm_f16f16f32 kernels
- Add hgemm_f32i8f32 kernels
- Add bgemm_f32bf16f32 kernels