Do you have any plan to expand the scope to CPU? #1118
mengniwang95 started this conversation in General
Replies: 1 comment
-
Any updates on this? I'm looking for one-click installers to connect multiple LLM models for testing on an Intel Iris Xe GPU in a Lenovo i7 laptop; it seems impossible to get anything running at even 50% of what NVIDIA hardware can handle. I just got this machine as a gift, otherwise I would replace it. Thanks in advance
-
Hi, I find this repo mainly focuses on LLM inference on GPUs currently. Do you have any plans to expand the scope to CPUs?
Our team develops Intel® Extension for Transformers, an innovative toolkit to accelerate Transformer-based models on Intel platforms, particularly effective on 4th Gen Intel® Xeon® Scalable processors (codenamed Sapphire Rapids). The toolkit provides the following key features:
Seamless user experience for model compression (including RTN, AWQ, GPTQ, bitsandbytes, and more of our own algorithms for weight-only quantization in the future) on Transformer-based models by extending Hugging Face transformers APIs and leveraging Intel® Neural Compressor
Advanced software optimizations and unique compression-aware runtime.
Optimized Transformer-based model packages.
NeuralChat, a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins and SOTA optimizations.
Inference of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels.
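For readers unfamiliar with the first item above, here is a minimal, illustrative sketch of what RTN (round-to-nearest) weight-only quantization does, using plain NumPy with per-group scales. The helper names are hypothetical and this is not the toolkit's actual API; the real implementations (via Intel® Neural Compressor) add many refinements beyond this.

```python
# Illustrative RTN weight-only quantization sketch (hypothetical helpers,
# NOT the Intel Extension for Transformers API): weights are rounded to
# signed low-bit integers with one floating-point scale per group.
import numpy as np

def rtn_quantize(weights: np.ndarray, n_bits: int = 4, group_size: int = 32):
    """Quantize a 2-D weight matrix to n_bits integers with per-group scales."""
    qmax = 2 ** (n_bits - 1) - 1                     # e.g. 7 for 4-bit
    rows, cols = weights.shape
    assert cols % group_size == 0, "columns must divide evenly into groups"
    grouped = weights.reshape(rows, cols // group_size, group_size)
    scales = np.abs(grouped).max(axis=-1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)      # guard all-zero groups
    q = np.clip(np.round(grouped / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def rtn_dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float matrix from integers and scales."""
    rows, n_groups, group_size = q.shape
    return (q.astype(np.float32) * scales).reshape(rows, n_groups * group_size)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, s = rtn_quantize(w, n_bits=4, group_size=32)
w_hat = rtn_dequantize(q, s)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The design point this sketch shows is the memory/accuracy trade: only the int4 codes and a small number of scales need to be stored, while activations and the matmul stay in floating point, which is what makes weight-only quantization attractive for LLM inference.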
We sincerely want to contribute to the LLM ecosystem, and TGI is a really popular project. So, is there any chance to integrate part of our work into TGI?
Thanks