This repository was archived by the owner on Oct 25, 2024. It is now read-only.

[Optimization] Text-generation support qwen #513

Merged: 43 commits from wangchang/qwen into main, Oct 23, 2023
Commits (43)
f04d0fd
[CPP Graph] Opt qbits dequant (#465)
zhewang1-intc Oct 19, 2023
4adacf1
use INC 2.3.1
VincyZhang Oct 19, 2023
d962f58
use INC 2.3.1 (#500)
VincyZhang Oct 19, 2023
66238a5
[RUNTIME] Enabing streaming llm for Runtime (#501)
zhenwei-intel Oct 19, 2023
ea112e7
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 19, 2023
51485c6
Reduce the UT evaluation time (#498)
changwangss Oct 19, 2023
ff4abb8
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 19, 2023
9bdc764
Minor fix (#507)
VincyZhang Oct 19, 2023
6bd2b60
support qwen
changwangss Oct 19, 2023
ea720c2
Fix ChatGLM2 model loading issue (#510)
lvliang-intel Oct 19, 2023
02523e9
Update README.md
hshen14 Oct 19, 2023
0cff05a
Remove OneDNN env setint for BF16 inference (#509)
lvliang-intel Oct 20, 2023
1bee379
remove invalid code
changwangss Oct 20, 2023
ea69f9a
support Avx2 (#493)
yuchengliu1 Oct 20, 2023
f7d0d97
add neuralchat ut for audio util (#466)
Liangyx2 Oct 20, 2023
b9155ef
reduce ut time consumption (#499)
xin3he Oct 20, 2023
5f4175a
update python api readme (#504)
zhenwei-intel Oct 20, 2023
a8873ea
Add docker setup session for neuralchat finetuning sample (#496)
louie-tsai Oct 20, 2023
22fe7ad
Update README.md
hshen14 Oct 20, 2023
53b1b61
Update run_generation.py
changwangss Oct 20, 2023
b38241d
Update README.md
hshen14 Oct 20, 2023
1d91245
Update README.md
hshen14 Oct 20, 2023
18d9c57
Update README.md
hshen14 Oct 20, 2023
f98d72a
Update README.md
hshen14 Oct 20, 2023
0f6aee6
Update README.md
hshen14 Oct 20, 2023
a8db98f
Update README.md for fast token issue (#515)
louie-tsai Oct 21, 2023
52717e4
Fix typo in README.md (#516)
eltociear Oct 21, 2023
3cf68ee
Update README.md
hshen14 Oct 21, 2023
7fb944a
Update README.md
hshen14 Oct 21, 2023
7fed478
Update README.md
hshen14 Oct 21, 2023
dc81e4c
Update README.md
hshen14 Oct 21, 2023
dcfbcfd
improve Avx2 (#511)
yuchengliu1 Oct 21, 2023
a615905
Merge branch 'main' of https://github.com/intel/intel-extension-for-t…
VincyZhang Oct 21, 2023
61993cc
Revert "update python api readme (#504)"
VincyZhang Oct 21, 2023
4144197
Merge branch 'main' into wangchang/qwen
VincyZhang Oct 21, 2023
5b01e95
Update README.md
hshen14 Oct 22, 2023
bfb6a25
Update README.md (#519)
ayushrakesh Oct 22, 2023
0e0a9eb
docs: fix typos in question answering of pytorch (#520)
shresthasurav Oct 22, 2023
ec29f2f
fixed typos (#522)
Smoothieewastaken Oct 23, 2023
1357a02
Updated README.md (#517)
alienishi Oct 23, 2023
b3e4b25
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
572ecbf
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
2e77b6b
Merge branch 'main' into wangchang/qwen
changwangss Oct 23, 2023
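
Together, the commits above wire Qwen into the weight-only text-generation path (run_generation.py). As a rough sketch only — the model id, quantization settings, and prompt below are illustrative assumptions, not values taken from this PR — loading a Qwen checkpoint through the extension's Transformers-style API looks roughly like this:

```python
# Hypothetical sketch: Qwen text generation with weight-only INT4 quantization.
# The model id and config values are assumptions for illustration; see the
# repository's run_generation.py for the actual flow introduced by this PR.
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

model_name = "Qwen/Qwen-7B-Chat"  # assumed Hugging Face model id
quant_config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")

# Qwen ships custom modeling code, so trust_remote_code is required.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(
    "Tell me about Intel Xeon Scalable Processors.", return_tensors="pt"
).input_ids

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    trust_remote_code=True,
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.batch_decode(outputs))
```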
19 changes: 10 additions & 9 deletions .github/workflows/script/unitTest/env_setup.sh
@@ -7,15 +7,16 @@ if [ ${inc} != 0 ]; then
fi

echo "Install neural_compressor binary..."
n=0
until [ "$n" -ge 5 ]; do
git clone https://github.com/intel/neural-compressor.git /neural-compressor
cd /neural-compressor
pip install -r requirements.txt
python setup.py install && break
n=$((n + 1))
sleep 5
done
pip install neural-compressor
#n=0
#until [ "$n" -ge 5 ]; do
# git clone https://github.com/intel/neural-compressor.git /neural-compressor
# cd /neural-compressor
# pip install -r requirements.txt
# python setup.py install && break
# n=$((n + 1))
# sleep 5
#done

# Install test requirements
cd /intel-extension-for-transformers/tests
4 changes: 3 additions & 1 deletion .github/workflows/unit-test-neuralchat.yml
@@ -73,6 +73,8 @@ jobs:
podman run -dit --disable-content-trust --privileged --name=${{ env.CONTAINER_NAME }} -v /dev/shm:/dev/shm \
-v ${{ github.workspace }}:/intel-extension-for-transformers \
-v ~/.cache/oneAPI:/cache \
-v /models:/models \
-v /media:/media \
${{ env.REPO_NAME }}:${{ env.REPO_TAG }}

- name: Env build
@@ -143,4 +145,4 @@ jobs:
with:
name: Neural Chat Unit Test
path: ${{ github.workspace }}/log_dir
retention-days: 5
retention-days: 5
47 changes: 24 additions & 23 deletions README.md
@@ -11,17 +11,11 @@ Intel® Extension for Transformers
</div>

## 🚀Latest News
* <b>NeuralChat has been showcased in [Intel Innovation’23 Keynote](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.</b>
* <b>NeuralChat supports custom chatbot development and deployment on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md) and see below sample code. </b>

```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

* <b>LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting mainstream low precision data types such as INT8/FP8/INT4/FP4/NF4.</b>
* [2023/10] LLM runtime, an Intel-optimized [GGML](https://github.com/ggerganov/ggml) compatible runtime, demonstrates **up to 15x performance gain in 1st token generation and 1.5x in other token generation** over the default [llama.cpp](https://github.com/ggerganov/llama.cpp).
* [2023/10] LLM runtime now supports LLM inference with **infinite-length inputs up to 4 million tokens**, inspired from [StreamingLLM](https://arxiv.org/abs/2309.17453).
* [2023/09] NeuralChat has been showcased in [**Intel Innovation’23 Keynote**](https://www.youtube.com/watch?v=RbKRELWP9y8&t=2954s) and [Google Cloud Next'23](https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next-23) to demonstrate GenAI/LLM capabilities on Intel Xeon Scalable Processors.
* [2023/08] NeuralChat supports **custom chatbot development and deployment within minutes** on broad Intel HWs such as Xeon Scalable Processors, Gaudi2, Xeon CPU Max Series, Data Center GPU Max Series, Arc Series, and Core Processors. Check out [Notebooks](./intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).
* [2023/07] LLM runtime extends Hugging Face Transformers API to provide seamless low precision inference for popular LLMs, supporting low precision data types such as INT3/INT4/FP4/NF4/INT5/INT8/FP8.

---
<div align="left">
@@ -34,25 +28,31 @@ pip install intel-extension-for-transformers
> For more installation methods, please refer to [Installation Page](./docs/installation.md)

## 🌟Introduction
Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

Intel® Extension for Transformers is an innovative toolkit to accelerate Transformer-based models on Intel platforms, in particular, effective on 4th Intel Xeon Scalable processor Sapphire Rapids (codenamed [Sapphire Rapids](https://www.intel.com/content/www/us/en/products/docs/processors/xeon-accelerated/4th-gen-xeon-scalable-processors.html)). The toolkit provides the below key features and examples:

* Seamless user experience of model compressions on Transformer-based models by extending [Hugging Face transformers](https://github.com/huggingface/transformers) APIs and leveraging [Intel® Neural Compressor](https://github.com/intel/neural-compressor)


* Advanced software optimizations and unique compression-aware runtime (released with NeurIPS 2022's paper [Fast Distilbert on CPUs](https://arxiv.org/abs/2211.07715) and [QuaLA-MiniLM: a Quantized Length Adaptive MiniLM](https://arxiv.org/abs/2210.17114), and NeurIPS 2021's paper [Prune Once for All: Sparse Pre-Trained Language Models](https://arxiv.org/abs/2111.05754))


* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list) and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)
* Optimized Transformer-based model packages such as [Stable Diffusion](examples/huggingface/pytorch/text-to-image/deployment/stable_diffusion), [GPT-J-6B](examples/huggingface/pytorch/text-generation/deployment), [GPT-NEOX](examples/huggingface/pytorch/language-modeling/quantization#2-validated-model-list), [BLOOM-176B](examples/huggingface/pytorch/language-modeling/inference#BLOOM-176B), [T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), [Flan-T5](examples/huggingface/pytorch/summarization/quantization#2-validated-model-list), and end-to-end workflows such as [SetFit-based text classification](docs/tutorials/pytorch/text-classification/SetFit_model_compression_AGNews.ipynb) and [document level sentiment analysis (DLSA)](workflows/dlsa)

* [NeuralChat](intel_extension_for_transformers/neural_chat), a customizable chatbot framework to create your own chatbot within minutes by leveraging a rich set of plugins [Knowledge Retrieval](./intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md), [Speech Interaction](./intel_extension_for_transformers/neural_chat/pipeline/plugins/audio/README.md), [Query Caching](./intel_extension_for_transformers/neural_chat/pipeline/plugins/caching/README.md), [Security Guardrail](./intel_extension_for_transformers/neural_chat/pipeline/plugins/security/README.md).


* [Inference](intel_extension_for_transformers/llm/runtime/graph) of Large Language Model (LLM) in pure C/C++ with weight-only quantization kernels, supporting [GPT-NEOX](intel_extension_for_transformers/llm/runtime/graph/models/gptneox), [LLAMA](intel_extension_for_transformers/llm/runtime/graph/models/llama), [MPT](intel_extension_for_transformers/llm/runtime/graph/models/mpt), [FALCON](intel_extension_for_transformers/llm/runtime/graph/models/falcon), [BLOOM-7B](intel_extension_for_transformers/llm/runtime/graph/models/bloom), [OPT](intel_extension_for_transformers/llm/runtime/graph/models/opt), [ChatGLM2-6B](intel_extension_for_transformers/llm/runtime/graph/models/chatglm), [GPT-J-6B](intel_extension_for_transformers/llm/runtime/graph/models/gptj) and [Dolly-v2-3B](intel_extension_for_transformers/llm/runtime/graph/models/gptneox)


## 🌱Getting Started
Below are the sample code to enable weight-only low precision inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).
Below is the sample code to enable the chatbot. See more [examples](intel_extension_for_transformers/neural_chat/docs/full_notebooks.md).

### Chatbot
```python
# pip install intel-extension-for-transformers
from intel_extension_for_transformers.neural_chat import build_chatbot
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
```

Below is the sample code to enable weight-only INT4/INT8 inference. See more [examples](intel_extension_for_transformers/llm/runtime/graph).

### INT4 Inference
```python
@@ -90,7 +90,7 @@ outputs = tokenizer.batch_decode(gen_tokens)

## 🎯Validated Models
Here is the average accuracy of validated models on Lambada (OpenAI), HellaSwag, Winogrande, PIQA, and WikiText.
The next token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.
The subsequent token latency is based on 32 input tokens and greedy search on Intel's 4th Generation Xeon Scalable Sapphire Rapids processor.

| Model | FP32 | INT4 Accuracy (Group size 32) | INT4 Accuracy (Group size 128) | Next Token Latency |
|---------------------|:----------------------:|:-----------------------:|:----------------------------:|:------------:|
@@ -136,8 +136,9 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_
</tr>
<tr>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md">LLM Runtime</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="3" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/README.md#2-run-llm-with-python-api">Streaming LLM</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/core/README.md">Low Precision Kernels</a></td>
<td colspan="2" align="center"><a href="intel_extension_for_transformers/llm/runtime/graph/tensor_parallelism.md">Tensor Parallelism</a></td>
</tr>
<tr>
<th colspan="8" align="center">LLM COMPRESSION</th>
@@ -204,10 +205,10 @@ Find other models like ChatGLM, ChatGLM2, StarCoder... in [LLM Runtime](./intel_


## Acknowledgements
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), and many others.
* Excellent open-source projects: [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [FastChat](https://github.com/lm-sys/FastChat), [fastRAG](https://github.com/IntelLabs/fastRAG), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [llama.cpp](https://github.com/ggerganov/llama.cpp), [lm-evauation-harness](https://github.com/EleutherAI/lm-evaluation-harness), [peft](https://github.com/huggingface/peft), [trl](https://github.com/huggingface/trl), [streamingllm](https://github.com/mit-han-lab/streaming-llm) and many others.

* Thanks to all the contributors including [Ikko Eltociear Ashimine](https://github.com/eltociear), [Hardik Kamboj](https://github.com/hardikkamboj), [Sangjune Park](https://github.com/JJukE), [Kevin Ta](https://github.com/kta-intel), [Huiyan Cao](https://github.com/huiyan2021), [Xigui Wang](https://github.com/xiguiw), [Jiafu Zhang](https://github.com/jiafuzha), [Tyler Titsworth](https://github.com/tylertitsworth), [Yi Wang](https://github.com/sywangyi), [Samanway Sadhu](https://github.com/SamanwaySadhu), [Jiqing Feng](https://github.com/jiqing-feng), [Jonathan Mamou](https://github.com/jmamou) and [Niroop Ammbashankar](https://github.com/nammbash).

## 💁Collaborations

Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:[email protected]) and look forward to our collaborations on Intel Extension for Transformers!
Welcome to raise any interesting ideas on model compression techniques and LLM-based chatbot development! Feel free to reach [us](mailto:[email protected]), and we look forward to our collaborations on Intel Extension for Transformers!
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -3,7 +3,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
torchvision
onnx>=1.12
onnxruntime==1.13.1
@@ -1,2 +1,2 @@
transformers
torch==2.0.1
torch==2.1.0
@@ -70,7 +70,7 @@ Modify the `user.conf` when you run different models:

+ When you run minilm, please also add `--minilm=true` for both performance and accuracy.

+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both perfomance and accuracy.
+ When you run benchmark on `SPR` machine, please add `--inter_parallel=28 and set --INST_NUM=28` for both performance and accuracy.

+ When you run bert large please keep batch size as 4.

@@ -72,7 +72,7 @@ graph.save('./ir')
```

# Benchmark
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir fisrt.
If you want to run local onnx model inference, we provide with python API and C++ API. To use C++ API, you need to transfer to model ir first.

By setting ``--dynamic_quanzite`` for FP32 model, you could benchmark dynamic quantize int8 model.
## Accuracy
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.12.1

@@ -256,7 +256,7 @@ NOTES: ** the multiplication and addition operation amount when model inference
</tr>
<tr>
<td>IRQ Balance</td>
<td>Eabled</td>
<td>Enabled</td>
</tr>
<tr>
<td>CPU Model</td>
@@ -1,6 +1,6 @@
transformers
datasets
torchprofile
torch==2.0.1
torch==2.1.0
intel_extension_for_pytorch
accelerate
@@ -28,7 +28,7 @@ python run_qa.py \

## Step 2: Distributed Data Parallel Training

We supporte Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
We support Distributed Data Parallel training on single node and multi nodes settings for pruning. To use Distributed Data Parallel to speedup training, the bash command needs a small adjustment.
<br>
*`<MASTER_ADDRESS>`* is the address of the master node, it won't be necessary for single node case,
<br>
@@ -1,5 +1,5 @@
datasets >= 1.8.0
torch==2.0.1
torch==2.1.0
transformers
wandb
accelerate
@@ -1,5 +1,5 @@
accelerate
datasets
transformers
torch==2.0.1
torch==2.1.0
neural-compressor==2.0
@@ -4,6 +4,6 @@ sentencepiece != 0.1.92
rouge-score
nltk
py7zr
torch==2.0.1
torch==2.1.0
transformers
protobuf
@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
numpy
transformers
datasets
@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -4,7 +4,7 @@ accelerate
datasets >= 1.8.0
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
onnx>=1.12
onnxruntime==1.13.1

@@ -1,4 +1,4 @@
torch==2.0.1
torch==2.1.0
transformers
datasets
allennlp
@@ -6,5 +6,5 @@ sentencepiece
scipy
scikit-learn
protobuf
torch==2.0.1
torch==2.1.0
evaluate
@@ -1,5 +1,5 @@
accelerate
torch==2.0.1
torch==2.1.0
datasets >= 1.1.3
sentencepiece != 0.1.92
transformers
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb
@@ -2,6 +2,6 @@ accelerate
datasets >= 1.1.3
sentencepiece != 0.1.92
protobuf
torch==2.0.1
torch==2.1.0
transformers
wandb