Add PR test workflow and check-in more testcases (#1208)
* refactor test case

* refactor test case

* refactor testcase

* fix cuda allocate

* fix cuda-prefix in pr run

* Update daily_ete_test.yml

* change internlm2 model coverage to 20b in testcase

* change internlm2 model coverage to 20b in testcase

* change internlm2 model coverage to 20b in testcase

* fix mp blocked by allocate cuda

* add kvint8 and w4a16 chat cover

* modify timeout for each step

* fix lint

* update prompt and pr trigger

* update runner config

* Update daily_ete_test.yml

* change job name
zhulinJulia24 authored Mar 1, 2024
1 parent cc06bba commit 0430349
Showing 40 changed files with 1,846 additions and 1,480 deletions.
68 changes: 49 additions & 19 deletions .github/workflows/daily_ete_test.yml
@@ -3,7 +3,7 @@ name: daily_ete_test
on:
workflow_dispatch:
schedule:
- cron: '00 23 * * *'
- cron: '00 18 * * *'

env:
HOST_PIP_CACHE_DIR: /nvme/github-actions/pip-cache
@@ -13,7 +13,7 @@ env:
jobs:
test_functions:
runs-on: [self-hosted, linux-a100]
timeout-minutes: 240
timeout-minutes: 420
env:
REPORT_DIR: /nvme/qa_test_models/test-reports
container:
@@ -68,36 +68,66 @@ jobs:
run: |
python3 -m pip list
lmdeploy check_env
- name: Test lmdeploy - quantization
- name: Test lmdeploy - quantization w4a16
continue-on-error: true
run: |
pytest autotest -m '(quantization or quantization_w8a8) and not Baichuan2_7B_Chat and not Baichuan2_13B_Chat' -n 8 --alluredir=allure-results --clean-alluredir
pytest autotest/tools/quantization/test_quantization_w4a16.py -m 'not pr_test' -n 8 --alluredir=allure-results --clean-alluredir
- name: Test lmdeploy - quantization kv int8
continue-on-error: true
run: |
pytest autotest/tools/quantization/test_quantization_kvint8.py -n 8 --alluredir=allure-results
- name: Test lmdeploy - quantization w8a8
continue-on-error: true
run: |
pytest autotest/tools/quantization/test_quantization_w8a8.py -n 8 --alluredir=allure-results
- name: Test lmdeploy - quantization kv int8 and w4a16
continue-on-error: true
run: |
pytest autotest/tools/quantization/test_quantization_kvint8_w4a16.py -n 8 --alluredir=allure-results
- name: Test lmdeploy - convert
continue-on-error: true
run: |
pytest autotest -m 'convert and not Baichuan2_7B_Chat and not Baichuan2_13B_Chat' -n 6 --alluredir=allure-results
- name: Test lmdeploy - pipeline
pytest autotest/tools/convert -m 'not pr_test' -n 6 --alluredir=allure-results --dist loadgroup
- name: Test lmdeploy - interface turbomind case
continue-on-error: true
timeout-minutes: 60
run: pytest autotest -m '(pipeline_chat) and not Baichuan2_7B_Chat and not Baichuan2_13B_Chat' --alluredir=allure-results
- name: Test lmdeploy - restful
timeout-minutes: 20
run: |
pytest autotest/interface/pipeline/test_pipeline_turbomind_func.py -m 'not pr_test' --alluredir=allure-results
- name: Test lmdeploy - pipeline turbomind
continue-on-error: true
run: pytest autotest -m restful_api --alluredir=allure-results
- name: Test lmdeploy - chat
timeout-minutes: 45
run: pytest autotest/tools/pipeline/test_pipeline_chat_turbomind.py -m 'not pr_test' --alluredir=allure-results
- name: Test lmdeploy - pipeline torch
continue-on-error: true
timeout-minutes: 75
run: pytest autotest/tools/pipeline/test_pipeline_chat_pytorch.py -m 'not pr_test' --alluredir=allure-results
- name: Test lmdeploy - restful turbomind
continue-on-error: true
timeout-minutes: 60
run: pytest autotest/tools/restful/test_restful_chat_turbomind.py -m 'not pr_test' --alluredir=allure-results
- name: Test lmdeploy - restful torch
continue-on-error: true
timeout-minutes: 80
run: pytest autotest/tools/restful/test_restful_chat_pytorch.py -m 'not pr_test' --alluredir=allure-results
- name: Test lmdeploy - chat workspace
continue-on-error: true
timeout-minutes: 30
run: |
pytest autotest -m '(command_chat or command_chat_hf or command_chat_pytorch) and not Baichuan2_7B_Chat and not Baichuan2_13B_Chat' -n 4 --alluredir=allure-results
- name: Downgrade transformers
run: python3 -m pip install transformers==4.33.0
- name: Test lmdeploy - run Baichuan
pytest autotest/tools/chat/test_command_chat_workspace.py -m 'not pr_test' -n 4 --alluredir=allure-results
- name: Test lmdeploy - chat hf turbomind
continue-on-error: true
timeout-minutes: 50
timeout-minutes: 45
run: |
pytest autotest -m '(Baichuan2_7B_Chat or Baichuan2_13B_Chat) and not pipeline_chat_pytorch' --alluredir=allure-results
- name: Test lmdeploy - rerun fail cases
pytest autotest/tools/chat/test_command_chat_hf_turbomind.py -m 'not pr_test' -n 4 --alluredir=allure-results
- name: Test lmdeploy - chat hf torch
continue-on-error: true
timeout-minutes: 60
run: |
pytest autotest/tools/chat/test_command_chat_hf_pytorch.py -m 'not pr_test' -n 4 --alluredir=allure-results
- name: Test lmdeploy - rerun all fail cases
timeout-minutes: 60
run: |
pytest autotest --alluredir=allure-results --lf
pytest autotest --lf --alluredir=allure-results
- name: Generate reports
if: always()
run: |
99 changes: 99 additions & 0 deletions .github/workflows/pr_ete_test.yml
@@ -0,0 +1,99 @@
name: pr_ete_test

on:
pull_request:
paths:
- ".github/workflows/pr_ete_test.yml"
- "cmake/**"
- "src/**"
- "autotest/**"
- "3rdparty/**"
- "lmdeploy/**"
- "requirements/**"
- "requirements.txt"
- "CMakeLists.txt"
- "setup.py"
workflow_dispatch:


env:
HOST_PIP_CACHE_DIR: /nvme/github-actions/pip-cache
HOST_LOCALTIME: /usr/share/zoneinfo/Asia/Shanghai


jobs:
pr_functions_test:
runs-on: [self-hosted, linux-a100-pr]
timeout-minutes: 120
env:
REPORT_DIR: /nvme/qa_test_models/test-reports
container:
image: nvcr.io/nvidia/tritonserver:22.12-py3
options: "--gpus=all --ipc=host --user root -e PIP_CACHE_DIR=/root/.cache/pip"
volumes:
- /nvme/share_data/github-actions/pip-cache:/root/.cache/pip
- /nvme/share_data/github-actions/packages:/root/packages
- /nvme/qa_test_models:/nvme/qa_test_models
- /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime:ro
steps:
- name: Setup systems
run: |
rm /etc/apt/sources.list.d/cuda*.list
apt-get update && apt-get install -y --no-install-recommends rapidjson-dev \
libgoogle-glog-dev libgl1 openjdk-8-jre-headless
dpkg -i /root/packages/allure_2.24.1-1_all.deb
rm -rf /var/lib/apt/lists/*
- name: Clone repository
uses: actions/checkout@v2
- name: Install pytorch
run: |
python3 -m pip cache dir
python3 -m pip install torch==2.1.0 torchvision==0.16.0 --index-url https://download.pytorch.org/whl/cu118
- name: Build lmdeploy
run: |
python3 -m pip install cmake
python3 -m pip install -r requirements/build.txt
mkdir build
cd build
cmake .. \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
-DCMAKE_INSTALL_PREFIX=/opt/tritonserver \
-DBUILD_PY_FFI=ON \
-DBUILD_MULTI_GPU=ON \
-DCMAKE_CUDA_FLAGS="-lineinfo" \
-DUSE_NVTX=ON \
-DSM=80 \
-DCMAKE_CUDA_ARCHITECTURES=80 \
-DBUILD_TEST=OFF
make -j$(nproc) && make install
- name: Install lmdeploy
run: |
python3 -m pip install packaging protobuf transformers_stream_generator transformers datasets
# manually install flash attn
# the install package is from https://github.com/Dao-AILab/flash-attention/releases/download/v2.3.6/flash_attn-2.3.6+cu118torch2.0cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
python3 -m pip install /root/packages/flash_attn-2.3.6+cu118torch2.1cxx11abiFALSE-cp38-cp38-linux_x86_64.whl
python3 -m pip install -r requirements.txt -r requirements/test.txt
python3 -m pip install .
- name: Check env
run: |
python3 -m pip list
lmdeploy check_env
- name: Test lmdeploy
timeout-minutes: 120
run: CUDA_VISIBLE_DEVICES=5,6 pytest autotest -m pr_test --alluredir=allure-results --clean-alluredir
- name: Generate reports
if: always()
run: |
export date_today="$(date +'%Y%m%d-%H%M%S')"
export report_dir="$REPORT_DIR/$date_today"
echo "Save report to $ALLURE_DIR"
allure generate -c -o $report_dir
- name: Clear workfile
if: always()
run: |
export workdir=$(pwd)
cd ..
rm -rf $workdir
mkdir $workdir
chmod -R 777 $workdir
76 changes: 36 additions & 40 deletions autotest/README.md
@@ -2,6 +2,16 @@

We provide an autotest case set for regression testing.

## Preparation before testing

To improve the efficiency of test execution, we download the hf model files in advance to a fixed path so that the test cases can use them directly. The path where the model files are stored is defined by the `model_path` parameter in the `autotest/config.yaml` file.

Since the test cases convert the hf models with the convert tool, the storage path for the converted models is defined by the `dst_path` parameter in `autotest/config.yaml`.

The `autotest/config.yaml` file also defines the supported model table and the corresponding model categories (the `model_map` parameter), as well as the log storage path `log_path` used while the test cases run.

If you want to create a test environment, you need to prepare the above content and modify `config.yaml` as needed.
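
The exact schema of `autotest/config.yaml` is not reproduced in this commit, so the snippet below is only a minimal sketch with placeholder paths and an assumed `model_map` shape; adjust it to match the real file when setting up an environment.

```yaml
# Minimal illustrative sketch of autotest/config.yaml -- all values are placeholders.
model_path: /nvme/qa_test_models                 # directory holding the pre-downloaded hf models
dst_path: /nvme/qa_test_models/autotest_model    # output directory for converted (workspace) models
log_path: /nvme/qa_test_models/autotest_log      # logs written while the test cases run
model_map:                                       # assumed shape: model name -> model category
  internlm2-chat-20b: internlm2
  Qwen-14B-Chat: qwen
```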

## How to run testcases

Install required dependencies using the following command line:
@@ -10,10 +10,12 @@ Install required dependencies using the following command line:
python3 -m pip install -r requirements/test.txt
```

Run pytest command line with case filtering through -m flag. eg: `-m internlm_chat_7b` Filter cases related to internlm_chat_7b. The corresponding results will be stored in the `allure-results` directory.
Run the pytest command line with case filtering through the `-m` flag or by folder name, e.g. `-m convert` filters the cases related to convert, and `autotest/tools/convert` runs the cases in that folder. The corresponding results will be stored in the `allure-results` directory.

```bash
pytest autotest -m internlm_chat_7b --clean-alluredir --alluredir=allure-results
pytest autotest -m convert --clean-alluredir --alluredir=allure-results
pytest autotest/tools/convert --clean-alluredir --alluredir=allure-results

```

If you need to generate reports and display report features, you need to install allure according to the [install documentation of allure](https://allurereport.org/docs/gettingstarted-installation/#install-via-the-system-package-manager-for-linux). You can also install it directly using the following command:
@@ -32,53 +32,44 @@ allure generate -c -o allure-reports
allure open ./allure-reports
```

## Preparation before testing

To improve the efficiency of test case execution, we have downloaded the hf model files to a specific path in advance for easy use in test cases. The path where the model files are stored is defined in the `autotest/config.yaml` file with parameter `model_path`.

Since the test cases involve converting the hf model using convert, the converted model storage path is defined in the `autotest/config.yaml` file parameter `dst_path`.

The `autotest/config.yaml` file also defines the supported model table and corresponding model categories, such as the `model_map` parameter, as well as the log storage path `log_path` used during test case execution.

If you want to create a test environment, you need to prepare the above content and modify the config.yaml file as needed.

## Test case functionality coverage

The test cases cover the following functionalities:
The test cases include the following case models:

tools model - related to the tutorials; these cases are basic

![image](https://github.com/InternLM/lmdeploy/assets/145004780/85d6a2d3-cc4f-459c-8dc1-22c17b69954f)
interface model - interface function cases of the pipeline, restful api and triton server api

The relationship between functionalities and test cases is as follows:

| Function | Test Case File |
| :---------------------: | :-------------------------------: |
| w4a16 quantization | test_order1_quantization_w4 |
| w8a8 quantization | test_order1_quantization_w8a8 |
| convert | test_order2_convert |
| pipeline chat | test_order3_pipeline_chat |
| pipeline chat - pytorch | test_order3_pipeline_chat_pytorch |
| restful_api chat | test_order3_restful_chat |
| command chat - cli | test_order3_command_chat |
| command chat - hf | test_order3_command_chat_hf |
| command chat - pytorch | test_order3_command_chat_pytorch |

The modules and models currently covered by the test cases are listed below:

| Models | w4a16 quantization | w8a8 quantization | kvint8 quantization | convert | pipeline chat | pipeline chat - pytorch | restful_api chat | command chat - cli | command chat - hf | command chat - pytorch |
| :------------------------------------------------------------------------: | :----------------: | :---------------: | :-----------------: | :-----: | :-----------: | :---------------------: | :--------------: | :----------------: | :---------------: | :--------------------: |
| [internlm2_chat_7b](https://huggingface.co/internlm/internlm2-chat-7b) | No | No | No | Yes | Yes | Yes | No | Yes | Yes | Yes |
| [internlm2_chat_20b](https://huggingface.co/internlm/internlm2-chat-20b) | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes |
| [internlm_chat_7b](https://huggingface.co/internlm/internlm-chat-7b) | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | No |
| [internlm_chat_20b](https://huggingface.co/internlm/internlm-chat-20b) | Yes | No | No | Yes | Yes | No | No | Yes | Yes | No |
| [llama2_chat_7b_w4](https://huggingface.co/lmdeploy/llama2-chat-7b-w4) | No | No | No | Yes | Yes | No | No | Yes | Yes | No |
| [Qwen_7B_Chat](https://huggingface.co/Qwen/Qwen-7B-Chat) | Yes | No | No | Yes | Yes | No | No | Yes | Yes | No |
| [Qwen_14B_Chat](https://huggingface.co/Qwen/Qwen-14B-Chat) | Yes | No | No | Yes | Yes | No | No | Yes | Yes | No |
| [Baichuan2_7B_Chat](https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat) | Yes | No | No | Yes | Yes | No | No | Yes | Yes | No |
| [llama_2_7b_chat](https://huggingface.co/meta-llama/Llama-2-7b-chat) | Yes | No | No | Yes | Yes | No | No | Yes | Yes | No |
| case model | Function | Test Case File |
| :--------: | :------------------------------: | :--------------------------------------------------: |
| tools | quantization - w4a16 | tools/quantization/test_quantization_w4a16.py |
| tools | quantization - w8a8 | tools/quantization/test_quantization_w8a8.py |
| tools | quantization - kv int8 | tools/quantization/test_quantization_kvint8.py |
| tools | quantization - kv int8 and w4a16 | tools/quantization/test_quantization_kvint8_w4a16.py |
| tools | convert | tools/convert/test_convert.py |
| tools | pipeline chat - turbomind | tools/pipeline/test_pipeline_chat_turbomind.py |
| tools | pipeline chat - pytorch | tools/pipeline/test_pipeline_chat_pytorch.py |
| tools      | restful_api chat - turbomind     | tools/restful/test_restful_chat_turbomind.py          |
| tools      | restful_api chat - pytorch       | tools/restful/test_restful_chat_pytorch.py            |
| tools | command chat - workspace | tools/chat/test_command_chat_workspace.py |
| tools | command chat - hf turbomind | tools/chat/test_command_chat_hf_turbomind.py |
| tools | command chat - hf pytorch | tools/chat/test_command_chat_hf_pytorch.py |
| interface  | pipeline interface - turbomind   | interface/pipeline/test_pipeline_turbomind_func.py    |

The models currently covered by the turbomind and pytorch backends are listed in `autotest/config.yaml` under the `turbomind_model` and `pytorch_model` keys.
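
As an illustration (the model names below are examples rather than the actual coverage lists), these two keys could look like the following; extending backend coverage amounts to appending a model name to the relevant list:

```yaml
# Illustrative sketch only -- not the repository's actual coverage lists.
turbomind_model:
  - internlm2-chat-20b
  - Qwen-14B-Chat
pytorch_model:
  - internlm2-chat-20b
```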

## How to add a testcase

you need to confirm that the corresponding model is ready <a href="##Preparation before testing">Jump to prepare Section</a>, then you can copy the existing case in the corresponding function test file. Please modify case mark, case story, case name and parameters if need.
If you want to add a new model to the tools test cases, you should prepare the model on your machine (<a href="#preparation-before-testing">jump to the preparation section</a>) and then add it to `autotest/config.yaml`.

## How to add a chatcase template
