[Eval] Add eval test with opencompass.
marvin-Yu committed May 14, 2024
1 parent 819eccc commit dd18f22
Showing 1 changed file with 129 additions and 0 deletions: examples/opencompass/README.md
# xFT Accuracy Evaluation with OpenCompass
OpenCompass is an LLM evaluation platform supporting a wide range of models (InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets. For more details, refer to [https://opencompass.org.cn/](https://opencompass.org.cn/).

## Installation
Below are the steps for quick installation and dataset preparation.

### Environment Setup
``` bash
# setup steps follow https://opencompass.org.cn/doc
$ conda create -n opencompass python=3.10 pytorch torchvision torchaudio cpuonly -c pytorch -y
$ conda activate opencompass

$ git clone -b intel/xft https://github.com/marvin-Yu/opencompass.git && cd opencompass
$ pip install -e .
```
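
To confirm the environment is usable, a minimal sanity check (a sketch; `xfastertransformer` is assumed to be pulled in by the fork's requirements, otherwise install it explicitly):
``` bash
# both packages should import cleanly inside the conda env
# if xfastertransformer is missing: pip install xfastertransformer
$ python -c "import opencompass, xfastertransformer; print('env OK')"
```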

### Data Preparation
``` bash
# download core dataset
$ wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-core-20240207.zip

$ unzip OpenCompassData-core-20240207.zip

# # (optional) download the full dataset instead
# $ wget https://github.com/open-compass/opencompass/releases/download/0.2.2.rc1/OpenCompassData-complete-20240207.zip
# $ unzip OpenCompassData-complete-20240207.zip
# $ cd ./data
# $ find . -name "*.zip" -exec unzip "{}" \;
```
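
The core archive is expected to unpack into a `data/` directory under the opencompass repository root; a quick check that the `ceval` data used later in this guide is in place:
``` bash
# the dataset configs reference paths like ./data/ceval, so ./data should sit next to run.py
$ ls data/ceval
```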

### Model Preparation
Download model weights to the `/data/models` directory (a custom path is not supported yet; modify the configuration file or create symbolic links instead). For example, to test the model `chatglm2_6b`:
```bash
/data/models/
├── chatglm2-6b-hf
├── chatglm2-6b-hf-xft
├── ...
```
For exporting xFT models, please refer to [xFT Models Preparation](https://github.com/intel/xFasterTransformer?tab=readme-ov-file#models-preparation).
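
As a rough sketch of how the layout above can be produced, the Hugging Face weights can be symlinked into `/data/models` and converted with the xFasterTransformer converter (the `ChatGLM2Convert` class follows the converter naming in the xFT README; other models use their own converter class, e.g. `LlamaConvert` for Llama 2, and the source path below is illustrative):
``` bash
# symlink (or copy) the Hugging Face checkpoint into the expected location
$ mkdir -p /data/models
$ ln -s /path/to/chatglm2-6b /data/models/chatglm2-6b-hf

# convert the HF checkpoint into the xFT format next to it
$ python -c 'import xfastertransformer as xft; xft.ChatGLM2Convert().convert("/data/models/chatglm2-6b-hf", "/data/models/chatglm2-6b-hf-xft")'
```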

### xFT Evaluation
``` bash
# list all xFT-supported models
$ python tools/list_configs.py xft
# +------------------------+----------------------------------------------+
# | Model | Config Path |
# |------------------------+----------------------------------------------|
# | xft_llama2_13b_chat | configs/models/xft/xft_llama2_13b_chat.py |
# | xft_llama2_70b_chat | configs/models/xft/xft_llama2_70b_chat.py |
# | xft_llama2_7b_chat | configs/models/xft/xft_llama2_7b_chat.py |
# | xft_chatglm2_6b | configs/models/xft/xft_chatglm2_6b.py |
# | xft_chatglm3_6b | configs/models/xft/xft_chatglm3_6b.py |
# | xft_chatglm_6b | configs/models/xft/xft_chatglm_6b.py |
# | xft_gemma_2b_it | configs/models/xft/xft_gemma_2b_it.py |
# | xft_gemma_7b_it | configs/models/xft/xft_gemma_7b_it.py |
# | ............... | ..................................... |
# +------------------------+----------------------------------------------+

# list the datasets you want to evaluate
$ python tools/list_configs.py ceval
# +--------------------------------+------------------------------------------------------------------+
# | Dataset | Config Path |
# |--------------------------------+------------------------------------------------------------------|
# | ceval_gen | configs/datasets/ceval/ceval_gen.py |
# | ceval_gen_5f30c7 | configs/datasets/ceval/ceval_gen_5f30c7.py |
# | ceval_ppl | configs/datasets/ceval/ceval_ppl.py |
# | ceval_ppl_93e5ce | configs/datasets/ceval/ceval_ppl_93e5ce.py |
# | ............... | ............................................. |
# +--------------------------------+------------------------------------------------------------------+

# run the evaluation
$ python run.py --max-num-workers 1 --models xft_chatglm2_6b --datasets ceval_gen
# 20240416_100621
# tabulate format
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# dataset version metric mode xft_chatglm2_6b-bf16 xft_chatglm2_6b-xft-bf16
# ---------------------------------------------- --------- ------------- ------ --------------------- -------------------------
# ceval-computer_network db9ce2 accuracy gen 47.37 47.37
# ceval-operating_system 1c2571 accuracy gen xx.xx xx.xx
# ceval-computer_architecture a74dad accuracy gen xx.xx xx.xx
# ceval-college_programming 4ca32a accuracy gen xx.xx xx.xx
# ceval-college_physics 963fa8 accuracy gen xx.xx xx.xx
# ceval-college_chemistry e78857 accuracy gen xx.xx xx.xx
# ceval-advanced_mathematics ce03e2 accuracy gen xx.xx xx.xx
# ...
```
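
Each run writes predictions, logs, and a score summary under OpenCompass's default work directory; a sketch of where to look, assuming the default `outputs/default` location and the timestamp shown above:
``` bash
# the summary folder holds the accuracy table rendered above as CSV/TXT
$ ls outputs/default/20240416_100621/summary/
```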

## FAQ

### AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'?
`pip install --force-reinstall transformers==4.33.0`

### For the `TruthfulQA` dataset, install the `bleurt` dependency
`pip install git+https://github.com/google-research/bleurt.git`

### Environment variables to control the test cases (XFT_ONLY_XFT, XFT_DTYPE_LIST, XFT_KVCACHE_DTYPE_LIST)
```
# XFT_ONLY_XFT
# When set, only the xFT models are tested; the HuggingFace reference models are skipped.
# XFT_DTYPE_LIST contains a list like:
# [
# "fp16",
# "bf16",
# "int8",
# "w8a8",
# "int4",
# "nf4",
# "bf16_fp16",
# "bf16_int8",
# "bf16_w8a8",
# "bf16_int4",
# "bf16_nf4",
# "w8a8_int8",
# "w8a8_int4",
# "w8a8_nf4",
# ]
# For example, it can include a single parameter like XFT_DTYPE_LIST=bf16
# or multiple parameters separated by commas like XFT_DTYPE_LIST=bf16,fp16,int8.
# XFT_KVCACHE_DTYPE_LIST environment variable contains a list of data types used for XFT KV cache.
# [
# "fp32",
# "fp16",
# "int8",
# ]
# For example, it can include a single parameter like XFT_KVCACHE_DTYPE_LIST=fp16
# or multiple parameters separated by commas like XFT_KVCACHE_DTYPE_LIST=fp32,fp16,int8.
```
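
A minimal usage sketch combining these variables with the run command from above (the value `1` for `XFT_ONLY_XFT` is an assumption; check the xft model configs for how the flag is read):
``` bash
# run only the xFT variants (skip the HF baseline), testing bf16 and int8
# weights with an fp16 KV cache
$ XFT_ONLY_XFT=1 \
  XFT_DTYPE_LIST=bf16,int8 \
  XFT_KVCACHE_DTYPE_LIST=fp16 \
  python run.py --max-num-workers 1 --models xft_chatglm2_6b --datasets ceval_gen
```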
