Support all features for ChatGLM3 (system prompt / function call / code interpreter) (#197)
li-plus authored Nov 22, 2023
1 parent 95d3b8c commit b071907
Showing 25 changed files with 1,663 additions and 514 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -36,7 +36,7 @@ jobs:
    - name: Lint with black
      uses: psf/black@stable
      with:
-       options: "--check --verbose --line-length 120"
+       options: "--check --verbose"
        src: "chatglm_cpp examples tests setup.py"
    - name: Test with pytest
      run: |
1 change: 1 addition & 0 deletions .gitignore
@@ -10,6 +10,7 @@ __pycache__/
*.egg-info/
dist/
*.so
*.whl
.hypothesis/

# cpp
14 changes: 13 additions & 1 deletion CMakeLists.txt
@@ -92,7 +92,19 @@ file(GLOB PY_SOURCES
add_custom_target(lint
    COMMAND clang-format -i ${CPP_SOURCES}
    COMMAND isort ${PY_SOURCES}
-   COMMAND black ${PY_SOURCES} --line-length 120)
+   COMMAND black ${PY_SOURCES} --verbose)

# mypy
add_custom_target(mypy
    mypy chatglm_cpp examples --exclude __init__.pyi
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
)

# stub
add_custom_target(stub
    pybind11-stubgen chatglm_cpp -o .
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
)

if (MSVC)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
117 changes: 103 additions & 14 deletions README.md
@@ -105,11 +105,70 @@ python3 chatglm_cpp/convert.py -i THUDM/chatglm2-6b -t q4_0 -o chatglm2-ggml.bin
<details open>
<summary>ChatGLM3-6B</summary>

In addition to chat mode, ChatGLM3-6B supports function calling and a code interpreter.

Chat mode:
```sh
python3 chatglm_cpp/convert.py -i THUDM/chatglm3-6b -t q4_0 -o chatglm3-ggml.bin
./build/bin/main -m chatglm3-ggml.bin -p 你好 --top_p 0.8 --temp 0.8
# 你好👋!我是人工智能助手 ChatGLM3-6B,很高兴见到你,欢迎问我任何问题。
```

Setting the system prompt:
```sh
./build/bin/main -m chatglm3-ggml.bin -p 你好 -s "You are ChatGLM3, a large language model trained by Zhipu.AI. Follow the user's instructions carefully. Respond using markdown."
# 你好👋!我是 ChatGLM3,有什么问题可以帮您解答吗?
```

Function call:
~~~
$ ./build/bin/main -m chatglm3-ggml.bin --top_p 0.8 --temp 0.8 --sp examples/system/function_call.txt -i
System > Answer the following questions as best as you can. You have access to the following tools: ...
Prompt > 生成一个随机数
ChatGLM3 > random_number_generator
```python
tool_call(seed=42, range=(0, 100))
```
Tool Call > Please manually call function `random_number_generator` with args `tool_call(seed=42, range=(0, 100))` and provide the results below.
Observation > 23
ChatGLM3 > 根据您的要求,我使用随机数生成器API生成了一个随机数。根据API返回结果,生成的随机数为23。
~~~
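The `Tool Call` line asks the user to evaluate the emitted expression by hand. A minimal sketch of doing so in plain Python (`tool_call` and `random_number_generator` are stand-ins implied by the transcript, not chatglm.cpp APIs):

```python
import random

def tool_call(**kwargs):
    # Capture the model-emitted arguments verbatim.
    return kwargs

def random_number_generator(seed, range):
    # One plausible implementation of the tool named in the transcript.
    return random.Random(seed).randint(*range)

args = tool_call(seed=42, range=(0, 100))
print(random_number_generator(**args))  # paste this value back as the Observation
```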

Code interpreter:
~~~
$ ./build/bin/main -m chatglm3-ggml.bin --top_p 0.8 --temp 0.8 --sp examples/system/code_interpreter.txt -i
System > 你是一位智能AI助手,你叫ChatGLM,你连接着一台电脑,但请注意不能联网。在使用Python解决任务时,你可以运行代码并得到结果,如果运行结果有错误,你需要尽可能对代码进行改进。你可以处理用户上传到电脑上的文件,文件默认存储路径是/mnt/data/。
Prompt > 列出100以内的所有质数
ChatGLM3 > 好的,我会为您列出100以内的所有质数。
```python
def is_prime(n):
    """Check if a number is prime."""
    if n <= 1:
        return False
    if n <= 3:
        return True
    if n % 2 == 0 or n % 3 == 0:
        return False
    i = 5
    while i * i <= n:
        if n % i == 0 or n % (i + 2) == 0:
            return False
        i += 6
    return True

primes_upto_100 = [i for i in range(2, 101) if is_prime(i)]
primes_upto_100
```
Code Interpreter > Please manually run the code and provide the results below.
Observation > [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
ChatGLM3 > 100以内的所有质数为:
$$
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97
$$
~~~

</details>

<details>
@@ -251,7 +310,7 @@ pip install .
Pre-built wheels for the CPU backend on Linux / macOS / Windows are published on [release](https://github.com/li-plus/chatglm.cpp/releases). For the CUDA / Metal backends, please compile from source or from the source distribution.
-**Using pre-converted ggml models**
+**Using Pre-converted GGML Models**
Here is a simple demo that uses `chatglm_cpp.Pipeline` to load the GGML model and chat with it. First enter the examples folder (`cd examples`) and launch a Python interactive shell:
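A minimal sketch of such a session (the original snippet is collapsed in this view, and the message-based `ChatMessage` API is an assumption based on this commit's changes):

```python
>>> import chatglm_cpp
>>> pipeline = chatglm_cpp.Pipeline("../chatglm-ggml.bin")
>>> pipeline.chat([chatglm_cpp.ChatMessage(role="user", content="你好")])
```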
To chat in streaming mode, run the Python example below:
```sh
-python3 cli_chat.py -m ../chatglm-ggml.bin -i
+python3 cli_demo.py -m ../chatglm-ggml.bin -i
```
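To stream from Python instead of the CLI, a loop like the following should work, assuming the binding exposes a `stream=True` flag that yields partial messages (an assumption, not confirmed by this diff):

```python
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("../chatglm-ggml.bin")
messages = [chatglm_cpp.ChatMessage(role="user", content="你好")]
# stream=True is assumed; each chunk carries a fragment of the reply.
for chunk in pipeline.chat(messages, stream=True):
    print(chunk.content, end="", flush=True)
print()
```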
Launch a web demo to chat in your browser:
@@ -280,26 +339,56 @@ For other models:
<summary>ChatGLM2-6B</summary>
```sh
-python3 cli_chat.py -m ../chatglm2-ggml.bin -p 你好 --temp 0.8 --top_p 0.8 # CLI demo
+python3 cli_demo.py -m ../chatglm2-ggml.bin -p 你好 --temp 0.8 --top_p 0.8 # CLI demo
python3 web_demo.py -m ../chatglm2-ggml.bin --temp 0.8 --top_p 0.8 # web demo
```
</details>
<details open>
<summary>ChatGLM3-6B</summary>
**CLI Demo**
Chat mode:
```sh
python3 cli_demo.py -m ../chatglm3-ggml.bin -p 你好 --temp 0.8 --top_p 0.8
```
Function call:
```sh
-python3 cli_chat.py -m ../chatglm3-ggml.bin -p 你好 --temp 0.8 --top_p 0.8 # CLI demo
-python3 web_demo.py -m ../chatglm3-ggml.bin --temp 0.8 --top_p 0.8 # web demo
+python3 cli_demo.py -m ../chatglm3-ggml.bin --temp 0.8 --top_p 0.8 --sp system/function_call.txt -i
```
Code interpreter:
```sh
python3 cli_demo.py -m ../chatglm3-ggml.bin --temp 0.8 --top_p 0.8 --sp system/code_interpreter.txt -i
```
**Web Demo**
Install the Python dependencies and the IPython kernel for the code interpreter:
```sh
pip install streamlit jupyter_client ipython ipykernel
ipython kernel install --name chatglm3-demo --user
```
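A code-interpreter demo executes model-emitted snippets in that kernel. A minimal sketch of driving the `chatglm3-demo` kernel with `jupyter_client` (illustrative only; the demo's actual wiring is not shown in this diff):

```python
from jupyter_client.manager import start_new_kernel

# Start the kernel registered above and run one snippet in it.
km, kc = start_new_kernel(kernel_name="chatglm3-demo")
kc.execute("print(sum(range(10)))")
while True:
    msg = kc.get_iopub_msg(timeout=10)
    if msg["msg_type"] == "stream":
        print(msg["content"]["text"], end="")  # -> 45
        break
kc.stop_channels()
km.shutdown_kernel()
```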
Launch the web demo:
```sh
streamlit run chatglm3_demo.py
```
| Function Call | Code Interpreter |
|-----------------------------|--------------------------------|
| ![](docs/function_call.png) | ![](docs/code_interpreter.png) |
</details>
<details>
<summary>CodeGeeX2</summary>
```sh
# CLI demo
-python3 cli_chat.py -m ../codegeex2-ggml.bin --temp 0 --mode generate -p "\
+python3 cli_demo.py -m ../codegeex2-ggml.bin --temp 0 --mode generate -p "\
# language: Python
# write a bubble sort function
"
@@ -312,7 +401,7 @@ python3 web_demo.py -m ../codegeex2-ggml.bin --temp 0 --max_length 512 --mode ge
<summary>Baichuan-13B-Chat</summary>
```sh
-python3 cli_chat.py -m ../baichuan-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.1 # CLI demo
+python3 cli_demo.py -m ../baichuan-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.1 # CLI demo
python3 web_demo.py -m ../baichuan-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.1 # web demo
```
</details>
@@ -321,7 +410,7 @@ python3 web_demo.py -m ../baichuan-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --te
<summary>Baichuan2-7B-Chat</summary>
```sh
-python3 cli_chat.py -m ../baichuan2-7b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # CLI demo
+python3 cli_demo.py -m ../baichuan2-7b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # CLI demo
python3 web_demo.py -m ../baichuan2-7b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # web demo
```
</details>
@@ -330,7 +419,7 @@ python3 web_demo.py -m ../baichuan2-7b-chat-ggml.bin --top_k 5 --top_p 0.85 --te
<summary>Baichuan2-13B-Chat</summary>
```sh
-python3 cli_chat.py -m ../baichuan2-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # CLI demo
+python3 cli_demo.py -m ../baichuan2-13b-chat-ggml.bin -p 你好 --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # CLI demo
python3 web_demo.py -m ../baichuan2-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --temp 0.3 --repeat_penalty 1.05 # web demo
```
</details>
@@ -339,7 +428,7 @@ python3 web_demo.py -m ../baichuan2-13b-chat-ggml.bin --top_k 5 --top_p 0.85 --t
<summary>InternLM-Chat-7B</summary>
```sh
-python3 cli_chat.py -m ../internlm-chat-7b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8 # CLI demo
+python3 cli_demo.py -m ../internlm-chat-7b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8 # CLI demo
python3 web_demo.py -m ../internlm-chat-7b-ggml.bin --top_p 0.8 --temp 0.8 # web demo
```
</details>
@@ -348,12 +437,12 @@ python3 web_demo.py -m ../internlm-chat-7b-ggml.bin --top_p 0.8 --temp 0.8 # we
<summary>InternLM-Chat-20B</summary>
```sh
-python3 cli_chat.py -m ../internlm-chat-20b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8 # CLI demo
+python3 cli_demo.py -m ../internlm-chat-20b-ggml.bin -p 你好 --top_p 0.8 --temp 0.8 # CLI demo
python3 web_demo.py -m ../internlm-chat-20b-ggml.bin --top_p 0.8 --temp 0.8 # web demo
```
</details>
-**Load and optimize Hugging Face LLMs in one line of code**
+**Converting Hugging Face LLMs at Runtime**
Sometimes it might be inconvenient to convert and save the intermediate GGML models beforehand. Here is an option to load the original Hugging Face model directly, quantize it into GGML format on the fly, and start serving. All you need to do is replace the GGML model path with the Hugging Face model name or path.
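A minimal sketch of that one-liner (the original snippet is collapsed in this view; the `dtype` keyword naming the target quantization type is an assumption):

```python
import chatglm_cpp

# Load the original Hugging Face model and quantize it to q4_0 on the fly;
# conversion progress is reported as "Processing model states: 100%|...|".
pipeline = chatglm_cpp.Pipeline("THUDM/chatglm-6b", dtype="q4_0")
```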
Likewise, replace the GGML model path with the Hugging Face model name in any example script, and it just works. For example:
```sh
-python3 cli_chat.py -m THUDM/chatglm-6b -p 你好 -i
+python3 cli_demo.py -m THUDM/chatglm-6b -p 你好 -i
```
## API Server
@@ -443,7 +532,7 @@ docker build . --network=host -t chatglm.cpp
# cpp demo
docker run -it --rm -v $PWD:/opt chatglm.cpp ./build/bin/main -m /opt/chatglm-ggml.bin -p "你好"
# python demo
-docker run -it --rm -v $PWD:/opt chatglm.cpp python3 examples/cli_chat.py -m /opt/chatglm-ggml.bin -p "你好"
+docker run -it --rm -v $PWD:/opt chatglm.cpp python3 examples/cli_demo.py -m /opt/chatglm-ggml.bin -p "你好"
# langchain api server
docker run -it --rm -v $PWD:/opt -p 8000:8000 -e MODEL=/opt/chatglm-ggml.bin chatglm.cpp \
uvicorn chatglm_cpp.langchain_api:app --host 0.0.0.0 --port 8000
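To smoke-test the LangChain API server from Python, something like the following should work, assuming the endpoint accepts a JSON `prompt` field at the root path (an assumption, not confirmed by this diff):

```python
import requests

# Post a prompt to the server started above and print the reply.
resp = requests.post("http://127.0.0.1:8000", json={"prompt": "你好"})
print(resp.json())
```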