[Doc] Update state of mooncake transfer engine integration with vLLM.
Signed-off-by: Shangming Cai <[email protected]>
ShangmingCai committed Dec 16, 2024
1 parent c8c4ce0 commit 6e7ba8d
Showing 2 changed files with 6 additions and 6 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -10,6 +10,7 @@ This repository also hosts its technical report and the open sourced traces.

<h2 id="updates">🔄 Updates</h2>

+- **Dec 16, 2024**: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
- **Nov 28, 2024**: We open sourced the Transfer Engine, the central component of Mooncake. We also provide two demonstrations of Transfer Engine: a P2P Store and vLLM integration.
- **July 9, 2024**: We open sourced the trace as a <a href="https://github.com/kvcache-ai/Mooncake/blob/main/mooncake_trace.jsonl" target="_blank">jsonl file</a>!.
- **June 27, 2024**: We present a series of Chinese blogs with more discussions on <a href="https://zhuanlan.zhihu.com/p/705754254">zhihu 1</a>, <a href="https://zhuanlan.zhihu.com/p/705910725">2</a>, <a href="https://zhuanlan.zhihu.com/p/706204757">3</a>, <a href="https://zhuanlan.zhihu.com/p/707997501">4</a>.
11 changes: 5 additions & 6 deletions doc/en/vllm-integration-v0.2-nightly.md
@@ -1,24 +1,23 @@
# vLLM Disaggregated Prefill/Decode Demo

## Overview
-This is the nightly version of mooncake-transfer-engine integration with the vLLM project based on [PR 10502](https://github.com/vllm-project/vllm/pull/10502) (vllm version: v0.6.4.post1/main) to accelerate KVCache transfer for inter-node disaggregated Prefill/Decode scenario. We have run some experiments to obtain some [preview benchmark results](vllm-benchmark-results-v0.2.md). More benchmark results will be released in due time.
+This is the latest version of mooncake-transfer-engine integration doc with the vLLM project based on [PR 10502](https://github.com/vllm-project/vllm/pull/10502) and [PR 10884](https://github.com/vllm-project/vllm/pull/10884) (vllm version: v0.6.4.post1/main) to accelerate KVCache transfer for inter-node disaggregated Prefill/Decode scenario. We have run some experiments to obtain some [preview benchmark results](vllm-benchmark-results-v0.2.md). More benchmark results will be released in due time.

-**_Please note that this is not a fully ready version and will be modified anytime based on feedback from the vLLM community._**
+**_Please note that this is still an experimental version and will be modified anytime based on feedback from the vLLM community._**

## Installation
### Prerequisite
Please install the Mooncake Transfer Engine according to the [instructions](build.md) first.

-### Install an experimental version of vLLM
-#### 1. Clone vLLM from an indicated repo
+### Install the latest version of vLLM
+#### 1. Clone vLLM from official repo
```bash
-git clone git@github.com:kvcache-ai/vllm.git
+git clone git@github.com:vllm-project/vllm.git
```
#### 2. Build
##### 2.1 Build from source (Include C++ and CUDA code)
```bash
cd vllm
-git checkout upstream-mooncake-integration
pip3 uninstall vllm -y
pip3 install -e .
```
