From 6e7ba8df9365d4d9b2b366a2b1d7ff0f844e6bbb Mon Sep 17 00:00:00 2001
From: Shangming Cai
Date: Mon, 16 Dec 2024 15:54:05 +0800
Subject: [PATCH] [Doc] Update state of mooncake transfer engine integration with vLLM.

Signed-off-by: Shangming Cai
---
 README.md                               |  1 +
 doc/en/vllm-integration-v0.2-nightly.md | 11 +++++------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index bdd0ff6..a6e8c4c 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ This repository also hosts its technical report and the open sourced traces.
 
 🔄 Updates
 
+- **Dec 16, 2024**: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
 - **Nov 28, 2024**: We open sourced the Transfer Engine, the central component of Mooncake. We also provide two demonstrations of Transfer Engine: a P2P Store and vLLM integration.
 - **July 9, 2024**: We open sourced the trace as a jsonl file!.
 - **June 27, 2024**: We present a series of Chinese blogs with more discussions on zhihu 1, 2, 3, 4.
diff --git a/doc/en/vllm-integration-v0.2-nightly.md b/doc/en/vllm-integration-v0.2-nightly.md
index f9b982b..4a6717c 100644
--- a/doc/en/vllm-integration-v0.2-nightly.md
+++ b/doc/en/vllm-integration-v0.2-nightly.md
@@ -1,24 +1,23 @@
 # vLLM Disaggregated Prefill/Decode Demo
 
 ## Overview
-This is the nightly version of mooncake-transfer-engine integration with the vLLM project based on [PR 10502](https://github.com/vllm-project/vllm/pull/10502) (vllm version: v0.6.4.post1/main) to accelerate KVCache transfer for inter-node disaggregated Prefill/Decode scenario. We have run some experiments to obtain some [preview benchmark results](vllm-benchmark-results-v0.2.md). More benchmark results will be released in due time.
+This is the latest version of the doc for the mooncake-transfer-engine integration with the vLLM project, based on [PR 10502](https://github.com/vllm-project/vllm/pull/10502) and [PR 10884](https://github.com/vllm-project/vllm/pull/10884) (vllm version: v0.6.4.post1/main), which accelerates KVCache transfer for inter-node disaggregated Prefill/Decode scenarios. We have run some experiments to obtain [preview benchmark results](vllm-benchmark-results-v0.2.md). More benchmark results will be released in due time.
 
-**_Please note that this is not a fully ready version and will be modified anytime based on feedback from the vLLM community._**
+**_Please note that this is still an experimental version and may be modified at any time based on feedback from the vLLM community._**
 
 ## Installation
 ### Prerequisite
 Please install the Mooncake Transfer Engine according to the [instructions](build.md) first.
 
-### Install an experimental version of vLLM
-#### 1. Clone vLLM from an indicated repo
+### Install the latest version of vLLM
+#### 1. Clone vLLM from the official repo
 ```bash
-git clone git@github.com:kvcache-ai/vllm.git
+git clone git@github.com:vllm-project/vllm.git
 ```
 #### 2. Build
 ##### 2.1 Build from source (Include C++ and CUDA code)
 ```bash
 cd vllm
-git checkout upstream-mooncake-integration
 pip3 uninstall vllm -y
 pip3 install -e .
 ```