From d7c2d5f0e9f396c7fcfba4979a567462aa185007 Mon Sep 17 00:00:00 2001
From: anakin87
Date: Fri, 16 Feb 2024 13:13:43 +0100
Subject: [PATCH 1/2] update vLLM integration page
---
integrations/vllm.md | 64 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 6 deletions(-)
diff --git a/integrations/vllm.md b/integrations/vllm.md
index 8638864c..9bb3fa6e 100644
--- a/integrations/vllm.md
+++ b/integrations/vllm.md
@@ -1,7 +1,7 @@
---
layout: integration
name: vLLM Invocation Layer
-description: Use a vLLM server or locally hosted instance in your Prompt Node
+description: Use the vLLM inference engine with Haystack
authors:
- name: Lukas Kreussel
socials:
@@ -25,15 +25,67 @@ Simply use [vLLM](https://github.com/vllm-project/vllm) in your haystack pipelin
-## Installation
+### Table of Contents
+
+- [Overview](#overview)
+- [Haystack 2.x](#haystack-2x)
+ - [Installation](#installation)
+ - [Usage](#usage)
+- [Haystack 1.x](#haystack-1x)
+ - [Installation (1.x)](#installation-1x)
+ - [Usage (1.x)](#usage-1x)
+
+## Overview
+
+[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs.
+It is an open-source project that makes it possible to serve open models in production when GPU resources are available.
+
+For Haystack 1.x, the integration is available as a separate package, while for Haystack 2.x it works out of the box.
+
+## Haystack 2.x
+
+vLLM can be deployed as a server that implements the OpenAI API protocol.
+This allows vLLM to be used with the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack.
+
+For an end-to-end example of vLLM + Haystack 2.x, see [this notebook](https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/vllm_inference_engine.ipynb).
+
+
+### Installation
+First, install vLLM:
+- with `pip`: `pip install vllm` (more information in the [vLLM documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html))
+- for production use cases, there are several other options, including Docker ([docs](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html))
+
+### Usage
+You first need to run a vLLM OpenAI-compatible server. You can do that using [Python](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server) or [Docker](https://docs.vllm.ai/en/latest/serving/deploying_with_docker.html).
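+
+For example, on a machine with a supported GPU, a server for a model such as `mistralai/Mistral-7B-Instruct-v0.1` (used here purely as an example) can be started with `python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1`; refer to the linked vLLM documentation for all available options.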
+
+Then, you can use the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack to query the vLLM server.
+
+```python
+from haystack.components.generators.chat import OpenAIChatGenerator
+from haystack.dataclasses import ChatMessage
+from haystack.utils import Secret
+
+generator = OpenAIChatGenerator(
+ api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"), # for compatibility with the OpenAI API, a placeholder api_key is needed
+ model="mistralai/Mistral-7B-Instruct-v0.1",
+ api_base_url="http://localhost:8000/v1",
+ generation_kwargs={"max_tokens": 512}
+)
+
+response = generator.run(messages=[ChatMessage.from_user("Hi. Can you help me plan my next trip to Italy?")])
+```
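+
+The non-chat `OpenAIGenerator` can be used in the same way. Here is a minimal sketch, assuming the same server and model as above (adjust them to match your vLLM deployment):
+
+```python
+from haystack.components.generators import OpenAIGenerator
+from haystack.utils import Secret
+
+generator = OpenAIGenerator(
+    api_key=Secret.from_token("VLLM-PLACEHOLDER-API-KEY"),  # for compatibility with the OpenAI API, a placeholder api_key is needed
+    model="mistralai/Mistral-7B-Instruct-v0.1",
+    api_base_url="http://localhost:8000/v1",
+    generation_kwargs={"max_tokens": 512},
+)
+
+result = generator.run(prompt="Briefly list three attractions to visit in Rome.")
+print(result["replies"][0])  # the generated text
+```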
+
+## Haystack 1.x
+
+### Installation (1.x)
Install the wrapper via pip: `pip install vllm-haystack`
-## Usage
+### Usage (1.x)
This integration provides two invocation layers:
- `vLLMInvocationLayer`: To use models hosted on a vLLM server
- `vLLMLocalInvocationLayer`: To use locally hosted vLLM models
-### Use a Model Hosted on a vLLM Server
+#### Use a Model Hosted on a vLLM Server
To utilize the wrapper, the `vLLMInvocationLayer` has to be used.
Here is a simple example of how a `PromptNode` can be created with the wrapper.
@@ -52,12 +104,12 @@ prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
The model name will be inferred from the model served on the vLLM server.
For more configuration examples, take a look at the unit tests.
-#### Hosting a vLLM Server
+##### Hosting a vLLM Server
To create an *OpenAI-Compatible Server* via vLLM you can follow the steps in the
Quickstart section of their [documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html#openai-compatible-server).
-### Use a Model Hosted Locally
+#### Use a Model Hosted Locally
⚠️ To run `vLLM` locally, you need to have `vllm` installed and a supported GPU.
If you don't want to use an API server, this wrapper also provides a `vLLMLocalInvocationLayer`, which runs vLLM on the same node Haystack is running on.
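+
+As a minimal, illustrative sketch (the model name below is only an example), a `PromptNode` backed by a local vLLM instance could look like this:
+
+```python
+from haystack.nodes import PromptModel, PromptNode
+from vllm_haystack import vLLMLocalInvocationLayer
+
+# Load the model locally through vLLM instead of querying a remote server.
+model = PromptModel(
+    model_name_or_path="mistralai/Mistral-7B-Instruct-v0.1",  # example model
+    invocation_layer_class=vLLMLocalInvocationLayer,
+    max_length=256,
+)
+prompt_node = PromptNode(model_name_or_path=model, top_k=1, max_length=256)
+```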
From 8ae6b80e6785deb6bc425e66239feb97b18f614f Mon Sep 17 00:00:00 2001
From: anakin87
Date: Fri, 16 Feb 2024 15:24:32 +0100
Subject: [PATCH 2/2] address feedback
---
integrations/vllm.md | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/integrations/vllm.md b/integrations/vllm.md
index 9bb3fa6e..1dbf953b 100644
--- a/integrations/vllm.md
+++ b/integrations/vllm.md
@@ -11,6 +11,7 @@ repo: https://github.com/LLukas22/vLLM-haystack-adapter
type: Model Provider
report_issue: https://github.com/LLukas22/vLLM-haystack-adapter/issues
logo: /logos/vllm.png
+version: Haystack 2.0
toc: true
---
[![PyPI - Version](https://img.shields.io/pypi/v/vllm-haystack.svg)](https://pypi.org/project/vllm-haystack)
@@ -45,7 +46,7 @@ For Haystack 1.x, the integration is available as a separate package, while for
## Haystack 2.x
vLLM can be deployed as a server that implements the OpenAI API protocol.
-This allows vLLM to be used with the `OpenAIGenerator` and `OpenAIChatGenerator` components in Haystack.
+This allows vLLM to be used with the [`OpenAIGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/openaigenerator) and [`OpenAIChatGenerator`](https://docs.haystack.deepset.ai/v2.0/docs/openaichatgenerator) components in Haystack.
For an end-to-end example of vLLM + Haystack 2.x, see [this notebook](https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/vllm_inference_engine.ipynb).