Commit

update milvus readme with milvus lite (#232)
* update milvus readme with milvus lite

Signed-off-by: ChengZi <[email protected]>

* Update milvus-document-store.md

---------

Signed-off-by: ChengZi <[email protected]>
Co-authored-by: Stefano Fiorucci <[email protected]>
zc277584121 and anakin87 authored Sep 12, 2024
1 parent e706a74 commit 7f96c0b
Showing 1 changed file with 62 additions and 28 deletions.
90 changes: 62 additions & 28 deletions integrations/milvus-document-store.md
@@ -21,6 +21,7 @@ toc: true
- [Haystack 2.0](#haystack-20)
- [Installation](#installation)
- [Usage](#usage)
- [Dive deep usage](#dive-deep-usage)
- [Haystack 1.x](#haystack-1x)
- [Installation (1.x)](#installation-1x)
- [Usage (1.x)](#usage-1x)
@@ -33,60 +34,94 @@ toc: true
---

### Installation
```console
pip install -U milvus-haystack

```shell
pip install --upgrade pymilvus milvus-haystack
```

*If you are using Google Colab, you may need to restart the runtime for the newly installed dependencies to take effect.*

### Usage

First, to start up a Milvus service, follow the ['Start Milvus'](https://milvus.io/docs/install_standalone-docker.md#Start-Milvus) instructions in the documentation.
To get started quickly, create a `MilvusDocumentStore` and write a document to it.

Then, here are the ways to build index, retrieval, and build rag pipeline respectively.
```python
from haystack import Document
from milvus_haystack import MilvusDocumentStore

```py
# Create the indexing Pipeline and index some documents
import glob
document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},  # Milvus Lite
    # connection_args={"uri": "http://localhost:19530"},  # Milvus standalone docker service.
    drop_old=True,
)
documents = [Document(
    content="A Foo Document",
    meta={"page": "100", "chapter": "intro"},
    embedding=[-10.0] * 128,
)]
document_store.write_documents(documents)
print(document_store.count_documents()) # 1
```
In `connection_args`, setting the URI to a local file, e.g. `./milvus.db`, is the most convenient option, because it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in that file.

If you have a large amount of data, such as more than a million documents, we recommend setting up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). With that setup, use the server address, e.g. `http://localhost:19530`, as your URI.
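
As a rough illustration of switching between the two setups described above, the sketch below builds the `connection_args` dictionary from an environment variable. The variable name `MILVUS_URI` and the helper `milvus_connection_args` are assumptions made for this example and are not part of `milvus-haystack`.

```python
import os


def milvus_connection_args(env=None):
    """Build connection_args for MilvusDocumentStore.

    MILVUS_URI is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    # Fall back to a local file, which makes the store use Milvus Lite.
    uri = env.get("MILVUS_URI", "./milvus.db")
    return {"uri": uri}


# No variable set: local Milvus Lite file.
print(milvus_connection_args({}))
# Variable set: a Milvus server, e.g. the standalone Docker service.
print(milvus_connection_args({"MILVUS_URI": "http://localhost:19530"}))
```

The result can then be passed as `MilvusDocumentStore(connection_args=milvus_connection_args(), drop_old=True)`, so the same script can target either Milvus Lite or a server without code changes.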

### Dive deep usage

Prepare an OpenAI API key and set it as an environment variable:

```shell
export OPENAI_API_KEY=<your_api_key>
```
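
Because the OpenAI components in the following sections are created without an explicit key, a missing variable typically only surfaces when those components are constructed or run. A minimal sketch of an early check is shown here; the helper name `require_api_key` is hypothetical and is not part of Haystack.

```python
import os


def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the value of the given environment variable or fail early.

    Hypothetical helper for this sketch; Haystack does not provide it.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the pipelines.")
    return value
```

Calling `require_api_key()` at the top of a script makes a missing key fail with a clear message before any documents are converted or embedded.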

The following sections show how to:

- Create the indexing pipeline
- Create the retrieval pipeline
- Create the RAG pipeline

#### Create the indexing Pipeline and index some documents

```python
import os

from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

from milvus_haystack import MilvusDocumentStore
from milvus_haystack.milvus_embedding_retriever import MilvusEmbeddingRetriever

file_paths = glob.glob("./milvus-document-store.md")
current_file_path = os.path.abspath(__file__)
file_paths = [current_file_path] # You can replace it with your own file paths.

document_store = MilvusDocumentStore(
    connection_args={
        "host": "localhost",
        "port": "19530",
        "user": "",
        "password": "",
        "secure": False,
    },
    connection_args={"uri": "./milvus.db"},  # Milvus Lite
    # connection_args={"uri": "http://localhost:19530"},  # Milvus standalone docker service.
    drop_old=True,
)
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", MarkdownToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))
indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"converter": {"sources": file_paths}})

print("Number of documents:", document_store.count_documents())
```

# ------------------------------------------------------------------------------------
# Create the retrieval pipeline and try a query
question = "What is Milvus?"
#### Create the retrieval pipeline and try a query

```python
question = "How to set the service uri with milvus lite?" # You can replace it with your own question.

retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("embedder", SentenceTransformersTextEmbedder())
retrieval_pipeline.add_component("embedder", OpenAITextEmbedder())
retrieval_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
retrieval_pipeline.connect("embedder", "retriever")

@@ -95,11 +130,12 @@ retrieval_results = retrieval_pipeline.run({"embedder": {"text": question}})
for doc in retrieval_results["retriever"]["documents"]:
    print(doc.content)
    print("-" * 10)
```

# ------------------------------------------------------------------------------------
# Create the RAG pipeline and try a query
#### Create the RAG pipeline and try a query

```python
from haystack.utils import Secret
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

@@ -114,11 +150,10 @@ prompt_template = """Answer the following query based on the provided context. I
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag_pipeline.add_component("generator", OpenAIGenerator(api_key=Secret.from_token(os.getenv("OPENAI_API_KEY")),
generation_kwargs={"temperature": 0}))
rag_pipeline.add_component("generator", OpenAIGenerator(generation_kwargs={"temperature": 0}))
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "generator")
@@ -134,7 +169,6 @@ print('RAG answer:', results["generator"]["replies"][0])
```



## Haystack 1.x

