Add 2.0 updates (#75)
* Add HF 2.0 updates

* Add ToC for HF page

* Add elasticsearch for 2.0
bilgeyucel authored Dec 5, 2023
1 parent 3107734 commit bd23c94
Showing 2 changed files with 264 additions and 10 deletions.
100 changes: 97 additions & 3 deletions integrations/elasticsearch-document-store.md
repo: https://github.com/deepset-ai/haystack
type: Document Store
report_issue: https://github.com/deepset-ai/haystack/issues
logo: /logos/elastic.png
version: Haystack 2.0
toc: true
---

### Table of Contents

- [Haystack 2.0](#haystack-20)
  - [Installation](#installation)
  - [Usage](#usage)
- [Haystack 1.x](#haystack-1x)
  - [Installation (1.x)](#installation-1x)
  - [Usage (1.x)](#usage-1x)

## Haystack 2.0

The `ElasticsearchDocumentStore` is maintained in [haystack-core-integrations](https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/elasticsearch) repo. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines.

For details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1).

### Installation

To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides.
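
For local development, one option is a single-node Docker container (a sketch, not the only route; the image tag is an example, and security is disabled here for local testing only):

```bash
docker run -p 9200:9200 -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.11.1
```

Then install the Elasticsearch integration for Haystack: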

```bash
pip install elasticsearch-haystack
```

### Usage

Once installed, you can start using your Elasticsearch database with Haystack by initializing it:

```python
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
```

#### Writing Documents to ElasticsearchDocumentStore

To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline with a [DocumentWriter](https://docs.haystack.deepset.ai/v2.0/docs/documentwriter), or use the `write_documents()` function.
For this step, you can use the available [TextFileToDocument](https://docs.haystack.deepset.ai/v2.0/docs/textfiletodocument) and [DocumentSplitter](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
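
If you just want to load a handful of documents without a pipeline, a minimal sketch of the `write_documents()` route looks like this (the document contents are placeholders):

```python
from haystack import Document
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")

# Write plain Documents directly; embeddings can be added later by a pipeline
document_store.write_documents([
    Document(content="Istanbul is the largest city in Turkey."),
    Document(content="The Hagia Sophia is located in Istanbul."),
])
```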

#### Indexing Pipeline

```python
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.converters import TextFileToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
converter = TextFileToDocument()
splitter = DocumentSplitter()
doc_embedder = SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/multi-qa-mpnet-base-dot-v1")
writer = DocumentWriter(document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", converter)
indexing_pipeline.add_component("splitter", splitter)
indexing_pipeline.add_component("doc_embedder", doc_embedder)
indexing_pipeline.add_component("writer", writer)

indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "doc_embedder")
indexing_pipeline.connect("doc_embedder", "writer")

indexing_pipeline.run({
    "converter": {"sources": ["filename.txt"]}
})
```

#### Using Elasticsearch in a Query Pipeline

Once you have documents in your `ElasticsearchDocumentStore`, they're ready to be used with the [ElasticsearchEmbeddingRetriever](https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/elasticsearch/src/elasticsearch_haystack/embedding_retriever.py) in the retrieval step of any Haystack pipeline, such as a Retrieval-Augmented Generation (RAG) pipeline. Learn more about [Retrievers](https://docs.haystack.deepset.ai/v2.0/docs/retrievers) to make use of vector search within your LLM pipelines.

```python
from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from elasticsearch_haystack.embedding_retriever import ElasticsearchEmbeddingRetriever

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
text_embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/multi-qa-mpnet-base-dot-v1")

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", retriever)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({
    "text_embedder": {"text": "historical places in Istanbul"}
})
```
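
The run output is keyed by component name; each retrieved document carries a relevance score (a sketch, assuming the retriever's default `documents` output):

```python
# Inspect the retrieved documents from the result above
for doc in result["retriever"]["documents"]:
    print(doc.score, doc.content)
```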

## Haystack 1.x

The `ElasticsearchDocumentStore` is maintained within the core Haystack project. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines.

For details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1).

### Installation (1.x)

To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides.

```bash
pip install farm-haystack[elasticsearch]
```

To install Elasticsearch 7, you can run `pip install farm-haystack[elasticsearch7]`.

### Usage (1.x)

Once installed, you can start using your Elasticsearch database with Haystack by initializing it:

```python
from haystack.document_stores import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(host="localhost",
                                            embedding_dim=768)
```

#### Writing Documents to ElasticsearchDocumentStore

To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
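
For instance, a minimal direct write might look like this (a sketch; the document text is a placeholder):

```python
from haystack import Document

document_store.write_documents([Document(content="This is my document")])
```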
174 changes: 167 additions & 7 deletions integrations/huggingface.md
repo: https://github.com/deepset-ai/haystack
type: Model Provider
report_issue: https://github.com/deepset-ai/haystack/issues
logo: /logos/huggingface.png
version: Haystack 2.0
toc: true
---

### Table of Contents

- [Haystack 2.0](#haystack-20)
  - [Installation](#installation)
  - [Usage](#usage)
- [Haystack 1.x](#haystack-1x)
  - [Installation (1.x)](#installation-1x)
  - [Usage (1.x)](#usage-1x)

## Haystack 2.0

You can use models on [Hugging Face](https://huggingface.co/) in your Haystack 2.0 pipelines with [Generators](https://docs.haystack.deepset.ai/v2.0/docs/generators), [Embedders](https://docs.haystack.deepset.ai/v2.0/docs/embedders), [Rankers](https://docs.haystack.deepset.ai/v2.0/docs/rankers) and [Readers](https://docs.haystack.deepset.ai/v2.0/docs/readers)!

### Installation

```bash
pip install haystack-ai
```

### Usage

You can use models on Hugging Face in various ways:

#### Embedding Models

You can leverage embedding models from Hugging Face through two components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/v2.0/docs/sentencetransformerstextembedder) and [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/v2.0/docs/sentencetransformersdocumentembedder).

To create semantic embeddings for documents, use `SentenceTransformersDocumentEmbedder` in your indexing pipeline. For generating embeddings for queries, use `SentenceTransformersTextEmbedder`. Once you've selected the suitable component for your specific use case, initialize the component with the desired model name.

Below is an example indexing pipeline with `InMemoryDocumentStore`, `DocumentWriter` and `SentenceTransformersDocumentEmbedder`:

```python
from haystack import Document, Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [Document(content="My name is Wolfgang and I live in Berlin"),
Document(content="I saw a black horse running"),
Document(content="Germany has many big cities")]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({
    "embedder": {"documents": documents}
})
```
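
On the query side, the matching `SentenceTransformersTextEmbedder` can also be run on its own. A minimal sketch (the model should mirror the one used at indexing time):

```python
from haystack.components.embedders import SentenceTransformersTextEmbedder

text_embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2")
text_embedder.warm_up()

# Embed a query string; the result dict holds the embedding vector
embedding = text_embedder.run("Where does Wolfgang live?")["embedding"]
```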

#### Generative Models (LLMs)

You can leverage text generation models from Hugging Face through three components: [HuggingFaceLocalGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator), [HuggingFaceTGIGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator) and [HuggingFaceTGIChatGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgichatgenerator).

Depending on the model type (chat or text completion) and hosting option (TGI, Inference Endpoints, or locally hosted), select the suitable Hugging Face Generator component and initialize it with the model name.

Below is an example query pipeline that uses `mistralai/Mistral-7B-v0.1` hosted on Hugging Face Inference Endpoints with `HuggingFaceTGIGenerator`:

```python
from haystack import Document, Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import HuggingFaceTGIGenerator

# A small in-memory store so the retriever has documents to work with
docstore = InMemoryDocumentStore()
docstore.write_documents([Document(content="The official language of France is French.")])

template = """
Given the following information, answer the question.
Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: What's the official language of {{ country }}?
"""

pipe = Pipeline()

pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1", token="YOUR_HF_TOKEN"))
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

pipe.run({
    "retriever": {"query": "official language of France"},
    "prompt_builder": {"country": "France"}
})
```
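
If you would rather run a model locally than call a hosted endpoint, a minimal `HuggingFaceLocalGenerator` sketch looks like this (the model name and task are examples):

```python
from haystack.components.generators import HuggingFaceLocalGenerator

generator = HuggingFaceLocalGenerator(model="google/flan-t5-large",
                                      task="text2text-generation")
generator.warm_up()

# The generator returns a dict with a list of replies
print(generator.run("What's the official language of France?")["replies"][0])
```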

#### Ranker Models

To use cross encoder models on Hugging Face, initialize a `TransformersSimilarityRanker` with the model name. You can then use this `TransformersSimilarityRanker` to sort documents based on their relevancy to the query.

Below is an example document retrieval pipeline with `InMemoryBM25Retriever` and `TransformersSimilarityRanker`:

```python
from haystack import Document, Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker

docs = [Document(content="Paris is in France"),
Document(content="Berlin is in Germany"),
Document(content="Lyon is in France")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = TransformersSimilarityRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
"ranker": {"query": query, "top_k": 2}})
```

#### Reader Models

To use question answering models on Hugging Face, initialize an `ExtractiveReader` with the model name. You can then use this `ExtractiveReader` to extract answers from the relevant context.

Below is an example extractive question answering pipeline with `InMemoryBM25Retriever` and `ExtractiveReader`:

```python
from haystack import Document, Pipeline
from haystack.document_stores import InMemoryDocumentStore
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.readers import ExtractiveReader

docs = [Document(content="Paris is the capital of France."),
Document(content="Berlin is the capital of Germany."),
Document(content="Rome is the capital of Italy."),
Document(content="Madrid is the capital of Spain.")]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
reader = ExtractiveReader(model_name_or_path="deepset/roberta-base-squad2-distilled")

extractive_qa_pipeline = Pipeline()
extractive_qa_pipeline.add_component(instance=retriever, name="retriever")
extractive_qa_pipeline.add_component(instance=reader, name="reader")

extractive_qa_pipeline.connect("retriever.documents", "reader.documents")

query = "What is the capital of France?"
extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
"reader": {"query": query, "top_k": 2}})
```
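
The reader returns ranked answers. A quick way to inspect them (a sketch, assuming the reader's default `answers` output) is:

```python
result = extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
                                          "reader": {"query": query, "top_k": 2}})

# Each ExtractedAnswer exposes the extracted span and its confidence score
for answer in result["reader"]["answers"]:
    print(answer.data, answer.score)
```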

## Haystack 1.x

You can use models on [Hugging Face](https://huggingface.co/) in your Haystack 1.x pipelines with the [PromptNode](https://docs.haystack.deepset.ai/docs/prompt_node), [EmbeddingRetriever](https://docs.haystack.deepset.ai/docs/retriever#embedding-retrieval-recommended), [Ranker](https://docs.haystack.deepset.ai/docs/ranker), [Reader](https://docs.haystack.deepset.ai/docs/reader) and more!

### Installation (1.x)

```bash
pip install farm-haystack
```

### Usage (1.x)

You can use models on Hugging Face in various ways:

#### Embedding Models

To use embedding models on Hugging Face, initialize an `EmbeddingRetriever` with the model name. You can then use this `EmbeddingRetriever` in an indexing pipeline to create semantic embeddings for documents and index them to a document store.

```python
indexing_pipeline.add_node(component=document_store, name="document_store", inputs=["retriever"])
indexing_pipeline.run(documents=[Document("This is my document")])
```

#### Generative Models (LLMs)

To use text generation models on Hugging Face, initialize a `PromptNode` with the model name and the prompt template. You can then use this `PromptNode` to generate questions from the given context.

```python
query_pipeline.run(query="Berlin")
```
> If you would like to use the [Inference API](https://huggingface.co/inference-api), you need pass your Hugging Face token to PromptNode.

#### Ranker Models

To use cross encoder models on Hugging Face, initialize a `SentenceTransformersRanker` with the model name. You can then use this `SentenceTransformersRanker` to sort documents based on their relevancy to the query.

```python
document_retrieval_pipeline.add_node(component=ranker, name="Ranker", inputs=["Retriever"])
document_retrieval_pipeline.run("YOUR_QUERY")
```

#### Reader Models

To use question answering models on Hugging Face, initialize a `FarmReader` with the model name. You can then use this `FarmReader` to extract answers from the relevant context.
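
A minimal initialization might look like this (a sketch; the model name is an example):

```python
from haystack.nodes import FarmReader

reader = FarmReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
```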

