From bd23c94d03f3f39289813cdbdd93c09b89486fcb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bilge=20Y=C3=BCcel?=
Date: Tue, 5 Dec 2023 17:11:46 +0100
Subject: [PATCH] Add 2.0 updates (#75)

* Add HF 2.0 updates
* Add ToC for HF page
* Add elasticsearch for 2.0
---
 integrations/elasticsearch-document-store.md | 100 ++++++++++-
 integrations/huggingface.md                  | 174 ++++++++++++++++++-
 2 files changed, 264 insertions(+), 10 deletions(-)

diff --git a/integrations/elasticsearch-document-store.md b/integrations/elasticsearch-document-store.md
index 769a1664..4d655f10 100644
--- a/integrations/elasticsearch-document-store.md
+++ b/integrations/elasticsearch-document-store.md
@@ -13,13 +13,107 @@ repo: https://github.com/deepset-ai/haystack
 type: Document Store
 report_issue: https://github.com/deepset-ai/haystack/issues
 logo: /logos/elastic.png
+version: Haystack 2.0
+toc: true
 ---
+### Table of Contents
+
+- [Haystack 2.0](#haystack-20)
+  - [Installation](#installation)
+  - [Usage](#usage)
+- [Haystack 1.x](#haystack-1x)
+  - [Installation (1.x)](#installation-1x)
+  - [Usage (1.x)](#usage-1x)
+
+## Haystack 2.0
+
+The `ElasticsearchDocumentStore` is maintained in the [haystack-core-integrations](https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/elasticsearch) repo. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines.
+
+For details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1).
+
+### Installation
+
+To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides.
+
+```bash
+pip install elasticsearch-haystack
+```
+
+### Usage
+
+Once installed, you can start using your Elasticsearch database with Haystack by initializing it:
+
+```python
+from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
+
+document_store = ElasticsearchDocumentStore(host="http://localhost:9200", embedding_dim=768)
+```
+
+#### Writing Documents to ElasticsearchDocumentStore
+
+To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline with a [DocumentWriter](https://docs.haystack.deepset.ai/v2.0/docs/documentwriter), or use the `write_documents()` function.
+For this step, you can use the available [TextFileToDocument](https://docs.haystack.deepset.ai/v2.0/docs/textfiletodocument) and [DocumentSplitter](https://docs.haystack.deepset.ai/v2.0/docs/documentsplitter), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
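+
+As a minimal sketch (assuming a running Elasticsearch instance at `http://localhost:9200`; the document contents are placeholders), you can index a handful of Documents with `write_documents()` directly, without building a pipeline:
+
+```python
+from haystack import Document
+from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
+
+document_store = ElasticsearchDocumentStore(host="http://localhost:9200")
+
+# Write Documents straight into the Elasticsearch index
+document_store.write_documents([
+    Document(content="The ElasticsearchDocumentStore stores Documents in an Elasticsearch index."),
+    Document(content="An indexing pipeline can convert, split, and embed files before writing."),
+])
+```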
+
+#### Indexing Pipeline
+
+```python
+from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
+from haystack import Pipeline
+from haystack.components.embedders import SentenceTransformersDocumentEmbedder
+from haystack.components.converters import TextFileToDocument
+from haystack.components.preprocessors import DocumentSplitter
+from haystack.components.writers import DocumentWriter
+
+document_store = ElasticsearchDocumentStore(host="http://localhost:9200")
+converter = TextFileToDocument()
+splitter = DocumentSplitter()
+doc_embedder = SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/multi-qa-mpnet-base-dot-v1")
+writer = DocumentWriter(document_store)
+
+indexing_pipeline = Pipeline()
+indexing_pipeline.add_component("converter", converter)
+indexing_pipeline.add_component("splitter", splitter)
+indexing_pipeline.add_component("doc_embedder", doc_embedder)
+indexing_pipeline.add_component("writer", writer)
+
+indexing_pipeline.connect("converter", "splitter")
+indexing_pipeline.connect("splitter", "doc_embedder")
+indexing_pipeline.connect("doc_embedder", "writer")
+
+indexing_pipeline.run({
+    "converter": {"sources": ["filename.txt"]}
+})
+```
+
+#### Using Elasticsearch in a Query Pipeline
+
+Once you have documents in your `ElasticsearchDocumentStore`, they're ready to be used with the [ElasticsearchEmbeddingRetriever](https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/elasticsearch/src/elasticsearch_haystack/embedding_retriever.py) in the retrieval step of any Haystack pipeline, such as a Retrieval-Augmented Generation (RAG) pipeline. Learn more about [Retrievers](https://docs.haystack.deepset.ai/v2.0/docs/retrievers) to make use of vector search within your LLM pipelines.
+
+```python
+from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
+from haystack import Pipeline
+from haystack.components.embedders import SentenceTransformersTextEmbedder
+from elasticsearch_haystack.embedding_retriever import ElasticsearchEmbeddingRetriever
+
+document_store = ElasticsearchDocumentStore(host="http://localhost:9200")
+retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
+text_embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/multi-qa-mpnet-base-dot-v1")
+
+query_pipeline = Pipeline()
+query_pipeline.add_component("text_embedder", text_embedder)
+query_pipeline.add_component("retriever", retriever)
+# Feed the query embedding produced by the embedder into the retriever
+query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
+
+query_pipeline.run({"text_embedder": {"text": "historical places in Istanbul"}})
+```
+
+## Haystack 1.x
+
 The `ElasticsearchDocumentStore` is maintained within the core Haystack project. It allows you to use [Elasticsearch](https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro.html) as data storage for your Haystack pipelines.
 
 For details on available methods, visit the [API Reference](https://docs.haystack.deepset.ai/reference/document-store-api#elasticsearchdocumentstore-1).
 
-## Installation
+### Installation (1.x)
 
 To run an Elasticsearch instance locally, first follow the [installation](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html) and [start up](https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html) guides.
 
@@ -29,7 +123,7 @@ pip install farm-haystack[elasticsearch]
 ```
 
 To install Elasticsearch 7, you can run `pip install farm-haystack[elasticsearch7]`.
 
-## Usage
+### Usage (1.x)
 
 Once installed, you can start using your Elasticsearch database with Haystack by initializing it:
 
@@ -41,7 +135,7 @@ document_store = ElasticsearchDocumentStore(host = "localhost",
 embedding_dim = 768)
 ```
 
-### Writing Documents to ElasticsearchDocumentStore
+#### Writing Documents to ElasticsearchDocumentStore
 
 To write documents to your `ElasticsearchDocumentStore`, create an indexing pipeline, or use the `write_documents()` function.
 For this step, you may make use of the available [FileConverters](https://docs.haystack.deepset.ai/docs/file_converters) and [PreProcessors](https://docs.haystack.deepset.ai/docs/preprocessor), as well as other [Integrations](/integrations) that might help you fetch data from other resources.
diff --git a/integrations/huggingface.md b/integrations/huggingface.md
index 1b8e409a..8ef9be5b 100644
--- a/integrations/huggingface.md
+++ b/integrations/huggingface.md
@@ -13,21 +13,181 @@ repo: https://github.com/deepset-ai/haystack
 type: Model Provider
 report_issue: https://github.com/deepset-ai/haystack/issues
 logo: /logos/huggingface.png
+version: Haystack 2.0
+toc: true
 ---
 
-You can use models on [Hugging Face](https://huggingface.co/) in your Haystack pipelines with the [PromptNode](https://docs.haystack.deepset.ai/docs/prompt_node), [EmbeddingRetriever](https://docs.haystack.deepset.ai/docs/retriever#embedding-retrieval-recommended), [Ranker](https://docs.haystack.deepset.ai/docs/ranker), [Reader](https://docs.haystack.deepset.ai/docs/reader) and more!
+### Table of Contents
 
-## Installation
+- [Haystack 2.0](#haystack-20)
+  - [Installation](#installation)
+  - [Usage](#usage)
+- [Haystack 1.x](#haystack-1x)
+  - [Installation (1.x)](#installation-1x)
+  - [Usage (1.x)](#usage-1x)
+
+## Haystack 2.0
+
+You can use models on [Hugging Face](https://huggingface.co/) in your Haystack 2.0 pipelines with [Generators](https://docs.haystack.deepset.ai/v2.0/docs/generators), [Embedders](https://docs.haystack.deepset.ai/v2.0/docs/embedders), [Rankers](https://docs.haystack.deepset.ai/v2.0/docs/rankers) and [Readers](https://docs.haystack.deepset.ai/v2.0/docs/readers)!
+
+### Installation
+
+```bash
+pip install haystack-ai
+```
+
+### Usage
+
+You can use models on Hugging Face in various ways:
+
+#### Embedding Models
+
+You can leverage embedding models from Hugging Face through two components: [SentenceTransformersTextEmbedder](https://docs.haystack.deepset.ai/v2.0/docs/sentencetransformerstextembedder) and [SentenceTransformersDocumentEmbedder](https://docs.haystack.deepset.ai/v2.0/docs/sentencetransformersdocumentembedder).
+
+To create semantic embeddings for documents, use `SentenceTransformersDocumentEmbedder` in your indexing pipeline. For generating embeddings for queries, use `SentenceTransformersTextEmbedder`. Once you've selected the suitable component for your specific use case, initialize the component with the desired model name.
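+
+For instance, here is a minimal sketch of embedding a query on its own with `SentenceTransformersTextEmbedder` (the model name and query text are illustrative placeholders):
+
+```python
+from haystack.components.embedders import SentenceTransformersTextEmbedder
+
+text_embedder = SentenceTransformersTextEmbedder(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2")
+text_embedder.warm_up()  # download and load the model
+
+# run() returns a dict holding the query embedding under the "embedding" key
+result = text_embedder.run(text="What's the capital of France?")
+print(len(result["embedding"]))
+```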
+
+Below is an example indexing pipeline with `InMemoryDocumentStore`, `DocumentWriter` and `SentenceTransformersDocumentEmbedder`:
+
+```python
+from haystack import Document
+from haystack import Pipeline
+from haystack.document_stores import InMemoryDocumentStore
+from haystack.components.embedders import SentenceTransformersDocumentEmbedder
+from haystack.components.writers import DocumentWriter
+
+document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")
+
+documents = [Document(content="My name is Wolfgang and I live in Berlin"),
+             Document(content="I saw a black horse running"),
+             Document(content="Germany has many big cities")]
+
+indexing_pipeline = Pipeline()
+indexing_pipeline.add_component("embedder", SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2"))
+indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
+indexing_pipeline.connect("embedder", "writer")
+indexing_pipeline.run({
+    "embedder": {"documents": documents}
+})
+```
+
+#### Generative Models (LLMs)
+
+You can leverage text generation models from Hugging Face through three components: [HuggingFaceLocalGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacelocalgenerator), [HuggingFaceTGIGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgigenerator) and [HuggingFaceTGIChatGenerator](https://docs.haystack.deepset.ai/v2.0/docs/huggingfacetgichatgenerator).
+
+Depending on the model type (chat or text completion) and hosting option (TGI, Inference Endpoint, locally hosted), select the suitable Hugging Face Generator component and initialize it with the model name.
+
+Below is an example query pipeline that uses `mistralai/Mistral-7B-v0.1` hosted on Hugging Face Inference Endpoints with `HuggingFaceTGIGenerator`:
+
+```python
+from haystack import Document, Pipeline
+from haystack.document_stores import InMemoryDocumentStore
+from haystack.components.retrievers import InMemoryBM25Retriever
+from haystack.components.builders.prompt_builder import PromptBuilder
+from haystack.components.generators import HuggingFaceTGIGenerator
+
+# A small document store to retrieve from
+docstore = InMemoryDocumentStore()
+docstore.write_documents([Document(content="The official language of France is French.")])
+
+template = """
+Given the following information, answer the question.
+
+Context:
+{% for document in documents %}
+    {{ document.content }}
+{% endfor %}
+
+Question: What's the official language of {{ country }}?
+"""
+
+pipe = Pipeline()
+pipe.add_component("retriever", InMemoryBM25Retriever(document_store=docstore))
+pipe.add_component("prompt_builder", PromptBuilder(template=template))
+pipe.add_component("llm", HuggingFaceTGIGenerator(model="mistralai/Mistral-7B-v0.1", token="HF_TOKEN"))
+pipe.connect("retriever", "prompt_builder.documents")
+pipe.connect("prompt_builder", "llm")
+
+pipe.run({
+    "retriever": {"query": "France"},
+    "prompt_builder": {
+        "country": "France"
+    }
+})
+```
+
+#### Ranker Models
+
+To use cross encoder models on Hugging Face, initialize a `TransformersSimilarityRanker` with the model name. You can then use this `TransformersSimilarityRanker` to sort documents based on their relevancy to the query.
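+
+As a minimal standalone sketch (the documents and query are illustrative placeholders), the ranker can also be run directly on a list of Documents:
+
+```python
+from haystack import Document
+from haystack.components.rankers import TransformersSimilarityRanker
+
+ranker = TransformersSimilarityRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")
+ranker.warm_up()  # load the cross encoder model
+
+docs = [Document(content="Paris is in France"), Document(content="Berlin is in Germany")]
+
+# Documents come back sorted by relevance to the query
+result = ranker.run(query="Cities in France", documents=docs, top_k=1)
+print(result["documents"][0].content)
+```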
+
+Below is an example document retrieval pipeline with `InMemoryBM25Retriever` and `TransformersSimilarityRanker`:
+
+```python
+from haystack import Document, Pipeline
+from haystack.document_stores import InMemoryDocumentStore
+from haystack.components.retrievers import InMemoryBM25Retriever
+from haystack.components.rankers import TransformersSimilarityRanker
+
+docs = [Document(content="Paris is in France"),
+        Document(content="Berlin is in Germany"),
+        Document(content="Lyon is in France")]
+document_store = InMemoryDocumentStore()
+document_store.write_documents(docs)
+
+retriever = InMemoryBM25Retriever(document_store=document_store)
+ranker = TransformersSimilarityRanker(model_name_or_path="cross-encoder/ms-marco-MiniLM-L-6-v2")
+
+document_ranker_pipeline = Pipeline()
+document_ranker_pipeline.add_component(instance=retriever, name="retriever")
+document_ranker_pipeline.add_component(instance=ranker, name="ranker")
+document_ranker_pipeline.connect("retriever.documents", "ranker.documents")
+
+query = "Cities in France"
+document_ranker_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
+                                   "ranker": {"query": query, "top_k": 2}})
+```
+
+#### Reader Models
+
+To use question answering models on Hugging Face, initialize an `ExtractiveReader` with the model name. You can then use this `ExtractiveReader` to extract answers from the relevant context.
+
+Below is an example extractive question answering pipeline with `InMemoryBM25Retriever` and `ExtractiveReader`:
+
+```python
+from haystack import Document, Pipeline
+from haystack.document_stores import InMemoryDocumentStore
+from haystack.components.retrievers import InMemoryBM25Retriever
+from haystack.components.readers import ExtractiveReader
+
+docs = [Document(content="Paris is the capital of France."),
+        Document(content="Berlin is the capital of Germany."),
+        Document(content="Rome is the capital of Italy."),
+        Document(content="Madrid is the capital of Spain.")]
+document_store = InMemoryDocumentStore()
+document_store.write_documents(docs)
+
+retriever = InMemoryBM25Retriever(document_store=document_store)
+reader = ExtractiveReader(model_name_or_path="deepset/roberta-base-squad2-distilled")
+
+extractive_qa_pipeline = Pipeline()
+extractive_qa_pipeline.add_component(instance=retriever, name="retriever")
+extractive_qa_pipeline.add_component(instance=reader, name="reader")
+
+extractive_qa_pipeline.connect("retriever.documents", "reader.documents")
+
+query = "What is the capital of France?"
+extractive_qa_pipeline.run(data={"retriever": {"query": query, "top_k": 3},
+                                 "reader": {"query": query, "top_k": 2}})
+```
+
+## Haystack 1.x
+
+You can use models on [Hugging Face](https://huggingface.co/) in your Haystack 1.x pipelines with the [PromptNode](https://docs.haystack.deepset.ai/docs/prompt_node), [EmbeddingRetriever](https://docs.haystack.deepset.ai/docs/retriever#embedding-retrieval-recommended), [Ranker](https://docs.haystack.deepset.ai/docs/ranker), [Reader](https://docs.haystack.deepset.ai/docs/reader) and more!
+
+### Installation (1.x)
 
 ```bash
 pip install farm-haystack
 ```
 
-## Usage
+### Usage (1.x)
 
 You can use models on Hugging Face in various ways:
 
-### Embedding Models
+#### Embedding Models
 
 To use embedding models on Hugging Face, initialize an `EmbeddingRetriever` with the model name. You can then use this `EmbeddingRetriever` in an indexing pipeline to create semantic embeddings for documents and index them to a document store.
 
@@ -52,7 +212,7 @@ indexing_pipeline.add_node(component=document_store, name="document_store", inpu
 indexing_pipeline.run(documents=[Document("This is my document")])
 ```
 
-### Generative Models (LLMs)
+#### Generative Models (LLMs)
 
 To use text generation models on Hugging Face, initialize a `PromptNode` with the model name and the prompt template. You can then use this `PromptNode` to generate questions from the given context.
 
@@ -78,7 +238,7 @@ query_pipeline.run(query = "Berlin")
 
 > If you would like to use the [Inference API](https://huggingface.co/inference-api), you need to pass your Hugging Face token to PromptNode.
 
-### Ranker Models
+#### Ranker Models
 
 To use cross encoder models on Hugging Face, initialize a `SentenceTransformersRanker` with the model name. You can then use this `SentenceTransformersRanker` to sort documents based on their relevancy to the query.
 
@@ -97,7 +257,7 @@ document_retrieval_pipeline.add_node(component=ranker, name="Ranker", inputs=["R
 document_retrieval_pipeline.run("YOUR_QUERY")
 ```
 
-### Reader Models
+#### Reader Models
 
 To use question answering models on Hugging Face, initialize a `FarmReader` with the model name. You can then use this `FarmReader` to extract answers from the relevant context.