Skip to content

Commit

Permalink
adding releases (#346)
Browse files Browse the repository at this point in the history
  • Loading branch information
TuanaCelik authored Jul 15, 2024
1 parent 436bd47 commit 959c9aa
Show file tree
Hide file tree
Showing 5 changed files with 169 additions and 0 deletions.
18 changes: 18 additions & 0 deletions content/release-notes/2.2.1.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
title: Haystack 2.2.1
description: Release notes for Haystack 2.2.1
toc: True
date: 2024-06-06
last_updated: 2024-06-06
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.2.1
---

### ⬆️ Upgrade Notes

- `trafilatura` must now be manually installed with `pip install trafilatura` to use the `HTMLToDocument` Component.

### ⚡️ Enhancement Notes

- Remove `trafilatura` as direct dependency and make it a lazily imported one

13 changes: 13 additions & 0 deletions content/release-notes/2.2.2.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
title: Haystack 2.2.2
description: Release notes for Haystack 2.2.2
toc: True
date: 2024-06-07
last_updated: 2024-06-07
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.2.2
---

### 🐛 Bug Fixes

- Add missing `metrics` column in `DataFrame` returned by `EvaluationRunResult.score_report()`
17 changes: 17 additions & 0 deletions content/release-notes/2.2.3.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
title: Haystack 2.2.3
description: Release notes for Haystack 2.2.3
toc: True
date: 2024-06-17
last_updated: 2024-06-17
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.2.3
---

### 🐛 Bug Fixes

- Pin numpy\<2 to avoid breaking changes that cause several core integrations to fail. Pin tenacity too (8.4.0 is broken).

### ⚡️ Enhancement Notes

- Export `ChatPromptBuilder` in `builders` module
19 changes: 19 additions & 0 deletions content/release-notes/2.2.4.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
---
title: Haystack 2.2.4
description: Release notes for Haystack 2.2.4
toc: True
date: 2024-07-04
last_updated: 2024-07-04
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.2.4
---

### ⚡️ Enhancement Notes

- Added the apply_filter_policy function to standardize the application of filter policies across all document store-specific retrievers, allowing for consistent handling of initial and runtime filters based on the chosen policy (replace or merge).
- Introduced a 'filter_policy' init parameter for both InMemoryBM25Retriever and InMemoryEmbeddingRetriever, allowing users to define how runtime filters should be applied with options to either 'replace' the initial filters or 'merge' them, providing greater flexibility in filtering query results.

### 🐛 Bug Fixes

- Meta handling of bytestreams in Azure OCR has been fixed.
- Fix some bugs running a Pipeline that has Components with conditional outputs. Some branches that were expected not to run would run anyway, even if they received no inputs. Some branches instead would cause the Pipeline to get stuck waiting to run that branch, even if they received no inputs. The behaviour would depend whether the Component not receiving the input has a optional input or not.
102 changes: 102 additions & 0 deletions content/release-notes/2.3.0.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
---
title: Haystack 2.3.0
description: Release notes for Haystack 2.3.0
toc: True
date: 2024-07-15
last_updated: 2024-07-15
tags: ["Release Notes"]
link: https://github.com/deepset-ai/haystack/releases/tag/v2.3.0
---

#### 🧑‍🔬 Haystack Experimental Package

Alongside this release, we're introducing a new repository and package: `haystack-experimental`.
This package will be installed alongside `haystack-ai` and will give you access to experimental components. As the name suggests, these components will be highly exploratory, and may or may not make their way into the main haystack package.

- Each experimental component in the haystack-experimental repo will have a life-span of 3 months
- The end of the 3 months marks the end of the experiment. In which case the component will either move to the core haystack package, or be discontinued

To learn more about the experimental package, check out the Experimental Package docs[LINK] and the API references[LINK]
To use components in the experimental package, simply `from haystack_experimental.component_type import Component`
What's in there already?
- The `OpenAIFunctionCaller`: Use this component after Chat Generators to call the functions that the LLM returns with
- The `OpenAPITool`: The OpenAPITool is a component designed to interact with RESTful endpoints of OpenAPI services. Its primary function is to generate and send appropriate payloads to these endpoints based on human-provided instructions. OpenAPITool bridges the gap between natural language inputs and structured API calls, making it easier for users to interact with complex APIs and thus integrating the structured world of OpenAPI-specified services with the LLMs apps.
- The `EvaluationHarness` - A tool that can wrap pipelines to be evaluated as well as complex evaluation tasks into one simple runnable component

For more information, visit <https://github.com/deepset-ai/haystack-experimental> or the haystack_experimental reference API at <https://docs.haystack.deepset.ai/v2.3/reference/> (bottom left pane)

#### 📝 New Converter
- New [`DocxToDocument`](https://docs.haystack.deepset.ai/docs/docxtodocument) component to convert Docx files to Documents.

### ⬆️ Upgrade Notes

- <span class="title-ref">trafilatura</span> must now be manually installed with <span class="title-ref">pip install trafilatura</span> to use the <span class="title-ref">HTMLToDocument</span> Component.

- The deprecated <span class="title-ref">converter_name</span> parameter has been removed from <span class="title-ref">PyPDFToDocument</span>.

To specify a custom converter for <span class="title-ref">PyPDFToDocument</span>, use the <span class="title-ref">converter</span> initialization parameter and pass an instance of a class that implements the <span class="title-ref">PyPDFConverter</span> protocol.

The <span class="title-ref">PyPDFConverter</span> protocol defines the methods <span class="title-ref">convert</span>, <span class="title-ref">to_dict</span> and <span class="title-ref">from_dict</span>. A default implementation of <span class="title-ref">PyPDFConverter</span> is provided in the <span class="title-ref">DefaultConverter</span> class.

- Deprecated <span class="title-ref">HuggingFaceTEITextEmbedder</span> and <span class="title-ref">HuggingFaceTEIDocumentEmbedder</span> have been removed. Use <span class="title-ref">HuggingFaceAPITextEmbedder</span> and <span class="title-ref">HuggingFaceAPIDocumentEmbedder</span> instead.

- Deprecated <span class="title-ref">HuggingFaceTGIGenerator</span> and <span class="title-ref">HuggingFaceTGIChatGenerator</span> have been removed. Use <span class="title-ref">HuggingFaceAPIGenerator</span> and <span class="title-ref">HuggingFaceAPIChatGenerator</span> instead.

### 🚀 New Features

- Adding a new `SentenceWindowRetrieval` component allowing to perform sentence-window retrieval, i.e. retrieves surrounding documents of a given document from the document store. This is useful when a document is split into multiple chunks and you want to retrieve the surrounding context of a given chunk.
- Added custom filters support to ConditionalRouter. Users can now pass in one or more custom Jinja2 filter callables and be able to access those filters when defining condition expressions in routes.
- Added a new mode in JoinDocuments, Distribution-based rank fusion as \[the article\](<https://medium.com/plain-simple-software/distribution-based-score-fusion-dbsf-a-new-approach-to-vector-search-ranking-f87c37488b18>)
- Adding the <span class="title-ref">DocxToDocument</span> component inside the <span class="title-ref">converters</span> category. It uses the <span class="title-ref">python-docx</span> library to convert Docx files to haystack Documents.
- Add a PPTX to Document converter using the python-pptx library. Extracts all text from each slide. Each slide is separated with a page break "f" so a Document Splitter could split by slide.
- The <span class="title-ref">DocumentSplitter</span> now has support for the <span class="title-ref">split_id</span> and <span class="title-ref">split_overlap</span> to allow for more control over the splitting process.
- Introduces the TransformersTextRouter! This component uses a transformers text classification pipeline to route text inputs onto different output connections based on the labels of the chosen text classification model.
- Add memory sharing between different instances of InMemoryDocumentStore. Setting the same <span class="title-ref">index</span> argument as another instance will make sure that the memory is shared. e.g. `` `python index = "my_personal_index" document_store_1 = InMemoryDocumentStore(index=index) document_store_2 = InMemoryDocumentStore(index=index) assert document_store_1.count_documents() == 0 assert document_store_2.count_documents() == 0 document_store_1.write_documents([Document(content="Hello world")]) assert document_store_1.count_documents() == 1 assert document_store_2.count_documents() == 1 ``\`
- Add a new <span class="title-ref">missing_meta</span> param to <span class="title-ref">MetaFieldRanker</span>, which determines what to do with documents that lack the ranked meta field. Supported values are <span class="title-ref">"bottom"</span> (which puts documents with missing meta at the bottom of the sorted list), <span class="title-ref">"top"</span> (which puts them at the top), and <span class="title-ref">"drop"</span> (which removes them from the results entirely).

### ⚡️ Enhancement Notes

- Added the apply_filter_policy function to standardize the application of filter policies across all document store-specific retrievers, allowing for consistent handling of initial and runtime filters based on the chosen policy (replace or merge).
- Added a new parameter to <span class="title-ref">EvaluationRunResult.comparative_individual_scores_report()</span> to specify columns to keep in the comparative DataFrame.
- Added the 'remove_component' method in 'PipelineBase' to delete components and its connections.
- Added serialization methods save_to_disk and write_to_disk to InMemoryDocumentStore.
- When using "openai" for the LLM-based evaluators the metadata from OpenAI will be in the output dictionary, under the key "meta".
- Remove <span class="title-ref">trafilatura</span> as direct dependency and make it a lazily imported one
- Renamed component from DocxToDocument to DOCXToDocument to follow the naming convention of other converter components.
- Made JSON schema validator compatible with all LLM by switching error template handling to a single user message. Also reduce cost by only including last error instead of full message history.
- Enhanced flexibility in HuggingFace API environment variable names across all related components to support both 'HF_API_TOKEN' and 'HF_TOKEN', improving compatibility with the widely used HF environmental variable naming conventions.
- Updated the ContextRelevance evaluator prompt, explicitly asking to score each statement.
- Improve LinkContentFetcher to support a broader range of content types, including glob patterns for text, application, audio, and video types. This update introduces a more flexible content handler resolution mechanism, allowing for direct matches and pattern matching, thereby greatly improving the handler's adaptability to various content types encountered on the web.
- Add max_retries to AzureOpenAIGenerator. AzureOpenAIGenerator can now be initialised by setting max_retries. If not set, it is inferred from the <span class="title-ref">OPENAI_MAX_RETRIES</span> environment variable or set to 5. The timeout for AzureOpenAIGenerator, if not set, it is inferred from the <span class="title-ref">OPENAI_TIMEOUT</span> environment variable or set to 30.
- Introduced a 'filter_policy' init parameter for both InMemoryBM25Retriever and InMemoryEmbeddingRetriever, allowing users to define how runtime filters should be applied with options to either 'replace' the initial filters or 'merge' them, providing greater flexibility in filtering query results.
- Pipeline serialization to YAML now supports tuples as field values.
- Add support for \[structlog context variables\](<https://www.structlog.org/en/24.2.0/contextvars.html>) to structured logging.
- AnswerBuilder can now accept ChatMessages as input in addition to strings. When using ChatMessages, metadata will be automatically added to the answer.
- Update the error message when the <span class="title-ref">sentence-transformers</span> library is not installed and the used component requires it.
- Add `max_retries` and `timeout` parameters to the AzureOpenAIChatGenerator initializations.
- Add `max_retries` and `timeout` parameters to the AzureOpenAITextEmbedder initializations.
- Add `max_retries`, `timeout` parameters to the `AzureOpenAIDocumentEmbedder` initialization.
- Improved error messages for deserialization errors.


### ⚠️ Deprecation Notes

- Haystack 1.x legacy filters are deprecated and will be removed in a future release. Please use the new filter style as described in the documentation - <https://docs.haystack.deepset.ai/docs/metadata-filtering>
- The output of the ContextRelevanceEvaluator will change in Haystack 2.4.0. Contexts will be scored as a whole instead of individual statements and only the relevant sentences will be returned. A score of 1 is now returned if a relevant sentence is found, and 0 otherwise.

### 🐛 Bug Fixes
- Encoding of HTML files in LinkContentFetcher
- This updates the components, TransformersSimilarityRanker, SentenceTransformersDiversityRanker, SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder and LocalWhisperTranscriber from_dict methods to work when loading with init_parameters only containing required parameters.
- Fix <span class="title-ref">TransformersZeroShotTextRouter</span> and <span class="title-ref">TransformersTextRouter</span> <span class="title-ref">from_dict</span> methods to work when <span class="title-ref">init_parameters</span> only contain required variables.
- <span class="title-ref">SASEvaluator</span> now raises a <span class="title-ref">ValueError</span> if a <span class="title-ref">None</span> value is contained in the <span class="title-ref">predicted_answers</span> input.
- Auto enable tracing upon import if <span class="title-ref">ddtrace</span> or <span class="title-ref">opentelemetry</span> is installed.
- Meta handling of bytestreams in Azure OCR has been fixed.
- Use new filter syntax in the CacheChecker component instead of legacy one.
- Solve serialization bug on 'ChatPromptBuilder' by creating 'to_dict' and 'from_dict' methods on 'ChatMessage' and 'ChatPromptBuilder'.
- Fix some bugs running a Pipeline that has Components with conditional outputs. Some branches that were expected not to run would run anyway, even if they received no inputs. Some branches instead would cause the Pipeline to get stuck waiting to run that branch, even if they received no inputs. The behaviour would depend whether the Component not receiving the input has a optional input or not.
- Fixed the calculation for MRR and MAP scores.
- Fix the deserialization of pipelines containing evaluator components that were subclasses of <span class="title-ref">LLMEvaluator</span>.
- Fix recursive JSON type conversion in the schema validator to be less aggressive (no infinite recursion).
- Adds the missing 'organization' parameter to the serialization function.
- Correctly serialize tuples and types in the init parameters of the <span class="title-ref">LLMEvaluator</span> component and its subclasses.
- Pin numpy\<2 to avoid breaking changes that cause several core integrations to fail. Pin tenacity too (8.4.0 is broken).

0 comments on commit 959c9aa

Please sign in to comment.