feat: Add support for pipeline deserialization callbacks #7518

Merged
shadeMe merged 7 commits into main from feature/pipeline-deserialization-hook
Apr 10, 2024

Conversation

shadeMe
Contributor

@shadeMe shadeMe commented Apr 9, 2024

Motivation:

The motivation behind this change is to allow external code to introspect and modify a pipeline's underlying components without having to worry about the serialization details. A concrete use case enabled by this feature is overriding the init parameters of individual components at deserialization time, which creates different variations of the same pipeline (to be leveraged by the upcoming evaluation harness scaffolding for pipelines).

Changes:

This PR modifies the following methods in the Pipeline class in a non-breaking manner:

  • load
  • loads
  • from_dict

All of the above methods now have an additional, optional parameter that accepts a DeserializationCallbacks object, which contains callbacks that get invoked during the various stages of the deserialization process. We use a wrapper/container here for forward-compatibility (should we choose to add more callbacks in the future).

Currently, we support just one callback: the one that is invoked during the component pre-init stage (before the component's __init__ method/constructor is called). The callback is allowed to inspect the init parameters and modify them in place. The modified parameters are then passed to the constructor.
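
As a rough illustration of the pre-init callback described above, here is a minimal, self-contained sketch. It does not import Haystack: the `DeserializationCallbacks` dataclass below is a stand-in, and the field name `component_pre_init`, the callback signature `(name, cls, init_params)`, the `Retriever` class, and the `component_from_dict` helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Type


@dataclass
class DeserializationCallbacks:
    """Stand-in container (assumed shape, not Haystack's actual class).

    New optional fields can be added later without changing call sites.
    """

    component_pre_init: Optional[Callable[[str, Type, Dict[str, Any]], None]] = None


class Retriever:
    """Toy component used only for this illustration."""

    def __init__(self, top_k: int = 10):
        self.top_k = top_k


def component_from_dict(name, cls, data, callbacks=None):
    """Hypothetical loader: builds `cls` from data["init_parameters"]."""
    # Copy so the serialized input is never mutated by a callback.
    init_params = dict(data.get("init_parameters", {}))
    if callbacks is not None and callbacks.component_pre_init is not None:
        # The callback may inspect and modify init_params in place.
        callbacks.component_pre_init(name, cls, init_params)
    return cls(**init_params)


def override_top_k(name, cls, init_params):
    """Example callback: override one init parameter for one component."""
    if name == "retriever":
        init_params["top_k"] = 3


serialized = {"init_parameters": {"top_k": 10}}

# Omitting the callbacks argument keeps the old behaviour (non-breaking).
default = component_from_dict("retriever", Retriever, serialized)

# Passing callbacks yields a variation of the same serialized component.
variant = component_from_dict(
    "retriever",
    Retriever,
    serialized,
    callbacks=DeserializationCallbacks(component_pre_init=override_top_k),
)
```

Because the callbacks argument defaults to `None`, existing callers are unaffected, and the same serialized input can produce several variants by swapping the callback.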

Since each component can have its own custom serialization logic, generic approaches that handle this at the pipeline level will not suffice: any modification to the init parameters would require the overridden values to be in the same serialized format expected by the component's serialization logic. Since we do not have access to this logic, we need to place the hook after the init parameters have been deserialized, i.e., just before the invocation of the component's constructor. To that end, we implement an internal context manager in the ComponentMeta metaclass that tests for the presence of such a hook/callback and detours the constructor.

How did you test it?

Unit tests.

@shadeMe shadeMe added topic:pipeline 2.x Related to Haystack v2.0 labels Apr 9, 2024
@shadeMe shadeMe requested review from masci and silvanocerza April 9, 2024 15:31
@shadeMe shadeMe marked this pull request as ready for review April 9, 2024 16:15
@shadeMe shadeMe requested review from a team as code owners April 9, 2024 16:15
@shadeMe shadeMe requested review from dfokina and vblagoje and removed request for a team and vblagoje April 9, 2024 16:15
@coveralls
Collaborator

coveralls commented Apr 9, 2024

Pull Request Test Coverage Report for Build 8633397767

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 48 unchanged lines in 7 files lost coverage.
  • Overall coverage increased (+0.1%) to 89.914%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| core/component/component.py | 1 | 99.08% |
| components/evaluators/document_map.py | 2 | 92.86% |
| components/evaluators/document_mrr.py | 2 | 91.67% |
| core/serialization.py | 2 | 95.83% |
| telemetry/_telemetry.py | 4 | 84.15% |
| components/evaluators/sas_evaluator.py | 12 | 64.29% |
| core/pipeline/pipeline.py | 25 | 93.85% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 8617128985: | 0.1% |
| Covered Lines: | 6240 |
| Relevant Lines: | 6940 |

💛 - Coveralls

shadeMe and others added 2 commits April 10, 2024 16:50
@shadeMe shadeMe merged commit b1760ad into main Apr 10, 2024
23 checks passed
@shadeMe shadeMe deleted the feature/pipeline-deserialization-hook branch April 10, 2024 15:47
Labels
2.x Related to Haystack v2.0 topic:core topic:pipeline topic:tests type:documentation Improvements on the docs
3 participants