Hayhook with custom haystack components. #36

Open
satyakisen opened this issue Aug 23, 2024 · 7 comments

@satyakisen

Hi Team,

I am trying to call the REST API to run a pipeline that contains multiple custom components. I could not find an example of this in the hayhooks repository. It would be helpful if some examples were provided for this use case.

Thanks in advance.

@vblagoje
Member

vblagoje commented Sep 4, 2024

How far have you gotten, and where exactly did you run into issues, @satyakisen?

@satyakisen
Author

satyakisen commented Sep 9, 2024

@vblagoje I am able to use basic custom Haystack components by installing my component through Poetry. But when my pipeline contains some kind of Python object, such as Haystack token secrets or chat messages, the pipeline dump generates a tag like !!python/object:haystack.dataclasses.chat_message.ChatMessage.

When I deploy this YAML file in hayhooks, it throws a YAML parsing error. Can you please help me with this error?

I can see this happens because the yaml safe_load function restricts the deserialization of Haystack Python objects. Is there any way I could use a custom marshaller when deserializing the YAML file?

Elaborating on the issue above for clarity.

Overview

We can dump a Haystack pipeline to a YAML file, then later load the same
YAML file and run the pipeline, as described in the Haystack documentation.

In this experiment we use only what Haystack ships out of the box (the ChatPromptBuilder component and the ChatMessage dataclass).

Running the experiment, we find that although the pipeline serializes successfully,
deserializing it throws an error.

Reproduction

To reproduce the error, copy the Pipeline Codebase into a pipeline.py file
and run the command below:

# Ran on windows git bash
python.exe pipeline.py > ./error_msg.txt 2>&1

Codebase

Pipeline Codebase

from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage


def create_prompt_builder():
    template: str = """
            Query: {{query}}

            Instruction:
                {{instruction}}
                               

            Context:
            {% for document in documents: %}
                    {{document}}
            {% endfor %}
        """
    return ChatPromptBuilder(template=[ChatMessage.from_user(template)])


def dump() -> None:
    pipeline = Pipeline()
    prompt_builder = create_prompt_builder()
    pipeline.add_component('prompt_builder', prompt_builder)

    with open("./yamls/test_pipeline_001.yml", "w") as file:
        pipeline.dump(file)

def load() -> None:
    pipeline = Pipeline()
    with open("./yamls/test_pipeline_001.yml", "r") as file:
        pipeline.load(file)

if __name__ == '__main__':
    dump()
    load()

Pipeline YAML content

components:
  prompt_builder:
    init_parameters:
      required_variables: []
      template:
      - !!python/object:haystack.dataclasses.chat_message.ChatMessage
        content: "\n            Query: {{query}}\n\n            Instruction:\n   \
          \             {{instruction}}\n                               \n\n     \
          \       Context:\n            {% for document in documents: %}\n       \
          \             {{document}}\n            {% endfor %}\n        "
        meta: {}
        name: null
        role: !!python/object/apply:haystack.dataclasses.chat_message.ChatRole
        - user
      variables: null
    type: haystack.components.builders.chat_prompt_builder.ChatPromptBuilder
connections: []
max_loops_allowed: 100
metadata: {}

Error while deserializing

Traceback (most recent call last):
  File "C:\Project\POC\ML\GENAI\Haystack\experiment\pipeline.py", line 37, in <module>
    load()
  File "C:\Project\POC\ML\GENAI\Haystack\experiment\pipeline.py", line 33, in load
    pipeline.load(file)
  File "C:\Python\envs\user\Lib\site-packages\haystack\core\pipeline\base.py", line 258, in load
    return cls.from_dict(marshaller.unmarshal(fp.read()), callbacks)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\haystack\marshal\yaml.py", line 17, in unmarshal
    return yaml.safe_load(data_)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\__init__.py", line 125, in safe_load
    return load(stream, SafeLoader)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 51, in get_single_data
    return self.construct_document(node)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 60, in construct_document
    for dummy in generator:
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 408, in construct_yaml_seq
    data.extend(self.construct_sequence(node))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 129, in construct_sequence
    return [self.construct_object(child, deep=deep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 129, in <listcomp>
    return [self.construct_object(child, deep=deep)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 100, in construct_object
    data = constructor(self, node)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python\envs\user\Lib\site-packages\yaml\constructor.py", line 427, in construct_undefined
    raise ConstructorError(None, None,
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:haystack.dataclasses.chat_message.ChatMessage'
  in "<unicode string>", line 6, column 9:
          - !!python/object:haystack.datacla ... 
            ^

Requirements

python = "^3.11"
haystack-ai = "2.2.3"
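
The failure above comes from the dump containing `!!python/object` tags, which `yaml.safe_load` refuses to construct. One possible direction, sketched here with standard-library stand-ins, is to reduce such objects to plain dictionaries before serializing and rebuild them afterwards, so the serialized text holds only primitive types. `Role`, `Message`, `dump_template`, and `load_template` are hypothetical names invented for this illustration, not Haystack classes, and `json` stands in for YAML only to keep the sketch dependency-free. Whether haystack-ai 2.2.3 offers an equivalent hook for ChatPromptBuilder is not confirmed in this thread.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class Role(str, Enum):
    """Hypothetical stand-in for a chat role enum."""
    USER = "user"
    SYSTEM = "system"


@dataclass
class Message:
    """Hypothetical stand-in for a chat message dataclass."""
    content: str
    role: Role
    name: Optional[str] = None
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        # Emit primitives only: the enum is reduced to its string value.
        return {
            "content": self.content,
            "role": self.role.value,
            "name": self.name,
            "meta": dict(self.meta),
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Message":
        # Rebuild the object explicitly instead of relying on loader tags.
        return cls(
            content=data["content"],
            role=Role(data["role"]),
            name=data.get("name"),
            meta=dict(data.get("meta") or {}),
        )


def dump_template(messages: List[Message]) -> str:
    """Serialize messages as plain dictionaries."""
    return json.dumps({"template": [m.to_dict() for m in messages]})


def load_template(text: str) -> List[Message]:
    """Parse the plain dictionaries and rebuild the messages."""
    return [Message.from_dict(d) for d in json.loads(text)["template"]]


original = [Message("Query: {{query}}", Role.USER)]
serialized = dump_template(original)
restored = load_template(serialized)
print(serialized)
print(restored == original)
```

Because the serialized text carries no language-specific tags, a safe loader has nothing to reject; the cost is that each such type needs its own explicit conversion.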

@ParseDark

Same issue here. When I try to use a custom component in my pipeline, hayhooks throws an error:

hayhooks deploy yml/start.yml
Error deploying pipeline: Unable to parse Haystack Pipeline start: Component '__main__.OpenAIFormatConverter' not imported.

Not sure. I think the Haystack team does not want to maintain this project anymore. They just want to add more features to Haystack. But they are forgetting one thing: deployment is more important, because a pipeline that has only been created is just a toy. The main goal is a pipeline that can be deployed on a production server.

@jimjones26

I wanted to comment and say I am having the same issue. A pipeline that uses custom components fails to deploy and throws the following error:

Error deploying pipeline: Unable to parse Haystack Pipeline ingestion_pipeline: Component 'custom_components.get_page_source.CustomComponent' not imported.
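
Both error messages above name the component by a dotted path (`__main__.OpenAIFormatConverter` and `custom_components.get_page_source.CustomComponent`). A path like that can only be turned back into a class if its module part is importable in the process doing the loading. The sketch below illustrates that general Python behaviour with `importlib`; `resolve` is a hypothetical helper, not part of Haystack or hayhooks, and how hayhooks actually performs the lookup is an assumption here. Note also that a class defined in a script that is run directly reports `__main__` as its module, and in a separate server process that name refers to the server's own entry point rather than to that script.

```python
import importlib
from typing import Optional


def resolve(dotted_path: str) -> Optional[type]:
    """Return the class named by 'package.module.ClassName', or None if it cannot be found."""
    module_name, _, class_name = dotted_path.rpartition(".")
    try:
        module = importlib.import_module(module_name)
    except (ImportError, ValueError):
        # ImportError: the module is not installed or not on sys.path.
        # ValueError: the path had no module part at all.
        return None
    return getattr(module, class_name, None)


# A module that is importable everywhere resolves to its class.
print(resolve("collections.OrderedDict"))

# A module that only exists in the author's project does not resolve,
# unless that project is installed where this code runs.
print(resolve("custom_components.get_page_source.CustomComponent"))
```

Under this reading, making the package that contains the component installable in the environment that runs hayhooks (as reported above for Poetry and for the containerized setup) is what lets the dotted path resolve.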

@alex-stoica
Contributor

@jimjones26 @ParseDark @satyakisen maybe it's not the solution you're looking for, but running custom components in a pipeline with hayhooks containerized worked for me (#27); you can also find the full archived code there.

Now, if you don't want to use Docker, the problem might be more difficult.

@alex-stoica
Contributor

I second @ParseDark: functional deployment code is highly important, ideally with the ability to execute multiple pipeline runs concurrently.

@alex-stoica
Contributor

@jimjones26
When you encountered

Unable to parse Haystack Pipeline ingestion_pipeline: Component 'custom_components.get_page_source.CustomComponent' not imported.

did you also check that your custom component is decorated with @component? I've seen the same error when I forgot to add the decorator.
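
One way to picture why a missing decorator could produce a "not imported" error is a toy registry: the decorator records each class under its `module.ClassName` string, and loading looks that string up. The following is only a model of that idea, assuming the `@component` decorator plays a comparable registration role; `REGISTRY`, `register`, and `lookup` are hypothetical names, not Haystack APIs.

```python
from typing import Dict, Type

# Toy registry mapping "module.ClassName" strings to classes.
REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Toy decorator: record the class under its dotted type name."""
    REGISTRY[f"{cls.__module__}.{cls.__name__}"] = cls
    return cls


@register
class Decorated:
    def run(self, text: str) -> dict:
        return {"text": text.upper()}


class Undecorated:
    # Same shape as Decorated, but never registered.
    def run(self, text: str) -> dict:
        return {"text": text.upper()}


def lookup(type_name: str) -> Type:
    """Return the registered class, or fail the way a loader might."""
    if type_name not in REGISTRY:
        raise LookupError(f"Component '{type_name}' not imported.")
    return REGISTRY[type_name]


decorated_name = f"{Decorated.__module__}.Decorated"
undecorated_name = f"{Undecorated.__module__}.Undecorated"

print(lookup(decorated_name).__name__)
try:
    lookup(undecorated_name)
except LookupError as err:
    print(err)
```

In this model the undecorated class is perfectly importable, yet the lookup still fails, which matches the symptom described in the comment above.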
