PredefinedPipeline.CHAT_WITH_WEBSITE fails deserialization #8391

Closed
silvanocerza opened this issue Sep 24, 2024 Discussed in #8390 · 0 comments · Fixed by #8401

silvanocerza commented Sep 24, 2024

Discussed in #8390

Originally posted by aillusions September 24, 2024
Hi

I've tried 2 sample apps from this guide https://haystack.deepset.ai/overview/quick-start and neither of them worked.

Env:

python --version
  Python 3.12.6

pip show haystack-ai
   Name: haystack-ai
   Version: 2.5.1

The first one throws this error:

haystack.core.errors.PipelineUnmarshalError: Error unmarshalling pipeline: Couldn't deserialize component 'converter' of class 'HTMLToDocument' with the following data: {'init_parameters': {'extractor_type': 'DefaultExtractor'}, 'type': 'haystack.components.converters.html.HTMLToDocument'}. Possible reasons include malformed serialized data, mismatch between the serialized component and the loaded one (due to a breaking change, see https://github.com/deepset-ai/haystack/releases), etc.

The second one throws:

ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate in certificate chain (_ssl.c:1000)

How can I troubleshoot these?

thx


To reproduce:

from haystack import Pipeline, PredefinedPipeline

Pipeline.from_template(PredefinedPipeline.CHAT_WITH_WEBSITE)

It will fail with this error:

PipelineUnmarshalError: Error unmarshalling pipeline: Couldn't deserialize component 'converter' of class 'HTMLToDocument' with the following data: {'init_parameters': {'extractor_type': 'DefaultExtractor'}, 'type': 'haystack.components.converters.html.HTMLToDocument'}. Possible reasons include malformed serialized data, mismatch between the serialized component and the loaded one (due to a breaking change, see https://github.com/deepset-ai/haystack/releases), etc.
Source:
components:
  converter:
    init_parameters:
      extractor_type: DefaultExtractor
    type: haystack.components.converters.html.HTMLToDocument

  fetcher:
    init_parameters:
      raise_on_failure: true
      retry_attempts: 2
      timeout: 3
      user_agents:
      - haystack/LinkContentFetcher/2.0.0b8
    type: haystack.components.fetchers.link_content.LinkContentFetcher

  llm:
    init_parameters:
      api_base_url: null
      api_key:
        env_vars:
        - OPENAI_API_KEY
        strict: true
        type: env_var
      generation_kwargs: {}
      model: gpt-3.5-turbo
      streaming_callback: null
      system_prompt: null
    type: haystack.components.generators.openai.OpenAIGenerator

  prompt:
    init_parameters:
      template: |

        "According to the contents of this website:
        {% for document in documents %}
          {{document.content}}
        {% endfor %}
        Answer the given question: {{query}}
        Answer:
        "
    type: haystack.components.builders.prompt_builder.PromptBuilder

connections:
- receiver: converter.sources
  sender: fetcher.streams
- receiver: prompt.documents
  sender: converter.documents
- receiver: llm.prompt
  sender: prompt.prompt

metadata: {}

This probably stems from the HTMLToDocument backend change from boilerpy3 to trafilatura introduced in #7705: the template still serializes the old extractor_type parameter.
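
A minimal sketch of how such a mismatch could be spotted, assuming deserialization ultimately passes `init_parameters` as keyword arguments to the component's constructor. `StandInConverter`, `hypothetical_option`, and `unexpected_init_parameters` are hypothetical names made up for illustration; the stand-in does not reproduce the real HTMLToDocument signature, and this is not how Haystack itself validates templates.

```python
import inspect


class StandInConverter:
    """Hypothetical stand-in for a component whose constructor no longer
    accepts `extractor_type`. NOT the real HTMLToDocument signature."""

    def __init__(self, hypothetical_option: bool = False):
        self.hypothetical_option = hypothetical_option


def unexpected_init_parameters(cls, init_parameters):
    """Return the serialized keys that cls.__init__ would reject."""
    params = inspect.signature(cls.__init__).parameters.values()
    # A **kwargs catch-all accepts any key, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return []
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {p.name for p in params if p.kind in keyword_kinds}
    accepted.discard("self")
    return sorted(key for key in init_parameters if key not in accepted)


# The serialized data for the 'converter' component, copied from the error message.
serialized = {
    "init_parameters": {"extractor_type": "DefaultExtractor"},
    "type": "haystack.components.converters.html.HTMLToDocument",
}

stale = unexpected_init_parameters(StandInConverter, serialized["init_parameters"])
print(stale)  # ['extractor_type']
```

Against the real class, the same check would be run with the imported component in place of the stand-in. Any key it reports is a candidate for removal from the template.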

silvanocerza added the "P1 High priority, add to the next sprint" label Sep 24, 2024
anakin87 self-assigned this Sep 24, 2024