Add support for Reader API to convert HTMLs into Documents #663
I was the one who proposed this component.
Hey there @bilgeyucel @anakin87! Is this still a thing? I just toyed around with the API and got good results. Would be happy to knock this out if you guys think it's valuable.
@jlonge4 I think the API improved over time. I see they now have different endpoints for converting a page into Markdown, searching the web, and grounding (experimental).
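Here is a minimal sketch, under stated assumptions, of how a single component could map each of those three endpoints to a request. The base URLs (`https://r.jina.ai/`, `https://s.jina.ai/`, `https://g.jina.ai/`), the Bearer-token `Authorization` header and the `Accept: application/json` header are assumptions to verify against Jina's documentation. `ENDPOINTS` and `build_request` are hypothetical names. The function only builds the URL and headers, so it does no network I/O.

```python
from urllib.parse import quote

# Assumed base URLs for the three Reader API features (verify against Jina's docs).
ENDPOINTS = {
    "read": "https://r.jina.ai/",    # convert a page into Markdown
    "search": "https://s.jina.ai/",  # search the web
    "ground": "https://g.jina.ai/",  # grounding (experimental)
}


def build_request(mode: str, text: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for one Reader API call; no network I/O."""
    if mode not in ENDPOINTS:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(ENDPOINTS)}")
    if mode == "read":
        url = ENDPOINTS[mode] + text  # the target page URL is appended as-is
    else:
        url = ENDPOINTS[mode] + quote(text, safe="")  # percent-encode free text
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",  # assumed: request a JSON envelope
    }
    return url, headers
```

The resulting pair could then go to any HTTP client, e.g. `requests.get(url, headers=headers)`.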
@anakin87 I think it's pretty cool. Do you think the existing LinkContentFetcher/Web Search components overlap too much in functionality with it?
I would say it is just another nice option. Are you thinking of a single component or more than one?
@anakin87 I agree! Would passing modes at init time to a single component make sense?
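I'm thinking of something like:

```python
from enum import Enum
from typing import Union

from haystack import Document, component
from haystack.utils import Secret

class Mode(Enum):  # placeholder: one value per Reader API endpoint
    READ = "read"
    SEARCH = "search"
    GROUND = "ground"

@component
class JinaReader:
    def __init__(
        self,
        mode: Union[Mode, str],  # required, so it must precede the defaulted api_key
        api_key: Secret = Secret.from_env_var("JINA_API_KEY"),
        # ... further options
    ):
        ...

    @component.output_types(document=Document)
    def run(self, input: str):
        # check input depending on mode
        ...
```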
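The `# check input depending on mode` placeholder could be filled in roughly as follows. This is a minimal sketch that assumes the mode names `read`, `search` and `ground` (one per endpoint mentioned earlier in the thread). `check_input` is a hypothetical helper. It only checks that read mode receives an http(s) URL and that the other modes receive non-empty text.

```python
from urllib.parse import urlparse

VALID_MODES = ("read", "search", "ground")  # assumed mode names


def check_input(mode: str, text: str) -> str:
    """Validate a run() input for the given mode and return it stripped."""
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {VALID_MODES}")
    text = text.strip()
    if not text:
        raise ValueError("Input must be a non-empty string")
    if mode == "read":
        parsed = urlparse(text)
        # Read mode fetches a page, so require a scheme and a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"'read' mode expects an http(s) URL, got {text!r}")
    # 'search' takes a free-text query and 'ground' a statement to check,
    # so any non-empty string is accepted for them.
    return text
```

Since the mode is fixed at init time while inputs arrive per call, `run()` would call a check like this before sending any request.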
@anakin87 looks great, I'll get it cooking asap!
@jlonge4 I have added a tasklist to #663 (comment). Could you maybe help with opening a PR to mention the
@anakin87 you've got it, no problem 😎
Closing this issue. (Only social media announcement is missing.) |
Is your feature request related to a problem? Please describe.
There's no component to use Jina's Reader API with Haystack.
Describe the solution you'd like
A new JinaHTMLtoDocument (name TBD) component to use Jina's Reader API to convert URLs into Haystack Documents. This component should accept a URL and output a Haystack Document.
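A minimal sketch of the final step, turning a Reader API response into the `content` and `meta` that a Haystack `Document` takes. It assumes the API is called with `Accept: application/json` and returns a body shaped like `{"code": 200, "data": {"title": ..., "url": ..., "content": ...}}`. That shape and the `to_document_fields` name are assumptions, not something this issue confirms.

```python
import json
from typing import Any


def to_document_fields(raw_body: str) -> tuple[str, dict[str, Any]]:
    """Split an (assumed) Reader API JSON response into Document content and meta."""
    payload = json.loads(raw_body)
    # Assumed shape: the page lives under "data", with the Markdown in "content".
    data = payload.get("data") or {}
    content = data.get("content", "")
    # Everything except the page body is kept as metadata (title, url, ...).
    meta = {key: value for key, value in data.items() if key != "content"}
    return content, meta
```

With Haystack installed, the result maps directly onto `Document(content=content, meta=meta)`.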
Tasks
JinaReaderConnector #1212