Support for pdf file context provider #483

Open
rpg2807 opened this issue Sep 18, 2023 · 6 comments
Labels
area:context-providers Relates to context providers kind:enhancement Indicates a new feature request, improvement, or extension

Comments

@rpg2807

rpg2807 commented Sep 18, 2023

First of all, thanks for the extension. It seems like a great tool.
I tried supplying a link to an online PDF file using @url, but it seemed to read the encoded PDF file as plain text.
Please show me how to add PDF file content as context, or add the feature.
Btw, I see an embeddings context provider in the plug-in directory. Not sure how to use it though.

@sestinj
Contributor

sestinj commented Sep 18, 2023

@rpg2807 We have documentation here on how to add a context provider: https://continue.dev/docs/customization/context-providers#building-your-own-context-provider

Once you have a new context provider (the embeddings provider included), you can add it to your ~/.continue/config.py like this:

from continuedev.src.continuedev.plugins.context_providers.github import GitHubIssuesContextProvider

...
config=ContinueConfig(
  ...
  context_providers=[
    GitHubIssuesContextProvider(
      repo_name="continuedev/continue",  # change to whichever repo you want to use
      auth_token="<my_github_auth_token>",
    )
  ]
)

The embeddings context provider might work, so you could give it a try, but we will be working on it later this week and I can share when we have a production-ready version : )

@sestinj
Contributor

sestinj commented Sep 18, 2023

It might make sense (and be easier) to just add .pdf handling to the URLContextProvider: basically an if statement that checks whether the URL points to a .pdf and, if so, decodes it to text in the way PDFs require. That could be implemented in this function without needing to rewrite any of the context provider logic.
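A minimal sketch of that idea, not the actual URLContextProvider code: the helper names (is_pdf_url, pdf_bytes_to_text, decode_url_body) are hypothetical, and the PDF branch assumes PyPDF2 2.0 or later, where PdfReader and extract_text() are available. The PDF library is imported lazily so the plain-text path still works when it is not installed.

```python
from io import BytesIO
from urllib.parse import urlparse


def is_pdf_url(url: str) -> bool:
    """Return True when the URL path ends in .pdf (query string ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")


def pdf_bytes_to_text(data: bytes) -> str:
    """Extract text from raw PDF bytes. Assumes PyPDF2 >= 2.0 is installed."""
    from PyPDF2 import PdfReader  # third-party, imported only when needed

    reader = PdfReader(BytesIO(data))
    # extract_text() can return None or "" for pages without a text layer.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def decode_url_body(url: str, body: bytes) -> str:
    """Hypothetical hook: choose a decoder based on the URL."""
    if is_pdf_url(url):
        return pdf_bytes_to_text(body)
    return body.decode("utf-8", errors="replace")
```

Checking the URL suffix is the simplest test, but it misses PDFs served from URLs without a .pdf extension; inspecting the response's Content-Type header would be a more robust alternative.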

@rpg2807
Author

rpg2807 commented Sep 19, 2023

Thanks for the suggestion. That does look like an easier way around.
When I tried, I had trouble installing and importing the PyPDF2 module. I added PyPDF2 to requirements.txt and explicitly called 'pip install PyPDF2' in build.sh, but neither seems to help. When I load the extension, this is what I get:

File "/tmp/_MEI9a8UlQ/continuedev/src/continuedev/plugins/context_providers/url.py", line 7, in <module>
    from PyPDF2 import PdfFileReader

ModuleNotFoundError: No module named 'PyPDF2'

Could you please explain how to add a Python module so that it can be used in the context provider?

@sestinj
Contributor

sestinj commented Sep 19, 2023

@rpg2807 This is a limitation of running the server as a "frozen" binary. Any packages not included in the binary cannot be imported afterward, so you'll have to bundle it into the binary.

You did the right thing by adding it to requirements.txt, but pyinstaller sometimes misses a few imports when building the binary, and those have to be listed in the hiddenimports field in continue/run.spec. So you could try adding 'PyPDF2' to that list and then building again.
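Before rebuilding, it can help to confirm which modules are actually missing. The following is a rough diagnostic sketch, not part of Continue: the function names are hypothetical, and it relies only on the standard library plus the sys.frozen attribute that PyInstaller sets inside a bundled binary. It only handles top-level module names.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def running_frozen():
    """True when running inside a PyInstaller-style frozen binary."""
    return bool(getattr(sys, "frozen", False))


if __name__ == "__main__":
    wanted = ["PyPDF2"]  # modules the context provider needs
    print("frozen binary:", running_frozen())
    print("missing modules:", missing_modules(wanted))
```

Any name reported as missing from inside the frozen binary is a candidate for the hidden imports list in the spec file.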

@oldluke92

@rpg2807 Did you make progress here?
If so, are you willing to share what you did?

@dosubot (bot) added the area:context-providers and kind:enhancement labels and removed the enhancement label on Jul 8, 2024
@simeneide

> This is a limitation of running the server as a "frozen" binary. Any packages not included in the binary cannot be imported afterward, so you'll have to bundle it into the binary.

Yes, any progress here? I have docs that are contained in PDFs, and this would be really nice inside Continue! 🚀
