Add paper analysis workflow & pdf loader #19

Open
wants to merge 2 commits into
base: main
from

Conversation

eimenhmdt
Owner

This PR adds a workflow that analyzes a given paper. The workflow extracts the main findings, methodology, and limitations of the paper. The paper can be loaded as a PDF. For this, I added a new module, "file_loaders", with a PDF loader that loads PDFs and splits them by page. The loader handles local files as well as remote PDFs when passed a URL (e.g. "https://arxiv.org/pdf/2302.03803.pdf").
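
For readers without the diff at hand, a loader of this kind could look roughly like the sketch below. It is not the code from this PR: the names `is_url`, `read_pdf_bytes`, and `load_pdf_pages` are hypothetical, and using `pypdf` for text extraction is an assumption.

```python
import io
import urllib.request
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    """Return True if `path` looks like an http(s) URL rather than a local path."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def read_pdf_bytes(path: str) -> bytes:
    """Fetch the raw PDF bytes from a URL or from the local file system."""
    if is_url(path):
        with urllib.request.urlopen(path, timeout=30) as response:
            return response.read()
    with open(path, "rb") as fh:
        return fh.read()


def load_pdf_pages(path: str) -> list[str]:
    """Return the extracted text of each page, one string per page."""
    # Assumption: pypdf is installed. It is imported lazily so the
    # helpers above remain usable without it.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(read_pdf_bytes(path)))
    # extract_text() can yield an empty result for image-only pages.
    return [page.extract_text() or "" for page in reader.pages]
```

Under these assumptions, `load_pdf_pages("https://arxiv.org/pdf/2302.03803.pdf")` would download the file and return one string per page, while a plain file path would be read from disk.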

Looking forward to your feedback!

f"Paper analysis initiated", "yellow", attrs=["bold", "blink"]
)
)
pages = load_pdf(pdf_path)

The quality of retrieval might be slightly improved by splitting the documents into smaller sections and adding some chunk overlap.
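
As a rough illustration of this suggestion, the sketch below splits text into fixed-size character chunks that share some overlap. The function name `split_with_overlap` and the default sizes are assumptions chosen for the example, not values from this PR.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split `text` into chunks of up to `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    chunk boundary still appears whole in at least one neighbouring chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text; otherwise the
        # loop would emit a trailing chunk made up only of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Applied per page, e.g. `[c for page in pages for c in split_with_overlap(page)]`, this would yield smaller retrieval units than whole pages. Splitting on characters is the simplest option; splitting on sentences or tokens is a common refinement.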

Owner Author

Thanks for the feedback!

@janzheng

Could you please add the common sections of a paper? (Abstract, Introduction, Materials, Methods (sometimes they're in a single section called Materials & Methods), Results, Discussion, Supplemental Materials, References)

I think a method to just extract the text from those sections would be SO useful on its own.
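
One possible starting point for such a method is a heading-based heuristic like the sketch below. It assumes each section heading sits on its own line, optionally numbered; the heading list and the name `extract_sections` are illustrative assumptions, and real PDFs will often need more robust handling.

```python
import re

# Common section headings; longer variants are listed before shorter ones.
SECTION_NAMES = [
    "Abstract",
    "Introduction",
    "Materials and Methods",
    "Materials & Methods",
    "Materials",
    "Methods",
    "Results",
    "Discussion",
    "Supplemental Materials",
    "References",
]

# Lower-case lookup used to normalise headings such as "RESULTS".
_CANONICAL = {name.lower(): name for name in SECTION_NAMES}

# A heading is a line holding only a section name, optionally preceded
# by a number ("2. Methods") and optionally followed by a colon.
_HEADING = re.compile(
    r"^[ \t]*(?:\d+\.?[ \t]+)?("
    + "|".join(re.escape(name) for name in SECTION_NAMES)
    + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_sections(text: str) -> dict[str, str]:
    """Map each recognised heading to the text up to the next heading."""
    matches = list(_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        name = _CANONICAL[match.group(1).lower()]
        # Keep the first occurrence if a heading appears more than once.
        sections.setdefault(name, text[match.end():end].strip())
    return sections
```

A body line consisting only of a word such as "Results" would be misread as a heading, so this is best treated as a first pass before any layout-aware parsing.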

@eimenhmdt
Owner Author

Great idea!


3 participants