
Invoice Extraction #84

Merged: 38 commits merged into ibm-granite-community:main on Jan 9, 2025


@SinghSuryaDeep (Contributor) commented Nov 20, 2024

PR Checklist

Model Interaction

  • Flexible LLM platform support: The platform should be easily switchable. Use LangChain or LlamaIndex.
  • Use the prompt guide corresponding to the model: for example, the guide for Granite 3.x Language Models.

Data

  • Example data: Follow the example data guidance.

Notebook requirements

  • Notebook outputs cleared: Ensure all notebook outputs are cleared.
  • Automated testing: Add the recipe to the automated tests as described here
  • Test in Google Colab:
    • Test that it works in Google Colab (Python 3.10.12).
    • Colab has its own package set and Python version, so ensure compatibility.
  • Test locally:
    • Ensure the code works in a fresh Python virtual environment (venv).
  • Standard access to secrets and variables: Include !pip install git+https://github.com/ibm-granite-community/utils in the first code cell so that get_env_var is available for accessing secrets and variables in the recipe.
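The "Notebook outputs cleared" item can be automated before committing. The following is a minimal sketch, assuming the notebook is stored in the standard nbformat v4 JSON layout (a top-level "cells" list whose code cells carry "outputs" and "execution_count"). The function name clear_notebook_outputs is hypothetical and is not part of this repository's tooling.

```python
import json
from pathlib import Path


def clear_notebook_outputs(path: str) -> int:
    """Clear the outputs and execution counts of every code cell in an
    .ipynb file, rewriting the file in place.

    Returns the number of code cells that were cleared.
    """
    nb_path = Path(path)
    # Assumption: nbformat v4 JSON with a top-level "cells" list.
    notebook = json.loads(nb_path.read_text(encoding="utf-8"))
    cleared = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop rendered outputs
            cell["execution_count"] = None  # reset the "In [n]" counter
            cleared += 1
    # Write back with 1-space indentation and a trailing newline,
    # similar to how nbformat lays out saved notebooks.
    nb_path.write_text(
        json.dumps(notebook, indent=1, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return cleared
```

Markdown and raw cells are left untouched. If Jupyter is installed, jupyter nbconvert --clear-output --inplace achieves the same result and may be preferable in a pre-commit hook.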

Incoming References

  • README.md updates:
    • Add a link to the recipe in the Table of Contents (ToC).
    • Include a Colab button after that link.

GitHub

  • Commits signed: All commits must be GPG or SSH signed.
  • DCO Compliance: Developer Certificate of Origin (DCO) applies to the code, documentation, and any example data provided. Ensure commits are signed off.
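A DCO sign-off is a commit-message trailer of the form "Signed-off-by: Full Name <email>", which git commit -s (--signoff) appends automatically; the sign-off lines in this PR are such trailers. As a minimal sketch, the helper below checks a commit message for a sign-off from a given address. The names signoffs and is_signed_off_by are hypothetical, and real DCO checks (for example the DCO GitHub App) may apply stricter rules, such as matching the commit author.

```python
import re

# Hypothetical helper: matches "Signed-off-by: Name <email>" trailer lines.
SIGNOFF_RE = re.compile(
    r"^Signed-off-by: (?P<name>.+?) <(?P<email>[^<>\s]+)>\s*$",
    re.MULTILINE,
)


def signoffs(message: str) -> list[tuple[str, str]]:
    """Return every (name, email) pair declared in Signed-off-by trailers."""
    return [(m["name"], m["email"]) for m in SIGNOFF_RE.finditer(message)]


def is_signed_off_by(message: str, email: str) -> bool:
    """True if the message carries a sign-off from the given email
    (compared case-insensitively)."""
    return any(e.lower() == email.lower() for _, e in signoffs(message))
```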

Signed-off-by: Surya Deep Singh <[email protected]>
@bjhargrave (Collaborator) left a comment


Thanks for the update. I have some comments here. Also, please add this notebook's path to .github/notebook_lists/vanilla_notebooks.txt so the notebook will be tested in the build.

Also, please rebase upon the tip of the main branch.

Three review comments on recipes/Invoice-Extraction/Granite_Recipes_Invoices.ipynb (outdated, resolved)
@bjhargrave (Collaborator) commented Dec 2, 2024

Also, please add this notebook's path to .github/notebook_lists/vanilla_notebooks.txt so the notebook will be tested in the build.

@SinghSuryaDeep (Contributor, Author)

I have changed the invoice path as mentioned above and added the notebook path to the vanilla_notebooks.txt file.

github-actions bot commented Dec 2, 2024

Testing Notebooks workflow launched on this PR: View run

@bjhargrave requested a review from @adampingel on December 2, 2024 14:33
@bjhargrave (Collaborator)

@adampingel I think this is ready to merge.

@adampingel (Contributor)

@SinghSuryaDeep

  1. Where are the invoice PDFs from? We need to be able to distribute all data under CDLA Permissive 2.0.
  2. Please create an entry for these in the top-level README.md; I think it belongs under the first section of recipes.

@SinghSuryaDeep (Contributor, Author)

@adampingel

  1. The invoice PDF contains mocked-up data. The numbers and names are not real.
  2. I have added an entry in the README.md file for invoice extraction.

github-actions bot commented Dec 3, 2024

Testing Notebooks workflow launched on this PR: View run

github-actions bot commented Dec 5, 2024

Testing Notebooks workflow launched on this PR: View run

@bjhargrave (Collaborator)

@SinghSuryaDeep The latest CI test fails with:

FileNotFoundError: Missing ONNX file: /home/runner/.cache/huggingface/hub/models--ds4sd--docling-models/snapshots/a8a57426c20d9f7bc0343cfd84e8b439425e5561/model_artifacts/layout/beehive_v0.0.5/model.pt

I am not sure exactly what is going on, but it seems to be a problem with docling.

@SinghSuryaDeep (Contributor, Author)

Hi @bjhargrave and @adampingel,
I tested the invoice extraction notebook and updated the GitHub repository as well. Could you please rerun the pipeline? Thanks!

github-actions bot commented Jan 6, 2025

Testing Notebooks workflow launched on this PR: View run

@SinghSuryaDeep (Contributor, Author)

@adampingel and @bjhargrave

Just a gentle reminder regarding the PR. Please let me know if any action or additional information is required from my side. If not, may we proceed with merging?

Thanks in advance!

@bjhargrave merged commit b42e55b into ibm-granite-community:main on Jan 9, 2025 (8 of 10 checks passed)
@bjhargrave (Collaborator)

Thanks @SinghSuryaDeep!


3 participants