Automating Bank Cheque Extraction from Scanned PDFs

This project automates the extraction of key details from scanned bank cheque images and PDFs. PDF pages are first converted to images, and Optical Character Recognition (OCR) through the Gemini API then extracts cheque fields such as the payee name, cheque number, bank name, amount in words and in figures, and MICR code. The extracted details are stored in a PostgreSQL database for further processing.

See the Preview Video Demo for a detailed walkthrough.

Project Overview

Project Tasks:

Upload PDF or image files (PDF, JPG, JPEG, PNG).
Process uploaded files through OCR (Gemini API) to extract cheque details.
Store extracted details in a PostgreSQL database.
Visualize analytics of the processed cheque data.
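The storage step can be sketched as follows. This is an illustration under stated assumptions, not the project's actual database code: the `cheques` table, its column names, and the `build_insert` and `save_cheque` helpers are hypothetical. The `%s` placeholders follow psycopg2's parameter style, so values are passed separately from the SQL text instead of being formatted into it.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema; the real table and column names may differ.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS cheques (
    id SERIAL PRIMARY KEY,
    payee_name TEXT,
    cheque_number TEXT,
    bank_name TEXT,
    amount NUMERIC(14, 2),
    amount_in_words TEXT,
    micr_code TEXT
)
"""

COLUMNS = (
    "payee_name",
    "cheque_number",
    "bank_name",
    "amount",
    "amount_in_words",
    "micr_code",
)


def build_insert(record: Dict[str, Any]) -> Tuple[str, tuple]:
    """Return a parameterized INSERT statement and its values."""
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = f"INSERT INTO cheques ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    # Missing keys become None, which the driver stores as NULL.
    params = tuple(record.get(col) for col in COLUMNS)
    return sql, params


def save_cheque(conn, record: Dict[str, Any]) -> None:
    """Insert one record using an open psycopg2 connection."""
    sql, params = build_insert(record)
    # In psycopg2, `with conn` commits on success and rolls back on error.
    with conn:
        with conn.cursor() as cur:
            cur.execute(CREATE_TABLE_SQL)
            cur.execute(sql, params)
```

A connection could be opened with `psycopg2.connect(dsn)`, where `dsn` is a connection string read from an environment variable (for example one loaded with python-dotenv), and then passed to `save_cheque`.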

How to Use the System:

Login Page: Log in to the system to access the main dashboard.
Home Page: Contains the project overview and guidance on how to use the system.
- Project Title: Automating Bank Cheque Extraction from Scanned PDFs
- How to Use the System: Step-by-step guide for processing documents.
- Next Steps: Navigate through the sidebar to explore features.
- Tips for Best Results: Use high-quality, properly scanned documents.
Upload Page: Upload PDF or image files for cheque extraction.
Analytics Page: View summary statistics and visualizations of the extracted cheque data.

Upload Page:

Supported formats: PDF, JPG, JPEG, PNG (limit 200MB per file).
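The format and size rules above could be checked on the server side with a small helper such as the one below. It is a sketch only: `validate_upload` and its constants are hypothetical names, and it assumes that 200MB means 200 × 1024 × 1024 bytes.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumed meaning of "200MB"


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return 'pdf' or 'image' for an acceptable file, else raise ValueError."""
    ext = Path(filename).suffix.lower()  # case-insensitive: 'Scan.PDF' is fine
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 200MB upload limit")
    return "pdf" if ext == ".pdf" else "image"
```

The return value tells the caller whether the file needs PDF-to-image conversion before OCR or can be sent as an image directly.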
After upload, the workflow involves:
- Converting PDF to images using PyMuPDF.
- Using OCR (Gemini API) to extract details.
- Storing data in PostgreSQL.
- Viewing analytics such as total cheque amounts, total cheques, and bank names.
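The first step of the list above, converting PDF pages to images with PyMuPDF, might look like the following. The function name `pdf_to_png_pages` and the 200 DPI default are assumptions for illustration; the project's own conversion settings are not stated here.

```python
from typing import List


def pdf_to_png_pages(pdf_bytes: bytes, dpi: int = 200) -> List[bytes]:
    """Render each page of an in-memory PDF to PNG bytes."""
    # Imported here so the module still loads where PyMuPDF is not installed.
    import fitz  # PyMuPDF

    images = []
    # Open from bytes, which suits uploads that never touch the disk.
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page
            images.append(pix.tobytes("png"))
    return images
```

Each returned item is a complete PNG file in memory, so it can be handed to the OCR step one page at a time. A higher `dpi` generally helps OCR on small print at the cost of larger images.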

Analytics Dashboard:

Summary Statistics:
- Total Banks
- Total Cheque Amount
- Total Cheques
Cheque Details Table: Sort and filter cheque details by columns such as payee name and cheque amount.
Cheque Amount Distribution Visualizations:
- Pie Chart: Top 5 Banks by Cheque Amount.
- Bar Chart: Payee vs Amount.
- Scatter Chart: Bank Name vs Amount.
Downloads: PNG images of each visualization, and the full analytics report in Excel, CSV, or PDF format.
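The summary statistics and the "Top 5 Banks by Cheque Amount" grouping can be computed with Pandas along these lines. This is a sketch that assumes a DataFrame with hypothetical `bank_name` and `amount` columns; the helper names are illustrative as well.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Compute the three dashboard totals from the cheque table."""
    return {
        "total_banks": int(df["bank_name"].nunique()),
        "total_cheques": int(len(df)),
        "total_amount": float(df["amount"].sum()),
    }


def top_banks_by_amount(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Sum cheque amounts per bank and keep the n largest totals."""
    return df.groupby("bank_name")["amount"].sum().nlargest(n)
```

The Series returned by `top_banks_by_amount` is indexed by bank name, so it can be passed to a plotting call such as `Series.plot.pie()` to draw the pie chart.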

Tech Stack:

Backend: Python, PostgreSQL
Frontend: Streamlit
OCR: Gemini API
Data Processing: Pandas, Matplotlib
File Handling: PyMuPDF, FPDF, ReportLab
Database: PostgreSQL
Other Libraries: psycopg2-binary, xlsxwriter, google-generativeai, python-dotenv

Usage

Upload a scanned cheque PDF or image to extract the relevant information.
View the extracted data in JSON format.
Explore the analytics dashboard for statistical insights and visualizations.
Download the results in multiple formats (Excel, CSV, PDF).

Repository Files:

.devcontainer
.gitignore
Preview_demo
README.md
app.py
db_handler.py
gemini.py
requirements.txt

Repository: Web-Dev-Learner/Bank_cheque_extraction
