Automates cheque detail extraction using OCR, processing payee, amount, MICR, and storing structured data.

Web-Dev-Learner/Bank_cheque_extraction

Automating Bank Cheque Extraction from Scanned PDFs

This project automates the extraction of key details from scanned bank cheque images and PDFs. It uses Optical Character Recognition (OCR) through the Gemini API to read cheque fields such as the payee name, cheque number, bank name, amount in words and in figures, and MICR code. The extracted details are then stored in a structured format (a PostgreSQL database) for further processing.

See the video demo for a detailed walkthrough.

Project Overview

  • Project Tasks:
    • Upload PDF or image files (PDF, JPG, JPEG, PNG).
    • Process uploaded files through OCR (Gemini API) to extract cheque details.
    • Store extracted details in a PostgreSQL database.
    • Visualize analytics of the processed cheque data.
  • How to Use the System:
    • Login Page: Log in to the system to access the main dashboard.
    • Home Page: Contains project overview and guidance on how to use the system.
      • Project Title: Automating Bank Cheque Extraction from Scanned PDFs
      • How to Use the System: Step-by-step guide for processing documents.
      • Next Steps: Navigate through the sidebar to explore features.
      • Tips for Best Results: Use high-quality, properly scanned documents.
    • Upload Page: Upload PDF or image files for cheque extraction.
    • Analytics Page: View summary statistics and visualizations of the extracted cheque data.
  • Upload Page:
    • Supported formats: PDF, JPG, JPEG, PNG (limit 200MB per file).
    • The extraction process involves:
      • Converting PDF to images using PyMuPDF.
      • Using OCR (Gemini API) to extract details.
      • Storing data in PostgreSQL.
      • Viewing analytics such as total cheque amounts, total cheques, and bank names.
  • Analytics Dashboard:
    • Summary Statistics:
      • Total Banks
      • Total Cheque Amount
      • Total Cheques
    • Cheque Details Table: Sort and filter cheque details by columns such as payee name and cheque amount.
    • Cheque Amount Distribution Visualizations:
      • Pie Chart: Top 5 Banks by Cheque Amount.
      • Bar Chart: Payee vs Amount.
      • Scatter Chart: Bank Name vs Amount.
      • Download buttons for PNG images of visualizations and full analytics report in Excel, CSV, or PDF format.
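
The summary statistics listed above (total banks, total cheque amount, total cheques, and the top 5 banks by amount) can be sketched in a few lines. This is a minimal illustration rather than the project's own code: the project lists Pandas for data processing, while this sketch uses only the standard library so it runs on its own. The field names `bank_name`, `payee_name`, and `amount`, and the sample rows, are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(cheques):
    """Compute dashboard-style summary statistics from a list of cheque dicts."""
    totals_by_bank = defaultdict(Decimal)  # Decimal() starts each bank at 0
    for cheque in cheques:
        # Convert via str so float inputs do not carry binary rounding noise.
        totals_by_bank[cheque["bank_name"]] += Decimal(str(cheque["amount"]))

    # Rank banks by total amount, largest first, and keep at most five.
    top_banks = sorted(
        totals_by_bank.items(), key=lambda item: item[1], reverse=True
    )[:5]

    return {
        "total_banks": len(totals_by_bank),
        "total_cheques": len(cheques),
        "total_amount": sum(totals_by_bank.values(), Decimal("0")),
        "top_banks": top_banks,
    }


# Hypothetical sample rows, standing in for records read from the database.
sample = [
    {"bank_name": "Bank A", "payee_name": "Alice", "amount": "1500.00"},
    {"bank_name": "Bank B", "payee_name": "Bob", "amount": "250.50"},
    {"bank_name": "Bank A", "payee_name": "Carol", "amount": "499.50"},
]
summary = summarize(sample)
print(summary["total_banks"], summary["total_cheques"], summary["total_amount"])
```

`Decimal` is used instead of `float` so that summed monetary amounts stay exact.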
  • Tech Stack:
    • Backend: Python, PostgreSQL
    • Frontend: Streamlit
    • OCR: Gemini API
    • Data Processing and Visualization: Pandas, Matplotlib
    • File Handling: PyMuPDF, FPDF, ReportLab
    • Database: PostgreSQL
    • Other Libraries: psycopg2-binary, xlsxwriter, google-generativeai, python-dotenv
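
The PDF-to-image step named under the Upload Page relies on PyMuPDF from the stack above. A minimal sketch of that step might look like the following. The function names and the 200 DPI default are assumptions for illustration, not taken from the project. PyMuPDF is imported inside the function so the small helper can be used without it installed.

```python
def dpi_to_zoom(dpi):
    """PDF page sizes are in points (72 per inch), so the zoom factor is dpi / 72."""
    return dpi / 72.0


def pdf_to_png_pages(pdf_path, dpi=200):
    """Render every page of a PDF to PNG bytes, returning one entry per page.

    The 200 DPI default is an assumed value; higher DPI gives sharper text
    for OCR at the cost of larger images.
    """
    import fitz  # PyMuPDF; imported lazily so dpi_to_zoom works without it

    zoom = dpi_to_zoom(dpi)
    matrix = fitz.Matrix(zoom, zoom)  # scale both axes by the same factor
    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pixmap = page.get_pixmap(matrix=matrix)
            pages.append(pixmap.tobytes("png"))
    return pages
```

Each PNG byte string can then be passed on to the OCR step, or written to disk for inspection.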

Usage

  • Upload a scanned cheque PDF or image to extract the relevant information.
  • View the extracted data in JSON format.
  • Explore the analytics dashboard for statistical insights and visualizations.
  • Download the results in multiple formats (Excel, CSV, PDF).
