Extract text from images and PDFs using Python and store it in JSON format. Store the extracted data in a MySQL database.

Atul-vaibhav/OCR-Extraction-Using-Python

OCR to MySQL Data Pipeline

Overview

This project extracts text from images using Tesseract OCR, structures the extracted data as JSON, and stores it in a MySQL database.

Technologies Used

  • Python
  • OpenCV
  • Tesseract OCR
  • MySQL
  • JSON

Setup Instructions

1. Install Dependencies

Make sure Python is installed, then install the required libraries:

pip install pytesseract opencv-python mysql-connector-python

Install Tesseract OCR and add it to the system PATH.
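A quick way to confirm that Tesseract is reachable from Python is sketched below. The helper name `find_tesseract` is an assumption made for illustration and is not part of this repository. If the executable is not on PATH, pytesseract also lets you point at it directly through `pytesseract.pytesseract.tesseract_cmd`.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        # pytesseract can also be pointed at the binary explicitly:
        #   pytesseract.pytesseract.tesseract_cmd = r"<full path to tesseract>"
        print("tesseract not found on PATH")
    else:
        print(f"tesseract found at {path}")
```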

2. Run OCR Script

To extract text and structure it as JSON:

python extract_and_structure_image.py
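The contents of extract_and_structure_image.py are not shown here, so the following is only a minimal sketch of what an extract-and-structure step could look like. It assumes the scanned form contains `Label: value` lines; the function names `ocr_image` and `structure_text` are hypothetical.

```python
import json
import re


def ocr_image(image_path):
    """Run Tesseract on one image and return the raw text."""
    # Imported lazily so structure_text() can be used without the OCR libraries.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    # Grayscale conversion is a common, simple preprocessing step before OCR.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray)


def structure_text(text):
    """Turn 'Label: value' lines into a dict; lines without a colon are skipped."""
    record = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if not match:
            continue
        key, value = match.groups()
        # Assumption: purely numeric values are stored as integers.
        record[key] = int(value) if value.isdecimal() else value
    return record


if __name__ == "__main__":
    sample = "Patient Name: John Doe\nDOB: 01/05/1980\nPain Level: 6\nComments: Not good\n"
    print(json.dumps(structure_text(sample), indent=2))
```

Real OCR output is usually noisier than this sample, so a production parser would likely need fuzzier label matching.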

3. Set Up MySQL Database

Create the database and tables using:

mysql -u root -p < sql/schema.sql

4. Insert Extracted Data

Run the MySQL storage script:

python store_to_mysql.py
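As a rough illustration of the storage step, the sketch below builds a parameterized INSERT and runs it with mysql-connector-python. The table name `patient_records`, the column names in `COLUMN_MAP`, and both function names are assumptions; the real names are defined in sql/schema.sql and store_to_mysql.py.

```python
# Hypothetical mapping from JSON keys to column names; adjust to match sql/schema.sql.
COLUMN_MAP = {
    "Patient Name": "patient_name",
    "DOB": "dob",
    "Pain Level": "pain_level",
    "Comments": "comments",
}


def build_insert(record, table="patient_records"):
    """Return (sql, params) for a parameterized INSERT of one record."""
    keys = [k for k in COLUMN_MAP if k in record]
    if not keys:
        raise ValueError("record has no known fields")
    columns = ", ".join(COLUMN_MAP[k] for k in keys)
    placeholders = ", ".join(["%s"] * len(keys))
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(record[k] for k in keys)


def store_record(record, **connect_kwargs):
    """Insert one record; connect_kwargs are passed to mysql.connector.connect()."""
    import mysql.connector  # imported lazily so build_insert() works without the driver

    conn = mysql.connector.connect(**connect_kwargs)
    try:
        cursor = conn.cursor()
        cursor.execute(*build_insert(record))
        conn.commit()
        cursor.close()
    finally:
        conn.close()
```

Values are passed as parameters rather than formatted into the SQL string, which avoids quoting problems with OCR text. If the `dob` column is a DATE, the value would need converting to `YYYY-MM-DD` first.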

Sample JSON Output

{
  "Patient Name": "John Doe",
  "DOB": "01/05/1980",
  "Pain Level": 6,
  "Comments": "Not good"
}
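Because OCR can drop or misread fields, it may help to check a record against the shape of the sample above before storing it. This is a sketch only: `EXPECTED_TYPES` and `validate_record` are hypothetical names, and the field types are inferred from the sample output.

```python
import json

# Field types inferred from the sample output; treat them as assumptions.
EXPECTED_TYPES = {"Patient Name": str, "DOB": str, "Pain Level": int, "Comments": str}


def validate_record(raw_json):
    """Parse a JSON record and return (record, list of problems found)."""
    record = json.loads(raw_json)
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in record:
            problems.append(f"missing field: {key}")
        elif not isinstance(record[key], expected):
            actual = type(record[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return record, problems
```

An empty problem list means the record matches the sample's fields and types.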

Contributing

Feel free to fork and improve the project. Submit a pull request for any enhancements.

License

This project is licensed under the MIT License.
