| Folder/file | Purpose |
| --- | --- |
| 20230401T120858Z-001 | Data that is used |
| src | Source code of the assignment |
| my_solution.ipynb | Test case of my solution (Jupyter notebook) |
| Alternate_Solution.ipynb | Alternate solution |
| Requirements.txt | Requirements to run |
| Reuslt.png | Auto-generated output |

Objective: Design and develop a pipeline to extract data from unstructured documents.

I propose two solutions:

  • my_solution.ipynb: The traditional approach. The image is first preprocessed with OpenCV, and OCR libraries are then used to extract information from it.

  • Alternate_Solution.ipynb: A pre-trained deep learning model for text detection and recognition, built on PyTorch and OCR, which can be customized for our use case. I selected it for its robust two-stage ML pipeline: text detection (localizing words) followed by text recognition (identifying all characters in each word), which gives it an edge in extracting information.
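Both solutions share the same overall shape: preprocess the image, localize the words, then read each word. The sketch below illustrates only that control flow in plain Python. It is not the code from either notebook: `Box`, `binarize`, `detect_words`, `crop`, and `extract_text` are hypothetical names, the detector is a toy column-projection heuristic standing in for a real text-detection model, and the recognizer is passed in as a callable where a real OCR engine or recognition model would go.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: an image is a grid of grayscale pixels, 0 (black) to 255 (white).
Image = List[List[int]]


@dataclass(frozen=True)
class Box:
    """Column span [left, right) of one detected word (hypothetical type)."""
    left: int
    right: int


def binarize(image: Image, threshold: int = 128) -> Image:
    """Preprocessing: mark dark pixels as 1 (ink) and light pixels as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def detect_words(binary: Image, min_gap: int = 2) -> List[Box]:
    """Stage 1 (toy detector): split words on runs of >= min_gap empty columns."""
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    boxes: List[Box] = []
    start = None  # first column of the word currently being scanned
    gap = 0       # number of consecutive empty columns seen inside that word
    for x, ink in enumerate(has_ink):
        if ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                boxes.append(Box(start, x - gap + 1))
                start, gap = None, 0
    if start is not None:
        boxes.append(Box(start, width - gap))
    return boxes


def crop(binary: Image, box: Box) -> Image:
    """Cut the columns of one detected word out of the binarized image."""
    return [row[box.left:box.right] for row in binary]


def extract_text(image: Image,
                 recognize: Callable[[Image], str],
                 threshold: int = 128) -> List[str]:
    """Stage 2: run the supplied recognizer on every detected word crop."""
    binary = binarize(image, threshold)
    return [recognize(crop(binary, box)) for box in detect_words(binary)]
```

Keeping the recognizer as a parameter means the same skeleton can wrap either a classical OCR call or a deep learning recognition model, which mirrors the difference between the two notebooks.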