Text Extractor from images #843

Open
Somyajain2004 opened this issue Nov 9, 2024 · 2 comments
Is your feature request related to a problem? Please describe.
The problem is the need for an efficient and accurate way to extract text from images. This can be particularly useful in scenarios where users want to convert printed or handwritten text in images (like receipts, documents, or notes) into editable and searchable digital text. Currently, manual text extraction is time-consuming and error-prone.

Describe the solution you'd like
The solution should let users upload an image; the system then automatically recognizes and extracts the text and converts it into an editable format such as plain text or PDF. The extracted text should be accurate and retain formatting where possible, so that it is usable for further processing or data entry.

Describe alternatives you've considered
- Third-party OCR tools: External OCR (Optical Character Recognition) tools such as Tesseract or the Google Vision API are available, but they require additional setup and API integration and, in some cases, incur costs. A built-in solution would streamline the process and improve the user experience.

Approach to be followed (optional)
- Use a pre-trained OCR model (e.g., Tesseract OCR) to recognize text from uploaded images.
- Build a simple user interface for users to upload images and view extracted text results.
- Implement text-editing and export options (e.g., save as .txt, .pdf, or .csv).
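
The recognition and export steps above could be sketched roughly as follows. This is only an illustration, not an agreed design: the function names (`tesseract_ocr`, `extract_text`, `export_text`) are hypothetical, the default backend assumes the `pytesseract` and `Pillow` packages plus a local Tesseract install, and PDF export is left out because it would need an extra library.

```python
import csv
from pathlib import Path
from typing import Callable, Optional


def tesseract_ocr(image_path: str) -> str:
    """Default OCR backend (assumption: Tesseract, pytesseract and Pillow are installed)."""
    # Imported lazily so the rest of the module works without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def extract_text(image_path: str, ocr: Optional[Callable[[str], str]] = None) -> str:
    """Run OCR on an image and trim trailing whitespace from each line."""
    raw = (ocr or tesseract_ocr)(image_path)
    lines = [line.rstrip() for line in raw.splitlines()]
    return "\n".join(lines).strip()


def export_text(text: str, out_path: str) -> Path:
    """Save text as .txt (verbatim) or .csv (one row per non-empty line)."""
    path = Path(out_path)
    suffix = path.suffix.lower()
    if suffix == ".txt":
        path.write_text(text, encoding="utf-8")
    elif suffix == ".csv":
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["line_number", "text"])
            non_empty = (line for line in text.splitlines() if line.strip())
            for number, line in enumerate(non_empty, start=1):
                writer.writerow([number, line])
    else:
        # .pdf and other formats are not handled in this sketch.
        raise ValueError(f"unsupported export format: {suffix}")
    return path
```

Passing the OCR backend as a callable keeps the upload UI and export code testable without Tesseract installed, and would make it easy to swap in a different engine later.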

Additional context
Expected results:
Input:
image

Output:
Textual Conventions (I)

MediumType, MediumAddress
ethernet(7), tokenring(9), fddi(15)
PeerType, PeerAddress
ipv4(1), ipv6(2), nsap(3), ipx(11), appletalk(12), decnet(13)
AdjacentType, AdjacentAddress
A superset of MediumType and PeerType
RTFM WG 3
The University of Auckland
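
One possible way to check the "accurate" requirement against an expected output like the one above is a simple similarity score. The sketch below uses only the standard library; the helper names (`normalize`, `similarity`) and any pass threshold are assumptions for illustration, not part of the proposal.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse all whitespace runs and lowercase, so layout differences are ignored."""
    return " ".join(text.split()).lower()


def similarity(expected: str, actual: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    return SequenceMatcher(None, normalize(expected), normalize(actual)).ratio()


if __name__ == "__main__":
    expected = "MediumType, MediumAddress\nethernet(7), tokenring(9), fddi(15)"
    # Hypothetical OCR output with typical character confusions (l/1, o/0).
    ocr_output = "MediumType, MediumAddress\nethernet(7), t0kenring(9), fddi(l5)"
    print(f"similarity: {similarity(expected, ocr_output):.3f}")
```

A score like this only measures character-level agreement; it says nothing about whether formatting such as indentation or tables was preserved.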

Somyajain2004 added the enhancement (New feature or request) label on Nov 9, 2024

github-actions bot commented Nov 9, 2024

Thanks for creating the issue in ML-Nexus! 🎉
Before you start working on your PR:

  • Pull the latest changes to avoid any merge conflicts.
  • Attach before & after screenshots in your PR for clarity.
  • Include the issue number in your PR description for better tracking.

Happy open-source contributing! ☺️

@Somyajain2004

Could you add a level label to this issue?
