Segmentation on raw images #144
Filter out binarized images (independent of the workflow) to improve segmentation quality.
Codecov Report
@@ Coverage Diff @@
## master #144 +/- ##
==========================================
+ Coverage 37.73% 37.77% +0.04%
==========================================
Files 9 9
Lines 1023 998 -25
Branches 216 212 -4
==========================================
- Hits 386 377 -9
+ Misses 565 555 -10
+ Partials 72 66 -6
This needs to be tested systematically. I expect to see both degradation and improvement, depending on how hard binarization is. See here for explanation.
I understand the reasoning and am subscribing to tesseract-ocr/tesseract#3083 to follow the discussion on upstream changes. The changeset (filtering binarized) is sensible, but it needs thorough testing to ensure that it is more beneficial than detrimental. Alternatively, it could be made parameterizable.
I thought about that, but at workflow configuration time you have next to no chance of knowing which is going to be better. (I would guess that only input images which fare well under global Otsu are better off with the change. But we have no automatic indicator of binarization quality yet. At the very least, we should strive for some estimator based on the local distribution of connected component statistics.) But I still hope that we can fix the problem in Tesseract itself.
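To make the "fares well under global Otsu" guess a bit more concrete, here is a minimal sketch in plain Python. It is not the connected-component estimator proposed above and it is not part of this changeset. It only computes the global Otsu threshold together with the ratio of between-class variance to total variance, which could serve as a very crude hint of how cleanly a single global threshold separates foreground from background. The function name `otsu_threshold` and the `separability` score are hypothetical names chosen for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return (threshold, separability) for integer gray values in [0, levels).

    Hypothetical helper for illustration only.
    separability = max between-class variance / total variance, in [0, 1].
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("no pixels given")

    mean_total = sum(i * h for i, h in enumerate(hist)) / total
    var_total = sum(h * (i - mean_total) ** 2 for i, h in enumerate(hist)) / total

    best_t, best_var = 0, 0.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no valid split here
        mu0 = sum0 / w0
        mu1 = (mean_total * total - sum0) / w1
        # Between-class variance for the split at t.
        between = (w0 / total) * (w1 / total) * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between

    separability = best_var / var_total if var_total > 0 else 0.0
    return best_t, separability


# A cleanly bimodal sample versus a flat gray ramp.
bimodal = [10] * 50 + [200] * 50
ramp = list(range(256))
t_bimodal, s_bimodal = otsu_threshold(bimodal)
t_ramp, s_ramp = otsu_threshold(ramp)
```

A score close to 1 means the histogram is strongly bimodal. A lower score says only that a global threshold is a poor fit, not that the binarized image is actually bad, so this would have to be validated against real pages before being used to choose between raw and binarized input.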
d231edb to 2b3e8d6