Tesseract struggles with 90-degree angled text sometimes #4387

Open
0dinD opened this issue Jan 31, 2025 · 5 comments
Labels
bug, OSD (Orientation and Script Detection)

Comments

0dinD commented Jan 31, 2025

Current Behavior

I was investigating whether Tesseract can handle mixed orientations within one image (see also: #2055), and found a specific case where it almost works, but fails in a way that makes me think there's a bug in the code. More specifically, in the example below, Tesseract seems to read the 90-degree text "upside-down", i.e. as though it were 270-degree text.

As you can see in the output hOCR below, the textangle is correctly identified as 90 degrees, but the words come out as if read from a 270-degree perspective. Look at words like "anbeu" ("neque" but upside-down), "luenb" ("quam" but upside-down), "wesdi" ("ipsum" but upside-down) and so on.

Command used: tesseract text-90deg.png text-90deg --psm 1 hocr

Input image:

Image

Output hOCR:

text-90deg.hocr.txt

Tested with the latest AppImage of Tesseract (5.5.0).
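For anyone who wants to check which textangle values ended up in an hOCR file without reading the markup by hand, here is a minimal sketch. It assumes the layout Tesseract typically writes (a `class='ocr_line'` span whose `title` attribute carries `textangle N` for rotated lines, and no `textangle` entry for horizontal ones). The function name and the sample markup are hypothetical and are not taken from the attached file.

```python
import re
from collections import Counter

# Captures the title attribute of each hOCR line span.
# Assumption: the class attribute appears before the title attribute.
LINE_RE = re.compile(
    r"<span[^>]*class=['\"]ocr_line['\"][^>]*title=['\"]([^'\"]*)['\"]"
)
# Captures the numeric value of the "textangle" property inside a title.
TEXTANGLE_RE = re.compile(r"textangle\s+(-?\d+(?:\.\d+)?)")


def line_textangles(hocr: str) -> list:
    """Return one textangle per ocr_line, using 0.0 when none is recorded."""
    angles = []
    for title in LINE_RE.findall(hocr):
        match = TEXTANGLE_RE.search(title)
        angles.append(float(match.group(1)) if match else 0.0)
    return angles


# Hypothetical two-line sample: one horizontal line, one rotated line.
SAMPLE_HOCR = (
    "<span class='ocr_line' id='line_1_1' "
    "title=\"bbox 10 10 400 40; baseline 0 -5\">...</span>\n"
    "<span class='ocr_line' id='line_1_2' "
    "title=\"bbox 10 60 40 400; textangle 90; x_size 25\">...</span>\n"
)

if __name__ == "__main__":
    # Summarize how many lines were assigned each angle.
    print(Counter(line_textangles(SAMPLE_HOCR)))
```

To use it on real output, read the generated `.hocr` file into a string and pass it to `line_textangles`.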

Expected Behavior

Tesseract should read all the text in the correct orientation so that there are no jumbled words in the hOCR output.

Suggested Fix

Find and fix the bug that makes Tesseract read 90-degree text as 270-degree text in this case.

tesseract -v

tesseract 5.5.0
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

Operating System

Ubuntu 22.04 Jammy

Other Operating System

No response

uname -a

No response

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

No response

Author

0dinD commented Jan 31, 2025

Oh, wait, I just realized that I had the orientations the wrong way around. The hOCR spec says that textangle is measured counter-clockwise, so in the example above there is no discrepancy between the textangle and the OCR output.

The main issue, then, seems to be that --psm 1 does not detect the correct orientation for the bottom text. The correct orientation is 270 degrees in this case, but --psm 1 chooses 90 degrees, which produces garbled output because the OCR reads the text upside-down.
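To see what Tesseract's standalone orientation detection reports for an image (or for a crop of one block), the OSD-only mode `--psm 0` can be run and its output parsed. The sketch below assumes the usual `Key: value` report that `tesseract <image> stdout --psm 0` prints. The helper names and the sample values are hypothetical and are not output from the image in this issue. Note that this mode reports a single orientation for the whole image, so on a mixed-orientation page it cannot describe both blocks.

```python
import subprocess


def parse_osd(report: str) -> dict:
    """Parse "Key: value" lines from an OSD report into a dict of strings."""
    fields = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            fields[key.strip()] = value.strip()
    return fields


def run_osd(image_path: str) -> dict:
    """Run OSD-only mode on an image and return the parsed report.

    Requires the tesseract binary and osd.traineddata to be installed.
    """
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "--psm", "0"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_osd(result.stdout)


# Illustrative report text only; the numbers are made up.
SAMPLE_REPORT = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 5.12
Script: Latin
Script confidence: 2.50
"""

if __name__ == "__main__":
    print(parse_osd(SAMPLE_REPORT))
```

Calling `run_osd("text-90deg.png")` would run the detection on the repro image, provided Tesseract is on the PATH.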

Collaborator

amitdo commented Feb 3, 2025

Is osd.traineddata located in the same directory as eng.traineddata on your machine?

Author

0dinD commented Feb 3, 2025

@amitdo It looks like they are both in the same path, yes:

zerodind@machine:~/git/dev/tesseract$ ls -lah /usr/share/tesseract-ocr/4.00/tessdata/
total 18M
drwxr-xr-x 4 root root 4,0K dec 12 00:52 .
drwxr-xr-x 3 root root 4,0K feb 10  2022 ..
drwxr-xr-x 2 root root 4,0K jun 21  2022 configs
-rw-r--r-- 1 root root 4,0M sep 15  2017 eng.traineddata
-rw-r--r-- 1 root root  11M sep 15  2017 osd.traineddata
-rw-r--r-- 1 root root  572 feb  9  2022 pdf.ttf
-rw-r--r-- 1 root root 4,0M sep 15  2017 swe.traineddata
drwxr-xr-x 2 root root 4,0K jun 21  2022 tessconfigs

Let me know if there's any more information you need. For the record, in the repro case I gave in the issue description, I was using the latest AppImage of Tesseract, version 5.5.0. I'm not sure if the AppImage itself contains the traineddata or if it uses the system files (which are for an older Tesseract version). Either way, I'm pretty sure this issue has existed for a long while: I initially did not use the AppImage but rather my system (Ubuntu 22.04 Jammy) version of Tesseract (version 4.1.1).

Collaborator

amitdo commented Feb 5, 2025

I tested it myself with tesseract 5.5.0. I get a similar result.

Collaborator

amitdo commented Feb 5, 2025

I manually removed the top block:

Image

With this image, Tesseract works well. It detects the textangle as 270 and thus the text recognition is fine.
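The manual split above could in principle be automated by taking the block geometry from a first hOCR pass and then running Tesseract on each region separately. The sketch below only covers the geometry step: it groups `ocr_line` bounding boxes by their recorded textangle and returns one enclosing box per angle. It assumes the first pass still produces usable bounding boxes even when the orientation is wrong, and that regions with different angles do not interleave. The function name and sample markup are hypothetical. Cropping and re-running OCR are left out.

```python
import re

# Captures the title attribute of each hOCR line span.
# Assumption: the class attribute appears before the title attribute.
LINE_RE = re.compile(
    r"<span[^>]*class=['\"]ocr_line['\"][^>]*title=['\"]([^'\"]*)['\"]"
)
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")
TEXTANGLE_RE = re.compile(r"textangle\s+(-?\d+(?:\.\d+)?)")


def regions_by_angle(hocr: str) -> dict:
    """Map each textangle to the box (x0, y0, x1, y1) enclosing its lines."""
    regions = {}
    for title in LINE_RE.findall(hocr):
        bbox = BBOX_RE.search(title)
        if bbox is None:
            continue  # no geometry recorded for this line
        x0, y0, x1, y1 = map(int, bbox.groups())
        angle_match = TEXTANGLE_RE.search(title)
        angle = float(angle_match.group(1)) if angle_match else 0.0
        if angle in regions:
            # Grow the existing box so it also covers this line.
            a, b, c, d = regions[angle]
            regions[angle] = (min(a, x0), min(b, y0), max(c, x1), max(d, y1))
        else:
            regions[angle] = (x0, y0, x1, y1)
    return regions


# Hypothetical sample: two horizontal lines above two rotated lines.
SAMPLE_HOCR = (
    "<span class='ocr_line' id='l1' title=\"bbox 10 10 400 40\">..</span>\n"
    "<span class='ocr_line' id='l2' title=\"bbox 10 50 380 80\">..</span>\n"
    "<span class='ocr_line' id='l3' "
    "title=\"bbox 10 120 40 500; textangle 90\">..</span>\n"
    "<span class='ocr_line' id='l4' "
    "title=\"bbox 50 120 80 480; textangle 90\">..</span>\n"
)

if __name__ == "__main__":
    for angle, box in regions_by_angle(SAMPLE_HOCR).items():
        print(angle, box)
```

Each returned box could be cropped with any image tool and passed to Tesseract on its own. Whether that reproduces the good result seen here on other images is untested.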

amitdo added the bug label Feb 5, 2025