Commit
segment-table: also do OSD to allow cells with vertical text
bertsky committed Aug 24, 2020
1 parent d25cfa6 commit ecfc989
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion ocrd_tesserocr/segment_table.py
@@ -158,7 +158,14 @@ def process(self):
  LOG.info("Detecting table cells in region '%s'", region.id)
  #
  # detect the region segments:
- tessapi.SetPageSegMode(PSM.SPARSE_TEXT) # retrieve "cells"
+ tessapi.SetPageSegMode(PSM.SPARSE_TEXT_OSD) # retrieve "cells"
+ # FIXME: _OSD is necessary to get VERTICAL_TEXT (90°) blocks, but
+ # this also causes looking for vertical gaps/alignments everywhere
+ # (not just blocks that end up as vertical), so often cells
+ # will span more than 1 line and some text will even be missed!
+ # We should check whether some strokewidth params can influence this.
+ # Otherwise, Tesseract should become more consistent in deciding for
+ # vertically aligned blobs (either the whole block, or keep horizontal).
  # TODO: we should XY-cut the sparse cells in regroup them into consistent cells
  layout = tessapi.AnalyseLayout()
  roelem = reading_order.get(region.id)
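
The TODO in this hunk mentions XY-cutting the sparse cells and regrouping them. What follows is only a rough sketch of that idea in plain Python, not code from this commit or from ocrd_tesserocr. It assumes the cell candidates have already been extracted as `(x0, y0, x1, y1)` bounding boxes; the names `split_by_gaps`, `xy_cut` and the `min_gap` threshold are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def split_by_gaps(boxes: List[Box], axis: int, min_gap: int) -> List[List[Box]]:
    """Partition boxes wherever their projection onto one axis has a gap.

    axis=0 projects onto x (yields vertical cuts),
    axis=1 projects onto y (yields horizontal cuts).
    """
    lo, hi = axis, axis + 2
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # farthest extent covered by the current group
    for box in ordered[1:]:
        if box[lo] - reach >= min_gap:
            # whitespace of at least min_gap: start a new group
            groups.append([box])
            reach = box[hi]
        else:
            groups[-1].append(box)
            reach = max(reach, box[hi])
    return groups


def xy_cut(boxes: List[Box], min_gap: int = 10, axis: int = 1) -> List[List[Box]]:
    """Recursively cut along alternating axes until no gap remains.

    Returns one list of boxes per resulting cell, top-to-bottom and
    left-to-right when starting with horizontal cuts (axis=1).
    """
    if not boxes:
        return []
    for ax in (axis, 1 - axis):
        groups = split_by_gaps(boxes, ax, min_gap)
        if len(groups) > 1:
            cells = []
            for group in groups:
                # each group is strictly smaller, so the recursion terminates
                cells.extend(xy_cut(group, min_gap, 1 - ax))
            return cells
    # no cut possible on either axis: these boxes form one cell
    return [sorted(boxes)]


# Example: a 2x2 table whose top-left cell was detected as two fragments.
fragments = [
    (0, 0, 40, 10), (43, 0, 80, 10),   # top-left, split into two words
    (120, 0, 160, 10),                 # top-right
    (0, 30, 50, 40),                   # bottom-left
    (120, 30, 170, 40),                # bottom-right
]
cells = xy_cut(fragments, min_gap=10)
```

With a `min_gap` larger than the inter-word spacing, the two fragments of the top-left cell stay together while rows and columns are separated. Whether a fixed pixel threshold is adequate for real table scans is an open question; a threshold relative to the text height may be more robust.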