Commit

Fixed rare bug
paolobettelini committed Nov 4, 2024
1 parent f1d8943 commit 8e3017e
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion scripts/pdfextract.py
@@ -35,7 +35,10 @@ def extract_text_from_pdf(pdf_file):
# Get coordinates and text
x, y, text = lobj.bbox[0], lobj.bbox[3], lobj.get_text().strip()

- if text.startswith("!"):
+ # The '!' characters should be at coordinate 0 (actually, very very close)
+ # I encountered a situation where there were '!' symbols at absurd coordinates,
+ # so we add the < 0.001 condition just to be sure.
+ if text.startswith("!") and x < 0.001:
# Clean characters.
# TODO: ' needs to be replaced, but the other characters should be supported
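
The effect of the added `x < 0.001` condition can be sketched in isolation. This is a minimal illustration, not the code from `scripts/pdfextract.py`: `FakeTextObject`, `is_marker`, and `LEFT_MARGIN_TOL` are hypothetical names, and the stand-in class only assumes that a layout object exposes `bbox` and `get_text()` as in the diff above.

```python
from dataclasses import dataclass

# Hypothetical name for the tolerance; the commit hard-codes 0.001.
LEFT_MARGIN_TOL = 0.001


@dataclass
class FakeTextObject:
    """Stand-in for a layout object exposing bbox and get_text()."""
    bbox: tuple  # (x0, y0, x1, y1)
    text: str

    def get_text(self):
        return self.text


def is_marker(lobj, tol=LEFT_MARGIN_TOL):
    """Return True when the text starts with '!' and its x0 is below tol."""
    x = lobj.bbox[0]
    text = lobj.get_text().strip()
    return text.startswith("!") and x < tol


objects = [
    FakeTextObject((0.0, 700.0, 50.0, 712.0), "!section intro\n"),
    FakeTextObject((312.4, 500.0, 320.0, 512.0), "! stray symbol\n"),
    FakeTextObject((0.0, 480.0, 90.0, 492.0), "plain text\n"),
]

# Only the '!' line at the left edge is kept; the stray one is ignored.
markers = [o.get_text().strip() for o in objects if is_marker(o)]
```

Before this commit, the second object would also have matched, since only the `!` prefix was checked.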
