Commit

Merge branch 'refs/heads/sync-v0.3.13' into chore/sync-with-upstream-v0.3.13

# Conflicts:
#	tests/pytesseract_test.py
#	unstructured_pytesseract/__init__.py
christinestraub committed Aug 15, 2024
2 parents 27ab7fd + bb128f3 commit 4249b78
Showing 6 changed files with 22 additions and 21 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yaml
@@ -19,6 +19,7 @@ jobs:
       fail-fast: true
       matrix:
         include:
+          - {name: '3.12', python: '3.12', os: ubuntu-20.04, tox: py312}
           - {name: '3.11', python: '3.11', os: ubuntu-20.04, tox: py311}
           - {name: '3.10', python: '3.10', os: ubuntu-20.04, tox: py310}
           - {name: '3.9', python: '3.9', os: ubuntu-20.04, tox: py39}
12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -1,12 +1,12 @@
 exclude: ^(tests/data/)
 repos:
   - repo: https://github.com/psf/black
-    rev: 23.7.0
+    rev: 23.9.1
     hooks:
       - id: black
         args: [-S, --line-length=79, --safe, --quiet]
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v4.5.0
     hooks:
       - id: trailing-whitespace
       - id: end-of-file-fixer
@@ -20,22 +20,22 @@ repos:
     rev: 6.1.0
     hooks:
       - id: flake8
-  - repo: https://github.com/pre-commit/mirrors-autopep8
+  - repo: https://github.com/hhatto/autopep8
     rev: v2.0.4
     hooks:
       - id: autopep8
   - repo: https://github.com/asottile/reorder-python-imports
-    rev: v3.10.0
+    rev: v3.12.0
     hooks:
       - id: reorder-python-imports
         args: [--py37-plus]
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.10.1
+    rev: v3.15.0
     hooks:
       - id: pyupgrade
         args: [--py37-plus]
   - repo: https://github.com/asottile/add-trailing-comma
-    rev: v3.0.1
+    rev: v3.1.0
     hooks:
       - id: add-trailing-comma
 # - repo: https://github.com/asottile/setup-cfg-fmt
12 changes: 3 additions & 9 deletions README.rst
@@ -157,6 +157,8 @@ Add the following config, if you have tessdata error like: "Error opening data f

 * **run_and_get_output** Returns the raw output from Tesseract OCR. Gives a bit more control over the parameters that are sent to tesseract.

+* **run_and_get_multiple_output** Returns like `run_and_get_output` but can handle multiple extensions. This function replaces the `extension: str` kwarg with `extension: List[str]` kwarg where a list of extensions can be specified and the corresponding data is returned after only one `tesseract` call. This function reduces the number of calls to `tesseract` when multiple output formats, like both text and bounding boxes, are needed.
+
 **Parameters**

 ``image_to_data(image, lang=None, config='', nice=0, output_type=Output.STRING, timeout=0, pandas_config=None)``
@@ -245,12 +247,4 @@ As of Python-tesseract 0.3.1 the license is Apache License Version 2.0
 CONTRIBUTORS
 ------------
 - Originally written by `Samuel Hoffstaetter <https://github.com/h>`_
-- `Juarez Bochi <https://github.com/jbochi>`_
-- `Matthias Lee <https://github.com/madmaze>`_
-- `Lars Kistner <https://github.com/Sr4l>`_
-- `Ryan Mitchell <https://github.com/REMitchell>`_
-- `Emilio Cecchini <https://github.com/ceccoemi>`_
-- `John Hagen <https://github.com/johnthagen>`_
-- `Darius Morawiec <https://github.com/nok>`_
-- `Eddie Bedada <https://github.com/adbedada>`_
-- `Uğurcan Akyüz <https://github.com/ugurcanakyuz>`_
+- `Full list of contributors <https://github.com/madmaze/pytesseract/graphs/contributors>`_
1 change: 1 addition & 0 deletions setup.cfg
@@ -20,6 +20,7 @@ classifiers =
     Programming Language :: Python :: 3.9
     Programming Language :: Python :: 3.10
     Programming Language :: Python :: 3.11
+    Programming Language :: Python :: 3.12
     Programming Language :: Python :: Implementation :: CPython
     Programming Language :: Python :: Implementation :: PyPy
2 changes: 1 addition & 1 deletion unstructured_pytesseract/__init__.py
@@ -16,4 +16,4 @@
 from .pytesseract import TSVNotSupported


-__version__ = '0.3.12'
+__version__ = '0.3.13'
15 changes: 10 additions & 5 deletions unstructured_pytesseract/pytesseract.py
@@ -18,7 +18,6 @@
 from os.path import normcase
 from os.path import normpath
 from os.path import realpath
-from pkgutil import find_loader
 from tempfile import NamedTemporaryFile
 from time import sleep
 from typing import List
@@ -32,14 +31,20 @@

 tesseract_cmd = 'tesseract'

-numpy_installed = find_loader('numpy') is not None
-if numpy_installed:
+try:
     from numpy import ndarray

-pandas_installed = find_loader('pandas') is not None
-if pandas_installed:
+    numpy_installed = True
+except ModuleNotFoundError:
+    numpy_installed = False
+
+try:
     import pandas as pd

+    pandas_installed = True
+except ModuleNotFoundError:
+    pandas_installed = False
+
 LOGGER = logging.getLogger('pytesseract')

 DEFAULT_ENCODING = 'utf-8'
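
Note on the pytesseract.py change: the diff drops `pkgutil.find_loader`, which is deprecated as of Python 3.12, and detects the optional dependencies by attempting the import and catching `ModuleNotFoundError`. The same pattern can be sketched as a small helper. This is only an illustration and is not part of the commit; the helper name `optional_import` and the placeholder module name are hypothetical.

```python
import importlib


def optional_import(module_name):
    """Try to import a module; return (module, True) or (None, False).

    Hypothetical helper, not part of the commit. It mirrors the
    try/except ModuleNotFoundError pattern used in the diff.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None, False
    return module, True


# 'json' ships with the standard library, so this import succeeds.
json_module, json_installed = optional_import('json')

# Placeholder name that is assumed not to exist on any system.
missing_module, missing_installed = optional_import(
    'surely_not_a_real_module_xyz',
)
```

Catching `ModuleNotFoundError` rather than the broader `ImportError` means a package that is installed but fails during its own import still raises, instead of being silently reported as absent.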