Commit

Merge pull request #27 from leferrad/release/1.3.2
Release v1.3.2
leferrad authored Aug 12, 2024
2 parents c45886d + dfdde5a commit 4e33628
Showing 8 changed files with 58 additions and 9 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/CI.yml
@@ -47,6 +47,8 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: Install Tesseract
run: sudo apt-get update && sudo apt-get install -y tesseract-ocr
- uses: julia-actions/setup-julia@v1
with:
version: '1'
2 changes: 2 additions & 0 deletions .github/workflows/CompatHelper.yml
@@ -7,6 +7,8 @@ jobs:
CompatHelper:
runs-on: ubuntu-latest
steps:
- name: Install Tesseract
run: sudo apt-get update && sudo apt-get install -y tesseract-ocr
- name: Pkg.add("CompatHelper")
run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
- name: CompatHelper.main()
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "OCReract"
uuid = "c9880795-194d-450c-832d-1e8a03a8ecd1"
authors = ["Leandro Ferrado"]
version = "1.3.1"
version = "1.3.2"

[deps]
FileIO = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
4 changes: 3 additions & 1 deletion README.md
@@ -10,11 +10,12 @@
## Installation

From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```julia-repl
pkg> add OCReract
```

This is just a wrapper, so it assumes you already have installed [Tesseract](https://tesseract-ocr.github.io/tessdoc/Installation.html).
This is just a wrapper, so it assumes you already have installed [Tesseract](https://tesseract-ocr.github.io/tessdoc/Installation.html). Also, be sure the binary `tesseract` is in your PATH (you can check this by running `tesseract --version` in your terminal).
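
The same PATH check can also be done from within Julia. The following is a minimal sketch that assumes only the standard library function `Sys.which`; the helper name `binary_in_path` is hypothetical and is not part of OCReract:

```julia
# Hypothetical helper (not part of OCReract): report whether an
# executable with the given name can be found in the PATH.
function binary_in_path(name::AbstractString)
    # Sys.which returns the full path to the executable as a String,
    # or `nothing` when no matching executable is found.
    return Sys.which(name) !== nothing
end

if binary_in_path("tesseract")
    println("tesseract found at ", Sys.which("tesseract"))
else
    println("tesseract not found in PATH; install it before using OCReract")
end
```

This only confirms that a binary with that name is reachable; it does not verify that the binary actually runs, which is what `tesseract --version` checks.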

## Usage

@@ -37,4 +38,5 @@ julia> println(strip(res_text));
In a Julia session, run `Pkg.test("OCReract", coverage=true)`.

## Next steps

- Develop a module for image pre-processing (to improve OCR results)
13 changes: 10 additions & 3 deletions docs/src/index.md
@@ -10,12 +10,13 @@ OCReract is a simple Julia wrapper of the well-known OCR engine called [Tesserac
The Tesseract OCR engine must be installed manually. On ubuntu, this may be as simple as

```
$ sudo apt-get install -y tesseract-ocr
sudo apt-get install -y tesseract-ocr
```

but the [installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html) are the authoritative source.

The Julia wrapper can be installed using the Julia package manager. From the Julia REPL, type `]` to enter the Pkg REPL mode and run:

```julia-repl
pkg> add OCReract
```
@@ -26,7 +27,7 @@ In this simple example, we will process the following image through the two opti

![Test Image](https://raw.githubusercontent.com/leferrad/OCReract.jl/master/test/files/noisy.png)

#### In disk
### In disk

Let's execute `run_tesseract` to process the image from repository's test folder, and then `cat` the resulting text file.

@@ -39,7 +40,7 @@ julia> read(`cat $res_path`, String)
"Noisy image\nto test\nOCReract.jl\n\f"
```

#### In memory
### In memory

`OCReract` uses [JuliaImages](https://juliaimages.org/latest/) module to process images in memory. So, the image should be loaded with `Images` module (or the lighter-weight combination `using ImageCore, FileIO`) to then execute `run_tesseract` to retrieve the result as a `String`.

@@ -61,6 +62,12 @@ OCReract.jl
```@index
```

```@docs
OCReract.OCReract
OCReract.check_tesseract_installed
OCReract.get_tesseract_version
```

```@autodocs
Modules = [OCReract]
Private = false
4 changes: 3 additions & 1 deletion src/OCReract.jl
@@ -1,7 +1,9 @@
module OCReract

export
run_tesseract
run_tesseract,
get_tesseract_version,
check_tesseract_installed

include("tesseract.jl")

33 changes: 30 additions & 3 deletions src/tesseract.jl
@@ -3,14 +3,27 @@ using ImageCore
using Logging

export
run_tesseract
run_tesseract,
get_tesseract_version,
check_tesseract_installed

# Tesseract settings
command = "tesseract"
psm_valid_range = 0:14
oem_valid_range = 0:4

"""Util to check and inform whether Tesseract is installed or not"""
"""
check_tesseract_installed()
This function checks if Tesseract is installed in the system by running the command
`$command --version`. If the command is not recognized, an error is logged.
# Examples
```julia-repl
julia> using OCReract;
julia> check_tesseract_installed()
```
"""
function check_tesseract_installed()
try
read(`$command --version`, String);
@@ -21,7 +34,21 @@ end

check_tesseract_installed()

"""Util to get version of Tesseract installed"""
"""
get_tesseract_version() -> String
Function to get the version of Tesseract installed in the system.
The version is extracted from the first line of the output of the command `$command --version`.
# Returns
- `String`: version of Tesseract installed
# Examples
```julia-repl
julia> using OCReract;
julia> get_tesseract_version()
```
"""
function get_tesseract_version()
info = read(`$command --version`, String)
version = split(info, "\n")[1]
7 changes: 7 additions & 0 deletions test/test_tesseract.jl
@@ -116,6 +116,13 @@ end
function test_get_tesseract_version()
version = OCReract.get_tesseract_version()
tesseract_string, version_string = split(version, " ")
# Remove v from version string if it exists
if tesseract_string == "tesseract" && version_string[1] == 'v'
version_string = version_string[2:end]
end
# Use only the version number (major.minor.patch)
version_string = split(version_string, ".")[1:3]
version_string = join(version_string, ".")
@test tesseract_string == "tesseract"
@test occursin(r"^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$", version_string)
end
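
The normalization done in the test above (drop an optional leading `v`, keep only major.minor.patch) could also be sketched as a small standalone helper. This is an illustrative assumption, not code from the commit: `parse_version_line` is a hypothetical name, and the sample strings are made up to resemble the first line of `tesseract --version` output.

```julia
# Hypothetical helper (not part of OCReract): split a line such as
# "tesseract v5.3.0.20221222" into the program name and a VersionNumber
# built from the first three numeric fields.
function parse_version_line(line::AbstractString)
    # Group 1: program name; groups 2-4: major, minor, patch.
    # The optional `v` prefix and any trailing build suffix are ignored.
    m = match(r"^(\S+)\s+v?(\d+)\.(\d+)\.(\d+)", line)
    m === nothing && return nothing
    major, minor, patch = (parse(Int, m[i]) for i in 2:4)
    return (name = String(m[1]), version = VersionNumber(major, minor, patch))
end

# Made-up sample lines for illustration only
for line in ("tesseract v5.3.0.20221222", "tesseract 4.1.1", "not a version line")
    println(repr(line), " => ", parse_version_line(line))
end
```

Returning `nothing` for unrecognized input keeps the caller in charge of how to report an unexpected format, instead of failing on an out-of-bounds index when fewer than three fields are present.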
