Version 2 beta is now available and under development in the master branch. Read the story behind it: Why I refactor tesseract.js v2?
Check the support/1.x branch for version 1.
Tesseract.js is a JavaScript library that gets words in almost any language out of images. (Demo)
Image Recognition
Video Real-time Recognition
Tesseract.js wraps an Emscripten port of the Tesseract OCR Engine. It works in the browser, using webpack or plain script tags with a CDN, and on the server with Node.js. After you install it, using it is as simple as:
import Tesseract from 'tesseract.js';

// Recognize English text in an image, logging progress messages as they arrive
Tesseract.recognize(
  'https://tesseract.projectnaptha.com/img/eng_bw.png',
  'eng',
  { logger: m => console.log(m) }
).then(({ data: { text } }) => {
  console.log(text);
});
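The logger option above prints each raw message object. If you only want a compact progress line, a small formatter can help. This is a minimal sketch, assuming each message carries a status string and, while a job is running, a numeric progress value between 0 and 1; formatProgress is a hypothetical helper, not part of Tesseract.js.

```javascript
// Hypothetical helper (not part of Tesseract.js): turns a logger message into
// a one-line progress string. Assumes the message has a `status` string and,
// optionally, a numeric `progress` between 0 and 1.
function formatProgress(m) {
  if (typeof m.progress !== 'number') {
    // No progress value: just report the status.
    return m.status;
  }
  const percent = Math.round(m.progress * 100);
  return `${m.status}: ${percent}%`;
}

// Possible usage with the example above (not executed here):
// Tesseract.recognize(image, 'eng', { logger: m => console.log(formatProgress(m)) });

console.log(formatProgress({ status: 'recognizing text', progress: 0.5 })); // prints "recognizing text: 50%"
```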
Or, more imperatively:
import { createWorker } from 'tesseract.js';

// Create a worker that logs its progress messages
const worker = createWorker({
  logger: m => console.log(m)
});

(async () => {
  // Load the OCR core, fetch the English language data, then initialize with it
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  const { data: { text } } = await worker.recognize('https://tesseract.projectnaptha.com/img/eng_bw.png');
  console.log(text);
  // Release the worker's resources when done
  await worker.terminate();
})();
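Because loading a language and initializing happen once per worker, the same worker can presumably be reused for several images instead of repeating that setup. The sketch below assumes a worker that has already gone through load(), loadLanguage() and initialize() as in the example above. recognizeAll and fakeWorker are hypothetical names used only for illustration; the fake simply mimics the { data: { text } } result shape used in the examples.

```javascript
// Hypothetical helper: recognizes several images one after another with a
// single, already-initialized worker and collects the recognized text.
async function recognizeAll(worker, images) {
  const texts = [];
  for (const image of images) {
    // Await each job before starting the next, so results keep input order.
    const { data: { text } } = await worker.recognize(image);
    texts.push(text);
  }
  return texts;
}

// Hypothetical stand-in worker for illustration only: it returns an object
// shaped like the results above, without doing any OCR.
const fakeWorker = {
  recognize: async image => ({ data: { text: `text from ${image}` } }),
};

recognizeAll(fakeWorker, ['a.png', 'b.png']).then(texts => console.log(texts));
```

With a real worker you would call recognizeAll(worker, [...]) inside the async function above, before worker.terminate().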
Check out the docs for a full explanation of the API.
Major changes in v2:
- Upgrade to Tesseract v4.1 (using Emscripten 1.38.45)
- Support for multiple languages at the same time, e.g. eng+chi_tra for English and Traditional Chinese
- Supported image formats: png, jpg, bmp, pbm
- Support for WebAssembly (falls back to asm.js when the browser doesn't support it)
- Support for TypeScript
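Two items in the list above translate directly into small checks you might run before calling Tesseract.js. The sketch below is hypothetical and not part of the library: joinLanguages builds a combined language string such as eng+chi_tra (the form you would pass as the language argument, e.g. to Tesseract.recognize), and hasSupportedExtension tests a file name against the listed formats. Treating .jpeg as jpg and trusting the file extension to reflect the real format are assumptions.

```javascript
// Hypothetical helper: combine language codes into the "+"-separated form,
// e.g. ['eng', 'chi_tra'] -> 'eng+chi_tra'.
function joinLanguages(codes) {
  if (codes.length === 0) {
    throw new Error('at least one language code is required');
  }
  return codes.join('+');
}

// Formats from the list above; "jpeg" is included as an assumption.
const SUPPORTED_EXTENSIONS = new Set(['png', 'jpg', 'jpeg', 'bmp', 'pbm']);

// Hypothetical helper: checks only the file extension (case-insensitive),
// not the actual file contents.
function hasSupportedExtension(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot === -1) return false;
  return SUPPORTED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}

console.log(joinLanguages(['eng', 'chi_tra'])); // prints "eng+chi_tra"
console.log(hasSupportedExtension('scan.PNG')); // prints true
console.log(hasSupportedExtension('photo.gif')); // prints false
```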
Tesseract.js works with a <script> tag via a local copy or CDN, with webpack via npm, and on Node.js with npm/yarn.
<!-- v2 -->
<script src='https://unpkg.com/[email protected]/dist/tesseract.min.js'></script>
<!-- v1 -->
<script src='https://unpkg.com/[email protected]/src/index.js'></script>
After including the script, the Tesseract variable will be globally available.
Tesseract.js currently requires Node.js v6.8.0 or higher. Install it with npm or yarn:
# For v2
npm install tesseract.js@next
yarn add tesseract.js@next
# For v1
npm install tesseract.js
yarn add tesseract.js
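If you want a Node.js script to fail early on an unsupported runtime, a check along these lines could go at the top of it. This is a minimal sketch, not part of Tesseract.js: meetsMinimum is a hypothetical helper that compares version numbers component by component, so 6.10.0 correctly counts as newer than 6.8.0 (a plain string comparison would get this wrong).

```javascript
// Hypothetical helper: returns true if `version` is at least `minimum`,
// comparing major, minor and patch numerically.
function meetsMinimum(version, minimum) {
  // Strip an optional leading "v" and split "6.8.0" into [6, 8, 0].
  const parse = v => v.replace(/^v/, '').split('.').map(n => parseInt(n, 10));
  const a = parse(version);
  const b = parse(minimum);
  for (let i = 0; i < 3; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y;
  }
  return true; // versions are equal
}

// Warn if the running Node.js is older than the stated minimum (v6.8.0).
if (!meetsMinimum(process.versions.node, '6.8.0')) {
  console.error(`Node.js ${process.versions.node} is older than the required v6.8.0`);
  process.exitCode = 1;
}
```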
Example projects:
- Offline Version: https://github.com/jeromewu/tesseract.js-offline
- Custom Traineddata: https://github.com/jeromewu/tesseract.js-custom-traineddata
- Chrome Extension: https://github.com/jeromewu/tesseract.js-chrome-extension
- With Vue: https://github.com/jeromewu/tesseract.js-vue-app
- With Angular: https://github.com/jeromewu/tesseract.js-angular-app
- With React: https://github.com/jeromewu/tesseract.js-react-app
- TypeScript: https://github.com/jeromewu/tesseract.js-typescript
- Video Real-time Recognition: https://github.com/jeromewu/tesseract.js-video
To run a development copy of Tesseract.js do the following:
# First we clone the repository
git clone https://github.com/naptha/tesseract.js.git
cd tesseract.js
# Then we install the dependencies
npm install
# And finally we start the development server
npm start
Once it is running, open http://localhost:3000/examples/browser/demo.html in your favorite browser. The development server automatically rebuilds tesseract.dev.js and worker.dev.js when you change files in the src folder.
You can also run the development server in Gitpod (a free online IDE and dev environment for GitHub that automates your dev setup) with a single click.
To build the compiled static files, run:
npm run build
This outputs the files into the dist directory.
This project exists thanks to all the people who contribute. [Contribute]
Become a financial contributor and help us sustain our community. [Contribute]
Support this project with your organization. Your logo will show up here with a link to your website. [Contribute]