pdf2textlib

Simple multilingual PDF text extraction; also extracts text from images.

import pdf2textlib

print(pdf2textlib.getText("Demo.pdf", "eng+tel+urd"))
# Parameter 1: path to the PDF file
# Parameter 2: language codes joined with '+' (for example "eng+tel+urd")
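
The call above takes the languages as a single '+'-separated string. The sketch below shows one way to build that string from a list and to check the file path before extracting. `build_lang_string` and `extract_text` are hypothetical helper names, not part of pdf2textlib; the only library call assumed is `getText(path, languages)` as shown above.

```python
from pathlib import Path


def build_lang_string(codes):
    """Join language codes into the '+'-separated form, e.g. 'eng+tel+urd'."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def extract_text(pdf_path, codes):
    """Hypothetical wrapper: verify the path, then call pdf2textlib.getText."""
    path = Path(pdf_path)
    if not path.is_file():
        raise FileNotFoundError(f"no such PDF: {path}")
    # Imported lazily so build_lang_string works even if pdf2textlib is absent.
    import pdf2textlib
    return pdf2textlib.getText(str(path), build_lang_string(codes))


if __name__ == "__main__":
    print(build_lang_string(["eng", "tel", "urd"]))
```

With pdf2textlib installed, `extract_text("Demo.pdf", ["eng", "tel", "urd"])` would pass the same arguments as the example above.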

OS Dependencies

Debian, Ubuntu, and friends

sudo apt-get install build-essential libpoppler-cpp-dev pkg-config python-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python-devel redhat-rpm-config

macOS

brew install pkg-config poppler

Conda users may also need libgcc:

conda install -c anaconda libgcc

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda:
    conda install -c conda-forge poppler
    

Install

pip install pdf2textlib
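
After installing, a quick check that the package is importable can save debugging time if a build step failed silently. This is a minimal sketch using only the Python standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name="pdf2textlib"):
    """Return True if the named top-level module can be found by Python."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pdf2textlib importable:", is_installed())
```

If this reports `False`, the OS dependencies listed above are a reasonable first place to look.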