
Tool to extract text from Office and PDF files, as a very, very tiny alternative to Apache Tika



mariosotil/text-extractor


text-extractor

Extracts text from Office and PDF files, using Apache POI and PDFxStream, as a very, very tiny alternative to Apache Tika.
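The README does not show how the library decides which parser handles a given file. The sketch below is only an illustration under that caveat. It routes a file name to a backend by its extension. The `FormatRouter` class, the `Backend` enum and the exact extension list (including `.docx` and `.xlsx`) are hypothetical and are not text-extractor's API.

```java
import java.util.Locale;

public class FormatRouter {
    // Hypothetical backends: which parser would handle each file type.
    enum Backend { POI_WORD, POI_EXCEL, RTF, PDFXSTREAM, UNSUPPORTED }

    // Picks a backend from the file extension (case-insensitive).
    // The extension list is an assumption, not taken from the project.
    static Backend backendFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Backend.UNSUPPORTED; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc":
            case "docx":
                return Backend.POI_WORD;
            case "xls":
            case "xlsx":
                return Backend.POI_EXCEL;
            case "rtf":
                return Backend.RTF;
            case "pdf":
                return Backend.PDFXSTREAM;
            default:
                return Backend.UNSUPPORTED; // e.g. .ppt is not listed as supported
        }
    }

    public static void main(String[] args) {
        String[] names = {"report.DOCX", "budget.xls", "notes.rtf", "paper.pdf", "slides.ppt"};
        for (String name : names) {
            System.out.println(name + " -> " + backendFor(name));
        }
    }
}
```

Routing on the extension keeps the sketch simple. A more robust extractor would inspect the file's magic bytes, as Apache Tika does.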

This library obviously does NOT replace Apache Tika. It only extracts text from Word, Excel, RTF and PDF files. It is based on the code found in the blog article "Extract Text From pdf, office files(.doc, .ppt, .xls), open office files, .rtf, and text/plain files in Java", but uses the latest Apache POI and PDFxStream versions available at the time (06/10/2015):

  • org.apache.poi, 3.12
  • com.snowtide.pdfxstream, 3.1.2
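RTF is listed among the supported formats, but neither dependency above is specific to it. The JDK can read RTF text through `javax.swing.text.rtf.RTFEditorKit`, which the sketch below uses. Whether text-extractor handles RTF this way is an assumption, and the names `RtfTextSketch` and `extractRtf` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfTextSketch {
    // Parses RTF from the stream into a Swing Document and returns its plain text.
    static String extractRtf(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0); // insert the parsed content at offset 0
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // A tiny inline RTF document; formatting such as \b is dropped from the text.
        String rtf = "{\\rtf1\\ansi Hello, {\\b RTF} world.\\par}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extractRtf(in).trim());
    }
}
```

`RTFEditorKit` only understands a subset of RTF. It is fine for plain text but may drop content from complex documents.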
