text-extractor

Extracts text from Office and PDF files using Apache POI and PDFxStream, as a very, very tiny alternative to Apache Tika.

This library obviously does NOT replace Apache Tika. It only extracts text from Word, Excel, RTF and PDF files. It is based on the code from the blog article "Extract Text From pdf, office files(.doc, .ppt, .xls), open office files, .rtf, and text/plain files in Java", but uses the latest Apache POI and PDFxStream versions (as of 06/10/2015).
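A minimal, hypothetical sketch of the kind of dispatch this implies: choose a handler from the file extension, limited to the four supported formats. The class and method names (TextExtractorSketch, detect, extractRtf, extract) are illustrative assumptions, not this library's actual API. Only the RTF path is implemented, using the JDK's javax.swing.text.rtf.RTFEditorKit so the sketch runs without third-party jars; whether text-extractor itself handles RTF this way is an assumption. The Word, Excel and PDF branches are left as stubs because they would need Apache POI and PDFxStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical sketch: names are illustrative, not the library's API.
public class TextExtractorSketch {

    // The four families of files the README says are supported.
    enum Format { WORD, EXCEL, RTF, PDF }

    // Guesses the format from the file extension only (case-insensitive).
    // Returns null for anything outside the supported set.
    static Format detect(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "doc": case "docx": return Format.WORD;
            case "xls": case "xlsx": return Format.EXCEL;
            case "rtf": return Format.RTF;
            case "pdf": return Format.PDF;
            default: return null;
        }
    }

    // RTF needs only the JDK: Swing's RTFEditorKit parses the stream
    // into a Document, from which the plain text is read back.
    static String extractRtf(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    static String extract(String fileName, InputStream in) throws IOException, BadLocationException {
        Format format = detect(fileName);
        if (format == null) {
            throw new IllegalArgumentException("Unsupported file type: " + fileName);
        }
        if (format == Format.RTF) {
            return extractRtf(in);
        }
        // Assumption: Word/Excel would go through Apache POI and PDF through
        // PDFxStream; both are third-party and omitted from this sketch.
        throw new UnsupportedOperationException(format + " requires Apache POI or PDFxStream");
    }

    public static void main(String[] args) throws Exception {
        String rtf = "{\\rtf1\\ansi Hello RTF world\\par}";
        InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII));
        System.out.println(extract("sample.rtf", in).trim());
    }
}
```

Dispatching on the extension keeps the sketch short, but it trusts the file name; a sturdier design would sniff the file's magic bytes instead.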

