GitHub - dsidavis/pdftohtml: copy of pdftohtml code with enhancements

This is a modified version of the pdftohtml project. It includes rectangles and paths in the XML output so that we can detect lines. It also includes information about the images in the document. Strings can be split or coalesced as they are processed.

sample.pdf is generated by mkPDF.R and illustrates rectangles and lines. Converting it to XML with pdftohtml produces the corresponding rectangle and line elements.
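
As a rough illustration of why rectangles in the XML help with line detection, the sketch below treats very thin rectangles as ruled lines. The element name `rect`, the attribute names `x`, `y`, `width`, and `height`, the embedded sample XML, and the helper `classify_rects` are all assumptions made for this example; the actual output of this pdftohtml may use different names, so check a real XML file before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML fragment. The real element and attribute names
# emitted by this pdftohtml may differ.
SAMPLE_XML = """\
<page number="1" width="612" height="792">
  <rect x="72" y="100" width="468" height="1"/>
  <rect x="72" y="100" width="1" height="200"/>
  <rect x="100" y="150" width="80" height="40"/>
</page>
"""


def classify_rects(xml_text, max_thickness=2.0):
    """Sort <rect> elements into horizontal lines, vertical lines, and boxes.

    A rectangle whose shorter side is at most max_thickness is treated
    as a line; everything else is treated as a box.
    """
    root = ET.fromstring(xml_text)
    result = {"horizontal": [], "vertical": [], "box": []}
    for rect in root.iter("rect"):
        x, y, w, h = (float(rect.get(k)) for k in ("x", "y", "width", "height"))
        if h <= max_thickness and w > h:
            kind = "horizontal"
        elif w <= max_thickness and h > w:
            kind = "vertical"
        else:
            kind = "box"
        result[kind].append((x, y, w, h))
    return result


if __name__ == "__main__":
    for kind, rects in classify_rects(SAMPLE_XML).items():
        print(kind, rects)
```

The thickness threshold is a judgment call: PDF producers often draw table rules as filled rectangles one or two units thick, so a small cutoff separates rules from shaded boxes in many documents, but it should be tuned per document.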

See examples/

Feb 2023

We recently integrated the code from the most recent version of xpdf (4.04) into this modified pdftohtml. This is still a work in progress, but it addresses different versions of PDF and various security issues. We need to do a lot more testing.
