This repository has been archived by the owner on Apr 9, 2024. It is now read-only.
Convert pdf files to text and html (eventually MS Word, too)

aih/pdf2html

pdf2html

A web-based converter from PDF to HTML and, eventually, other formats (text and MS Word). The UI is a Django-backed app built on jQuery-File-Upload. That jQuery app was developed by Sebastian Tschan, with the source available on GitHub. It was ported to Django by Sigurd Gartmann (sigurdga on GitHub).

I connected the UI to a back-end PDF converter. To use jQuery-File-Upload in your own Django app, start from [the Django port](https://github.com/sigurda/django-jquery-file-upload).

TODO: Use the terrific [pdf2htmlEX](https://github.com/coolwanglu/pdf2htmlEX/wiki/Quick-Start) library for PDF-to-HTML conversion, with ttfautohint as the hinting tool (`--external-hint-tool=ttfautohint`).
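A rough sketch of how the back end might call pdf2htmlEX once this TODO is done. It is not code from this repository: the helper names `build_pdf2htmlex_command` and `convert_pdf_to_html` are made up for illustration, and it assumes `pdf2htmlEX` and `ttfautohint` are both installed and on `PATH`.

```python
import subprocess
from pathlib import Path


def build_pdf2htmlex_command(pdf_path, dest_dir, hint_tool="ttfautohint"):
    """Build the pdf2htmlEX argument list (hypothetical helper).

    Options come before the input file, as in pdf2htmlEX's usage line.
    """
    return [
        "pdf2htmlEX",
        f"--external-hint-tool={hint_tool}",
        "--dest-dir", str(dest_dir),
        str(pdf_path),
    ]


def convert_pdf_to_html(pdf_path, dest_dir):
    """Run the conversion; raises CalledProcessError if pdf2htmlEX fails."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdf2htmlex_command(pdf_path, dest_dir), check=True)
    # Assumption: with no explicit output name, the HTML file is named
    # after the input PDF.
    return Path(dest_dir) / (Path(pdf_path).stem + ".html")
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to check the arguments without having pdf2htmlEX installed.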

Conversion to Word can use pandoc.
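One possible shape for the Word step, again only a sketch: the helper names are hypothetical, `pandoc` is assumed to be on `PATH`, and how well pandoc handles any particular HTML file has not been tested here.

```python
import subprocess


def build_pandoc_command(html_path, docx_path):
    """Build a pandoc argument list for HTML -> .docx (hypothetical helper)."""
    return [
        "pandoc",
        str(html_path),
        "-f", "html",   # input format
        "-t", "docx",   # output format
        "-o", str(docx_path),
    ]


def convert_html_to_docx(html_path, docx_path):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pandoc_command(html_path, docx_path), check=True)
    return docx_path
```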

License

MIT, the same as the original project. See LICENSE.txt.
