Is it possible to convert to SVG but keep text as text? #17

Open
Dingo64 opened this issue Sep 3, 2017 · 12 comments

Comments

@Dingo64

Dingo64 commented Sep 3, 2017

Is it possible to convert to SVG but keep text as text?

@RonanKER

I think pdf2svg itself cannot do anything about that; it depends on the Poppler and Cairo libraries.

@yuweiming2016

@RonanKER, do you have any code or configuration that shows this?
I have been searching Google for a week for a way to make pdf2svg keep text as text, but found nothing useful. Can you help me?

@dawbarton
Owner

If you want to keep text in the SVG then your best bet is to use Inkscape. I'm fairly sure it can be used from the command line to automate the conversion with text (though I've never used it for automated PDF -> SVG, only manually). Be aware that text often moves around a bit (the kerning is often a little off) when converting from a PDF.

@dawbarton
Owner

See https://inkscape.org/doc/inkscape-man.html for details on the Inkscape command line.
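As a rough sketch of what an automated conversion might look like, the Python snippet below builds an Inkscape command line and runs it only if Inkscape is installed. It assumes Inkscape 1.x, where `--export-filename` names the output file; older 0.9x releases used different options, so check the man page linked above. The function names and file names are made up for illustration, and whether text survives as text still depends on the PDF and on Inkscape's import settings.

```python
import shutil
import subprocess


def inkscape_svg_command(pdf_path, svg_path):
    """Build (but do not run) an Inkscape command that exports a PDF as SVG.

    Assumption: Inkscape 1.x option names; 0.9x releases differ.
    """
    return ["inkscape", pdf_path, "--export-filename=" + svg_path]


def convert(pdf_path, svg_path):
    """Run the conversion if an `inkscape` executable is on PATH."""
    if shutil.which("inkscape") is None:
        raise FileNotFoundError("inkscape not found on PATH")
    subprocess.run(inkscape_svg_command(pdf_path, svg_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(inkscape_svg_command("input.pdf", "output.svg")))
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect or log the exact command before anything is executed.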

@yuweiming2016

I have been learning Inkscape for a week. As far as I know, Inkscape can only convert the first page of a PDF to SVG. Is that true?
That would be bad news for me. @dawbarton

@dawbarton
Owner

It can open any page when you open the PDF in the GUI. If you want to do everything via the command line, you can simply use qpdf or pdftk to extract the page you want from the PDF as a single-page file and then use Inkscape on that. (Inkscape might be able to do page selection from the command line; I just don't know how.)
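The page-extraction step suggested here could be sketched in Python roughly as follows. The helper names and file paths are hypothetical; the qpdf and pdftk argument orders follow those tools' usual page-selection syntax, but check them against your installed versions.

```python
import subprocess


def qpdf_extract_command(pdf_path, page, out_path):
    """Build a qpdf command that copies one page (1-based) into a new PDF."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return ["qpdf", "--empty", "--pages", pdf_path, str(page), "--", out_path]


def pdftk_extract_command(pdf_path, page, out_path):
    """Build the equivalent pdftk command."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return ["pdftk", pdf_path, "cat", str(page), "output", out_path]


def extract_page(pdf_path, page, out_path):
    """Run qpdf to write the chosen page to out_path (qpdf must be installed)."""
    subprocess.run(qpdf_extract_command(pdf_path, page, out_path), check=True)
```

The resulting single-page PDF can then be handed to Inkscape, so each page of a multi-page document is converted in its own run.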

@yuweiming2016

I have been googling for a long time, but nothing I found was useful. So sad.

@RonanKER

RonanKER commented Sep 24, 2019

I found an old batch script from 2015, from when I tried this (with pdftk and Inkscape):
test_inkscape.txt

I put several example/test PDF files in the folder 'in', then launched several similar batch files to try different solutions (Inkscape, pdf2svg, PDFTron, Poppler, ...) and compared the results.

If you can afford it, I think PDFTron was the best, but I'm not sure it would preserve text as you wish.

@danielk892374

Could anyone point me in the right direction to understand why neither Cairo nor Poppler preserves text during PDF to SVG conversion (so that I can find some workaround to force them to keep it)? Does this procedure have a name? Is it "text vectorization" by any chance?

By the way, I've tried Inkscape as well, but no luck. LibreOffice seemed to work, but it was extremely slow and created a large .svg file that is very hard to open.

@dawbarton
Owner

I'm not sure what the name is ("preserve text" would have been my guess). Inkscape is usually the best in recent years - I've not had any problems with the PDFs that I've given it recently. It might be worth running pdftotext on your PDF to see if it does actually contain any text.
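A minimal Python sketch of that pdftotext check might look like the following. The function names are hypothetical; it assumes the `pdftotext` tool from poppler-utils is on PATH and relies on `pdftotext file.pdf -` writing the extracted text to standard output.

```python
import subprocess


def has_extractable_text(pdftotext_output):
    """Return True if pdftotext output contains anything besides whitespace.

    pdftotext separates pages with form feed characters, which str.strip()
    treats as whitespace, so a PDF with no text layer yields an empty result.
    """
    return bool(pdftotext_output.strip())


def pdf_contains_text(pdf_path):
    """Run `pdftotext <file> -` and report whether any text came out."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return has_extractable_text(result.stdout)
```

If `pdf_contains_text` returns False, the visible "text" is probably stored as outlines or images, and no converter will be able to recover it as SVG text.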

@danielk892374

After some research on PDFs in general, I realized that the problem was that the text was not "regular text" but part of "annotation/comment" objects. These often get ignored on import, and I believe Inkscape excluded them as well.

@BenjaminGalliot

Similarly, and somewhat related, I believe it would be practical to preserve the hyperlinks, ideally embedded in the text itself, or at least surrounding the glyph paths.


6 participants