HACK: For speed, write out font styles only once. #124
**Background:** I had a few DVI files that were over 1000 pages long, on which dvisvgm would take hours to run. (Specifically, these files were the literate-programming listings of the TeX/eTeX/pdfTeX/XeTeX programs, as typeset by WEAVE, except with each section on a separate page... but this situation may also be familiar to those trying to run dvisvgm on the TikZ manual, as in #55 / #107.) With this change, the time to run dvisvgm went from hours to seconds.

**What it does:** When invoked with certain options, dvisvgm writes out `@font-face` and text style CSS rules for every page of the DVI file, like:

    @font-face{font-family:cmr10;src:url(data:application/x-font-ttf;base64,AAEAAAAN...

and

    text.f12 {font-family:cmr10;font-size:9.96264px}

All that this change does, in a hacky way, is accumulate these across pages and write each of them only once. Then the separate SVGs for each page can all just use the common style.

**Caveats:** This is a giant hack, with MANY caveats:

1. assuming there are enough pages (SVGs) for all this to be worth it,
2. assuming only 7-bit fonts (having glyphs in positions 0 to 127),
3. assuming the font has no license problems (so it doesn't have to be subset),
4. assuming the user can do some postprocessing, namely generating CSS files by wrapping the `font-faces.txt` and `font-styles.txt` files within `<style>` tags,
5. assuming SVG files don't have to be self-contained, i.e.
   - when used from an HTML page, they will be inserted directly into the DOM and inherit its styles, rather than being wrapped in `img`/`object` tags,
   - alternatively, postprocessing can put something like

         <style>@import 'common.css';</style>

     in the SVG file at the right place, where `common.css` is produced by (4) above,
6. assuming dvisvgm is being invoked something like this:

       dvisvgm --page=1- --font-format=woff2,autohint

Then it *may* help to just do the expensive font-writing once, as here.
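The postprocessing mentioned in caveats (4) and (5) could look roughly like the sketch below. It is only an illustration, not part of the patch: it assumes that `font-faces.txt` and `font-styles.txt` contain bare CSS rules, that `common.css` is simply their concatenation, and that "the right place" for the `@import` is directly after the opening `<svg ...>` tag. The function names are made up for this example.

```python
from pathlib import Path

# The rule from caveat (5); 'common.css' is the file name used in the description.
IMPORT_RULE = "<style>@import 'common.css';</style>"


def build_common_css(faces: str, styles: str) -> str:
    """Concatenate the accumulated @font-face rules and text style rules.

    Assumption: both inputs are plain CSS without <style> wrappers.
    """
    return faces.rstrip("\n") + "\n" + styles.rstrip("\n") + "\n"


def add_css_import(svg: str) -> str:
    """Insert the @import rule right after the opening <svg ...> tag."""
    if IMPORT_RULE in svg:
        return svg  # already postprocessed, leave untouched
    start = svg.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    # Naive tag scan: assumes no '>' occurs inside the attribute values.
    end = svg.index(">", start) + 1
    return svg[:end] + IMPORT_RULE + svg[end:]


def postprocess(directory: Path) -> None:
    """Write common.css and patch every SVG file in the directory in place."""
    faces = (directory / "font-faces.txt").read_text(encoding="utf-8")
    styles = (directory / "font-styles.txt").read_text(encoding="utf-8")
    (directory / "common.css").write_text(
        build_common_css(faces, styles), encoding="utf-8"
    )
    for svg_path in sorted(directory.glob("*.svg")):
        text = svg_path.read_text(encoding="utf-8")
        svg_path.write_text(add_css_import(text), encoding="utf-8")
```

Calling `postprocess(Path("out"))` on a directory holding the two text files and the per-page SVGs would patch the SVGs in place. For the HTML case in caveat (5), the same concatenated CSS could instead be placed in a `<style>` element of the page.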
This looks like a great improvement, although I think it should be optional through a command line switch.
@hmenke Agreed of course, that's why the long apology at the top about not doing it properly, and also why this PR was made unmergeable :-) For now I just have a separate directory with this patched in, and I use the [...] I think the proper version, apart from using a command-line switch, should also account for all the different kinds of fonts etc. (this is the part I don't know, and which I imagine makes this quite complex to do properly), and should also properly accumulate the set of chars/glyphs encountered across all pages (and write them at the end), rather than write out all 128 chars the first time.
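The "accumulate across all pages, write at the end" idea can be sketched as follows. This is a hypothetical illustration of the bookkeeping only: `GlyphAccumulator` and its methods are invented names, not dvisvgm classes, and the real work of subsetting and writing each font is left out.

```python
from collections import defaultdict


class GlyphAccumulator:
    """Collects, per font, the character codes seen on any page."""

    def __init__(self):
        self._used = defaultdict(set)

    def record_page(self, page_usage):
        """page_usage maps a font name to the character codes used on one page."""
        for font, codes in page_usage.items():
            self._used[font].update(codes)

    def result(self):
        """Return font name -> sorted list of all codes seen across pages."""
        return {font: sorted(codes) for font, codes in sorted(self._used.items())}


# Usage: feed every page, then emit each font once, restricted to result()[font].
acc = GlyphAccumulator()
acc.record_page({"cmr10": [72, 105]})
acc.record_page({"cmr10": [105, 33], "cmtt10": [120]})
print(acc.result())
```

Only after the last page is the glyph set of each font final, which is why the font data would have to be written at the end rather than on first use.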
Thanks for the feedback and for taking the time to dig into the sources. First of all, there's no need to apologize. I'm always glad when people suggest useful improvements or even provide patches. As you've already pointed out, the PR in its current state would limit the functionality of dvisvgm, as only 7-bit fonts are considered. Extending it to all font variants, especially native Unicode fonts, requires some more work. I think it wouldn't be a good idea to encode entire Unicode fonts with thousands of glyphs to Base64, leading to a giant file, while only a few of the glyphs are actually used in the processed document. So collecting the referenced glyphs and subsetting the font is a crucial task. Also, you probably don't always want to process all pages of the document but only selected ones, e.g. because some things have been fixed there. Should a reconversion replace the existing file that contains the font data, or is it better to have some mechanism to update it? There are many little things like these to consider, and I hope to find some time to think about a working and extensible implementation.
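To put a number on the Base64 concern: standard padded Base64 turns every 3 input bytes into 4 output characters, so embedded font data grows by about a third before any other overhead. A small sketch, using dummy bytes in place of real font data:

```python
import base64


def base64_length(num_bytes: int) -> int:
    """Length of the padded Base64 encoding of num_bytes bytes."""
    return 4 * ((num_bytes + 2) // 3)


# 1024 dummy bytes standing in for font data (not a real font file).
sample = bytes(range(256)) * 4
encoded = base64.b64encode(sample)
print(len(sample), len(encoded), base64_length(len(sample)))
```

The same ratio applies to a full Unicode font, which is one reason subsetting to the referenced glyphs matters before embedding.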
I don't like the idea of producing non-self-contained SVGs, but enabling this via cmd-line option as a non-default setting would be ok, of course.
Of course, this feature would be an optional extension and not enabled by default. I usually try to keep the functionality compatible with previous releases. :-)
(This is not a pull request that can be merged in. It's kind of a follow-up to the request at #119, but instead of implementing a proper solution I just hacked something together for myself. After seeing that I hadn't touched it again for a few days, and realistically will probably not return to it soon to learn how to do it properly, I'm just saving it here so that I don't lose it when I change computers or something. In principle it may be used to implement a proper option or something, but I don't know enough about the complexity of the problem to say. Apologies for using this space for this!)