HACK: For speed, write out font styles only once. #124

shreevatsa · 2019-12-10T17:07:09Z

(This is not a pull request that can be merged in. It's kind of a follow-up to the request at #119, but instead of implementing a proper solution I just hacked something together for myself. Since I haven't touched it for a few days and realistically won't return to it soon to learn how to do it properly, I'm saving it here so that I don't lose it when I change computers or something. In principle it could serve as a starting point for a proper option, but I don't know enough about the complexity of the problem to say. Apologies for using this space for this!)

Background: I had a few DVI files that were over 1000 pages long, on which dvisvgm would take hours to run.
(Specifically, these files were the literate-programming listings of TeX/eTeX/pdfTeX/XeTeX programs, as typeset by WEAVE, except with each section on a separate page... but this situation may also be familiar to those trying to run dvisvgm on the TikZ manual, as in #55 / #107 .)

With this change, the time to run dvisvgm went from hours to seconds.

What it does: Right now, when invoked with certain options, for every page of the DVI file, dvisvgm writes out @font-face and text style CSS rules, like:

@font-face{font-family:cmr10;src:url(data:application/x-font-ttf;base64,AAEAAAAN...

and

text.f12 {font-family:cmr10;font-size:9.96264px}

All that this change does, in a hacky way, is accumulate these across pages, and write each of them only once. Then, the separate SVGs for each page can all just use the common style.
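
As a rough illustration of the idea only (the actual patch lives in dvisvgm's C++ sources and is not reproduced here), the accumulation could be sketched in Python as below. The function name `collect_unique_rules` and the per-page lists of rule strings are hypothetical, and the sketch assumes that a given class name such as `f12` refers to the same font on every page.

```python
def collect_unique_rules(pages):
    """Return every CSS rule exactly once, in order of first appearance.

    `pages` is an iterable of per-page rule lists (a hypothetical input
    shape, not dvisvgm's internal representation).
    """
    seen = set()
    unique = []
    for rules in pages:
        for rule in rules:
            if rule not in seen:
                seen.add(rule)
                unique.append(rule)
    return unique


# Two pages that both use cmr10; the second also uses another font.
pages = [
    ["text.f12 {font-family:cmr10;font-size:9.96264px}"],
    ["text.f12 {font-family:cmr10;font-size:9.96264px}",
     "text.f7 {font-family:cmtt10;font-size:9.96264px}"],
]
shared_rules = collect_unique_rules(pages)
```

The shared rules are then written out once, and the per-page SVGs no longer carry their own copies.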

Caveats:
This is a giant hack, with MANY caveats:

1. assuming there are enough pages (SVGs) for all this to be worth it,
2. assuming only 7-bit fonts (having glyphs in positions 0 to 127),
3. assuming the font has no license problems (so doesn't have to be subset),
4. assuming the user can do some postprocessing, namely generating CSS files by wrapping the `font-faces.txt` and `font-styles.txt` files within `<style>` tags,
5. assuming SVG files don't have to be self-contained, i.e.
- when used from an HTML page, they will be inserted directly into the DOM and inherit its styles, rather than being wrapped in `img`/`object` tags
- alternatively, postprocessing can put in the SVG file something like
```
   <style>@import 'common.css';</style>
```
  at the right place, where `common.css` is produced by (4) above,
6. assuming dvisvgm is being invoked something like this:
```
 dvisvgm --page=1- --font-format=woff2,autohint
```

then, it may help to just do the expensive font-writing once, as here.
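
The postprocessing mentioned in caveats 4 and 5 could look roughly like the following Python sketch. It is not part of the patch. The helper names `combine_rules` and `add_style_import` are made up for illustration, and placing the `<style>` element directly after the opening `<svg ...>` tag is only an assumption about what "the right place" means. It also assumes the opening tag contains no `>` inside an attribute value.

```python
STYLE_IMPORT = "<style>@import 'common.css';</style>"


def combine_rules(font_faces, font_styles, wrap=True):
    """Concatenate the contents of font-faces.txt and font-styles.txt.

    With wrap=True the result is enclosed in <style> tags, e.g. for pasting
    into an HTML page; with wrap=False it is plain CSS text.
    """
    css = font_faces.rstrip("\n") + "\n" + font_styles.rstrip("\n") + "\n"
    return "<style>\n" + css + "</style>\n" if wrap else css


def add_style_import(svg_text):
    """Insert the @import style element right after the opening <svg ...> tag."""
    start = svg_text.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    end = svg_text.find(">", start)
    if end == -1:
        raise ValueError("unterminated <svg> tag")
    return svg_text[:end + 1] + STYLE_IMPORT + svg_text[end + 1:]
```

In practice the two input strings would be read from `font-faces.txt` and `font-styles.txt`, and `add_style_import` would be applied to the text of each per-page SVG file.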


hmenke · 2019-12-10T20:55:33Z

This looks like a great improvement, although I think it should be optional through a command line switch.

shreevatsa · 2019-12-11T05:36:10Z

@hmenke Agreed of course, that's why the long apology at the top about not doing it properly and also why this PR was made unmergeable :-) For now I just have a separate directory with this patched in, and I use the dvisvgm built in that directory when I need this mode, and regular dvisvgm otherwise.

I think the proper version, apart from using a command-line switch, should also account for all the different kinds of fonts, etc. (this is the part I don't know, and which I imagine makes this quite complex to do properly), and also properly accumulate the set of chars/glyphs encountered across all pages (and write at the end), rather than write out all 127 chars the first time.
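
To make the "accumulate across all pages, write at the end" idea concrete, here is a minimal Python sketch. It does not reflect dvisvgm's data structures: the function name `collect_used_glyphs` and the `(font_name, char_code)` pairs are hypothetical stand-ins for whatever the DVI reader actually reports.

```python
from collections import defaultdict


def collect_used_glyphs(pages):
    """Map each font name to the sorted character codes used anywhere.

    `pages` is an iterable of pages, each an iterable of
    (font_name, char_code) pairs (a hypothetical input shape).
    """
    used = defaultdict(set)
    for page in pages:
        for font_name, char_code in page:
            used[font_name].add(char_code)
    return {font: sorted(codes) for font, codes in used.items()}


pages = [
    [("cmr10", 72), ("cmr10", 105)],
    [("cmr10", 105), ("cmtt10", 120)],
]
glyphs_by_font = collect_used_glyphs(pages)
```

Only after the last page has been processed would each font be written out, containing just the glyphs collected for it.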

mgieseki · 2019-12-11T09:11:05Z

Thanks for the feedback and for taking the time to dig into the sources. First of all, there's no need to apologize. I'm always glad if people suggest useful improvements or even provide patches.

As you've already pointed out, the PR in its current state would limit the functionality of dvisvgm as only 7-bit fonts are considered. Extending it to all font variants, especially native Unicode fonts, requires some more work. I think it wouldn't be a good idea to encode entire Unicode fonts with thousands of glyphs to Base64, leading to a giant file, while only a few of them are actually used in the processed document. So, collecting the referenced glyphs and subsetting the font is a crucial task. Also, you probably don't want to always process all pages of the document but only selected ones, e.g. because some things have been fixed there. Should a reconversion replace the existing file that contains the font data, or is it better to have some mechanism to update it? There are many little things like these to consider and I hope to find some time to think about a working and extensible implementation.

agrahn · 2019-12-11T11:11:12Z

I don't like the idea of producing non-self-contained SVGs, but enabling this via cmd-line option as a non-default setting would be ok, of course.

mgieseki · 2019-12-11T11:53:49Z

Of course, this feature would be an optional extension and not enabled by default. I usually try to keep the functionality compatible with previous releases. :-)

hmenke approved these changes Dec 10, 2019
