HACK: For speed, write out font styles only once. #124

Draft
wants to merge 1 commit into
base: master

shreevatsa

(This is not a pull request that can be merged. It's a kind of follow-up to the request at #119, but instead of implementing a proper solution I just hacked something together for myself. I haven't touched it for a few days since, and realistically I won't return to it soon to learn how to do it properly, so I'm saving it here so that I don't lose it when I change computers or something. In principle it could serve as the basis for a proper option, but I don't know enough about the complexity of the problem to say. Apologies for using this space for this!)

Background: I had a few DVI files that were over 1000 pages long, on which dvisvgm would take hours to run.
(Specifically, these files were the literate-programming listings of TeX/eTeX/pdfTeX/XeTeX programs, as typeset by WEAVE, except with each section on a separate page... but this situation may also be familiar to those trying to run dvisvgm on the TikZ manual, as in #55 / #107 .)

With this change, the time to run dvisvgm went from hours to seconds.

What it does: Right now, when invoked with certain options, for every page of the DVI file, dvisvgm writes out @font-face and text style CSS rules, like:

@font-face{font-family:cmr10;src:url(data:application/x-font-ttf;base64,AAEAAAAN...

and

text.f12 {font-family:cmr10;font-size:9.96264px}

All that this change does, in a hacky way, is accumulate these across pages, and write each of them only once. Then, the separate SVGs for each page can all just use the common style.
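The idea can be sketched outside of dvisvgm as follows. This is only an illustration in Python, not the actual C++ patch: the function name `collect_unique_rules` and the second style rule (`text.f7` / `cmtt10`) are made up for the example.

```python
def collect_unique_rules(pages):
    """Return every CSS rule seen on any page exactly once, in first-seen order.

    `pages` is a list with one entry per page; each entry is the list of
    CSS rule strings that would otherwise be written into that page's SVG.
    """
    seen = set()
    shared = []
    for rules in pages:
        for rule in rules:
            if rule not in seen:      # first time we meet this rule
                seen.add(rule)
                shared.append(rule)   # keep it once, for the common stylesheet
    return shared


# Two pages that use the same font style; page 2 adds a hypothetical second one.
pages = [
    ["text.f12 {font-family:cmr10;font-size:9.96264px}"],
    ["text.f12 {font-family:cmr10;font-size:9.96264px}",
     "text.f7 {font-family:cmtt10;font-size:9.96264px}"],
]
shared_rules = collect_unique_rules(pages)
```

With over 1000 pages that mostly share the same handful of fonts, the shared list stays tiny while the per-page output no longer repeats it.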

Caveats:
This is a giant hack, with MANY caveats:

  1. assuming there are enough pages (SVGs) for all this to be worth it,

  2. assuming only 7-bit fonts (having glyphs in positions 0 to 127),

  3. assuming font has no license problems (so doesn't have to be subset),

  4. assuming the user can do some postprocessing, namely generating CSS files by wrapping the font-faces.txt and font-styles.txt files within <style> tags.

  5. assuming SVG files don't have to be self-contained, i.e.

    • when used from an HTML page, they will be inserted directly into the DOM and inherit its styles, rather than being wrapped in img/object tags

    • alternatively, postprocessing can put in the SVG file something like

         <style>@import 'common.css';</style>
      

      at the right place, where common.css is produced by (4) above.

  6. assuming dvisvgm is being invoked something like this:

     dvisvgm --page=1- --font-format=woff2,autohint
    

then, it may help to just do the expensive font-writing once, as here.
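The postprocessing mentioned in (4) and (5) could look roughly like the sketch below. It is Python, not part of the patch, and it assumes that `font-faces.txt` and `font-styles.txt` contain plain CSS rules; the helper names `wrap_in_style` and `add_import` are hypothetical, and the string handling is deliberately naive (it just looks for the first `<svg ...>` tag).

```python
def wrap_in_style(css_text):
    """Wrap plain CSS rules in a <style> element, e.g. for an HTML page."""
    return "<style>\n" + css_text.strip() + "\n</style>"


def add_import(svg_text, css_name="common.css"):
    """Insert a <style>@import ...</style> element right after the opening <svg> tag."""
    start = svg_text.find("<svg")
    if start == -1:
        raise ValueError("no <svg> element found")
    end = svg_text.find(">", start)   # end of the opening tag (naive, no real XML parsing)
    if end == -1:
        raise ValueError("unterminated <svg> tag")
    style = f"<style>@import '{css_name}';</style>"
    return svg_text[:end + 1] + style + svg_text[end + 1:]
```

For the HTML case one would read the two text files, concatenate them and pass the result to `wrap_in_style`; for standalone SVG files one would run `add_import` over each page's SVG.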

@hmenke

hmenke commented Dec 10, 2019

This looks like a great improvement, although I think it should be optional through a command line switch.

@shreevatsa
Author

@hmenke Agreed of course, that's why the long apology at the top about not doing it properly and also why this PR was made unmergeable :-) For now I just have a separate directory with this patched in, and I use the dvisvgm built in that directory when I need this mode, and regular dvisvgm otherwise.

I think the proper version, apart from using a command-line switch, should also account for all the different kinds of fonts etc. (this is the part I don't know, and which I imagine makes this quite complex to do properly), and also properly accumulate the set of chars/glyphs encountered across all pages (and write at the end), rather than write out all 128 chars (positions 0 to 127) the first time.
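To illustrate what "accumulate across all pages and write at the end" means, here is a rough Python sketch. It is not dvisvgm code: the function name `collect_used_glyphs` and the `(font, char code)` input format are assumptions made just for the example.

```python
from collections import defaultdict


def collect_used_glyphs(pages):
    """Map each font name to the sorted character codes used on any page.

    `pages` is an iterable with one entry per page; each entry is an
    iterable of (font_name, char_code) pairs typeset on that page.
    """
    used = defaultdict(set)
    for page in pages:
        for font_name, char_code in page:
            used[font_name].add(char_code)
    # Only after the last page is the full glyph set of each font known,
    # so the (subsetted) font data can only be written at this point.
    return {font: sorted(codes) for font, codes in used.items()}
```

The consequence is that no font can be written while pages are still being processed, which is presumably part of what makes a proper implementation more involved than this hack.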

@mgieseki
Owner

Thanks for the feedback and for taking the time to dig into the sources. First of all, there's no need to apologize. I'm always glad if people suggest useful improvements or even provide patches.

As you've already pointed out, the PR in its current state would limit the functionality of dvisvgm, as only 7-bit fonts are considered. Extending it to all font variants, especially native Unicode fonts, requires some more work. I think it wouldn't be a good idea to encode entire Unicode fonts with thousands of glyphs to Base64, leading to a giant file, while only a few of them are actually used in the processed document. So collecting the referenced glyphs and subsetting the font is a crucial task. Also, you probably don't always want to process all pages of the document but only selected ones, e.g. because some things have been fixed there. Should a reconversion replace the existing file that contains the font data, or is it better to have some mechanism to update it? There are many little things like these to consider, and I hope to find some time to think about a working and extensible implementation.

@agrahn

agrahn commented Dec 11, 2019

I don't like the idea of producing non-selfcontained SVGs, but enabling this via cmd-line option as a non-default setting would be ok, of course.

@mgieseki
Owner

Of course, this feature would be an optional extension and not enabled by default. I usually try to keep the functionality compatible with previous releases. :-)
