Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SVG output compression with performance testing #135

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

zm-cttae
Copy link

@zm-cttae zm-cttae commented Apr 25, 2023

Fixes

Closes #92. Also related to #90.
Related to previous efforts #36 and #70.

Objective

Restore SVG output styling data compression using a performant algorithm: dom-inline-style-filter.

User story

  • As: web developers who maintain and deploy SVGs in open source projects,
  • We would like to: generate SVG images from DOM elements with a light data footprint,
  • So that: we can ship SVG screenshots of HTML elements in production e.g. NPM packages.

Pull request description

In the past, we have struggled to compress the output of toSvg in dom-to-image-more in a stable way.
There is a writeup explaining our struggles in #92.
Even after an algorithm to filter out user agent styling when inlining the style, there is some ways to go with data size.

Implementation details

  1. When traversing DOM tree of node, group nodes by descending node depth.
    CSS inheritance is computed on the DOM tree via preorder traversal and is additive-cumulative (increases styling data).
    For the filter op which is subtractive, we want to traverse the tree in the opposite direction.
    The algorithm sorts elements in the node tree by descending node depth. (This is known as reverse level order traversal.)
    This gives us a 30% to 40% speed boost. This also ensures declarations are only removed when they really can be inherited.
  2. When filtering each inline style declaration by computed effect, go for the most hyphenated properties first.
    In CSS, shorthands consistently have less hyphens than their longhand.
    We want to filter out scenarios where a CSS property matches their shorthand.
    E.g. block-size -> height or border-color -> border.
    The algorithm does a radix sort with bitmasks for standard, custom and vendored proprties, then subsorts by descending hyphen count.
    In tests this filtered another 50% of inline styling.
    We also get a 20-40% speed boost because we're not setting as many properties back.

Testing

The underlying algorithm was determined to be a multi-pass, high-pass deterministic compression in two modes:

  • $O(log(N))$ growth for inputs at large filesizes $|F| >> 1e6 \text{ bytes}$
  • $O(c*N), c \approx 4$ growth for inputs at small filesizes $|F| << 1e6\space\text{ bytes}$

This data is collected from manual testing on the output of toSvg API method.

Results for large filesizes

Wikipedia article demo Value
Number of nodes 5894 nodes
Initial declaration count 128581 (21.8 declarations / node)
Pre-compression bytes 3.77mb
Reductions [2924992, 257746, 87120, 0]
Processing time 9642.8ms (1.64 ms/node)
Total reduction 3.27mb
Output declaration count 15853 (2.69 / node)
Post-compression bytes 504.8kb
Compression quotients [0.2252, 0.6967, 0.8525, 1]
Total quotient (compound) 0.1337
Trendline formula $1-exp(-8 / 5 \cdot N)$
graph-pass-gain-hugefile

Results for small filesizes

Code screenshot demo Value
Number of nodes 468 nodes
Initial declaration count 11397 (24.4 declarations / node)
Pre-compression bytes 373608b
Reductions [292044, 34152, 0]
Processing time 382ms (0.8 ms / node)
Total reduction 326196b
Post-compression bytes 47412b
Output declaration count 1777 (3.78 / node)
Compression quotients [0.892, 0.989, 0.999, 1]
Total quotient (compound) 0.1269
Trendline formula $1-exp(-9 / 4 \cdot N)$
graph-pass-gain-tinyfile

This algorithm has $O(log(N)$ or $O(c*N)$ growth - 1904labs#135
Refs: 1904labs#36, 1904labs#91, 1904labs#92
@zm-cttae zm-cttae changed the title Add SVG output compression (closes #92) Add SVG output compression with performance testing Apr 25, 2023
@IDisposable
Copy link
Member

Welcome back!! I'll look over this tomorrow.

Copy link
Member

@IDisposable IDisposable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty simple... welcome some comments on my questions.

src/dom-to-image-more.js Outdated Show resolved Hide resolved
if (name === 'animation-duration' || name === 'transition-duration') return; // dynamic properties

const value = targetStyle.getPropertyValue(name);
const declarations = tokenizeCssTextDeclarations(targetStyle.cssText);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this is more costly to do this every time... probably should be working with the declarations split out once, then in the loop we can go through each retaining the "kept" properties in one array and joining them up as we go (I get maintaining the order is important), so we are not constantly splitting the style up... we could map thing to a "keep/kill" pattern so that we replace the declaration[index] with an empty string when sure we can elide it and then filtering them out just before the final squish?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tokenization is really cheap. I thought of doing it this way. But I was coding through a migraine aura..

If you could contribute it as a patch that would be appreciated 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you, cataract (eye) surgery yesterday... probably be a couple days before I can dive into wandering that code better, but this is very good so far :)

src/dom-to-image-more.js Show resolved Hide resolved
src/dom-to-image-more.js Show resolved Hide resolved
IDisposable added a commit that referenced this pull request Apr 26, 2023
Manually implemented a better fallback from PR #134
This is intended to help with running in non-browser environments like Node and Vite.
Bumped dev-time packages.
Also moved the ensureSandboxWindow method out one layer in preparation for upcoming PR #135 for pruning.
@IDisposable
Copy link
Member

A question occurred to me just now, is it worth bypassing this logic if they're calling toPng or something that uses toSvg, but doesn't expose the underlying SVG so who cares how big it is?

There is no legal restriction on this value. But 255 is a sane™️ limit.
Refs: 1904labs#135
@zm-cttae
Copy link
Author

It's good you mentioned the previous raster-vector path.

Previously, we #70 was accepted as a valid defect from a perf perspective.
IMHO this PR is a bit of a perf regression when it comes to large complex inputs.

If we don't want regresssions in #70 I should be adding back the vector-raster path.

@zm-cttae
Copy link
Author

zm-cttae commented Apr 28, 2023

Is it worth bypassing this logic if they're calling toPng or something that uses toSvg, but doesn't expose the underlying SVG so who cares how big it is?

I personally think "yes". 10 seconds for 5000 nodes is objectively expensive at vast sizes.
The inverse is that speeding up the toPng path is the best option
(Maybe @joswhite and @mohd-akram would agree also)

@IDisposable
Copy link
Member

PERSONAL NOTE: I've had a setback on my eye surgery and will have to take some time away from computers (and thus this) for at least a few days. I WILL get this merged in sometime soon :)

@zm-cttae
Copy link
Author

zm-cttae commented May 4, 2023

I'll try add back the vector-raster path in the meantime.

@IDisposable
Copy link
Member

If you do, make the path call all the way into toSVG and have it get an extract boolean to opt out of the simplification

@zm-cttae
Copy link
Author

zm-cttae commented May 6, 2023

Are the following time results acceptable?
npm test: $3.2 \space \text{s}$ ➡️ $6.6 \space \text{s}$?

@zm-cttae
Copy link
Author

zm-cttae commented May 6, 2023

@joswhite I know you were of the opinion this PR should be a package.
Is that still the case? Or could you kindly see if there are any bugs or nits here 🙏?

@zm-cttae
Copy link
Author

This pull request has been published for external use by the way! dom-inline-style-filter

@IDisposable
Copy link
Member

@zm-cttae I would welcome a .MD file change for this pointing to your excellent package :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add onclone size optimization for inline styles
2 participants