
Improve css class matching #545

Merged · 3 commits · Nov 19, 2023
Conversation

@bbshelper (Contributor) commented Oct 28, 2023

This is my first PR, for #529.

I've tested with load and render, but I have no knowledge of the cache mechanism yet, so I'm not sure it works there. I'm publishing it for review now because I want to avoid conflicts with subsequent commit(s) :)

Tested with two sets of books from Project Gutenberg; cycle estimates for LoadDocument() reported by callgrind are as follows (lower is better):

| Percentile | Largest Books | Median Books |
|------------|---------------|--------------|
| Best 10%   | -39.2%        | -11.3%       |
| Median     | -6.0%         | -3.3%        |
| Worst 10%  | -1.6%         | -0.2%        |


@poire-z (Contributor) commented Oct 28, 2023

It's a bit hard to get into with the existing names, and yours, around getAttribute/Value/Index, so more explicit naming may make it easier to follow.
But I get the idea, and I have nothing against it.

I've tested with load and render, but I have no knowledge of the cache mechanism yet, so I'm not sure it works there.

It will probably fail with a non-small book, after you open and exit it (so a cache is created), and you re-open it (so, loading the DOM and attribute/value indexes from the cache) and re-render it: your _classAttributeTable will be empty, and some CSS may then not match (or use the slower class-checking path).

You would need to either:
A) rebuild it when it's about to be needed (by iterating all the nodes, or all the existing class attribute values (which are cached) if that is possible (dunno)), or
B) serialize/deserialize it to/from the cache.

We do A for _urlImageMap and lists (for the get/setNodeNumberingProps() methods).
We do B for most of the other id/name/attribute/value stuff (see lxmlDocBase::serializeMaps() / ::deserializeMaps()).

You may have a hard time de/serializing LVHashTable<lUInt32, LVArray<lUInt32> *, true> (dunno).
Maybe storing your LVArray<lUInt32> as a lString32 could make it simpler, using lString32::pos() to find a value instead of iterating the LVArray<lUInt32>.
I guess B would only store things for nodes with class="multiple values", and nothing when class="singlevalue" - so it shouldn't take much space in the cache, or in RAM when deserialized.
A would be better if 1) it would take room in RAM and we only need it when re-rendering, and 2) it's fast to compute.

(Thanks for not having me read std::hash(std::array) :)

About #529: it looks like this is not one of the three ideas you mentioned?
Any thoughts about my comment in there?
Especially about hashing each classname, so getting a lUInt32. No need to cache anything. In stylesheet.apply(), and for each node, we would compute the array of lUInt32 at the start (like you do when you fetch them from _classAttributeTable), and when making the stylesheet, you would just save a lUInt32 in LVCssSelectorRule._hashed_value instead of your _index?
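
(For illustration, a minimal sketch of that idea - splitClassHashes() is a hypothetical helper, and the exact lString32/LVArray calls are assumptions, not code from this PR:)

// Sketch only: split a class attribute value like "foo bar" into one hash per
// class name, so a selector rule can compare a precomputed lUInt32 instead of
// doing string comparisons.
static void splitClassHashes( const lString32 & val, LVArray<lUInt32> & out )
{
    int start = 0;
    for ( int i = 0; i <= val.length(); i++ ) {
        if ( i == val.length() || val[i] == ' ' ) {
            if ( i > start )
                out.add( val.substr(start, i - start).getHash() );
            start = i + 1;
        }
    }
}
// At stylesheet parse time, a rule would keep: _hashed_value = _value.getHash();
// in LVStyleSheet::apply(), the node's class hashes are computed once, and each
// class rule just checks whether its _hashed_value is present in that array.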

@bbshelper (Contributor Author)

Thanks for the insights on the cache part. It will take some time before I can submit a working patch.

(Thanks for not having me read std::hash(std::array) :)

That's no accident, my dear friend :)

About #529: it looks like this is not one of the three ideas you mentioned?

This is actually a scaled-back version of my first idea. I experimented with hashes, and also with ID selectors, but haven't gotten a good result (yet).

Any thoughts about my comment in there?

Apologies for not replying further. It seemed I didn't make myself clear, so I'd rather talk with code :p

Especially about hashing each classname, so getting a lUInt32. No need to cache anything. In stylesheet.apply(), and for each node, we would compute the array of lUInt32 at the start (like you do when you fetch them from _classAttributeTable), and when making the stylesheet, you would just save a lUInt32 in LVCssSelectorRule._hashed_value instead of your _index?

I didn't fully get your idea in #529 until now :)
With lString32::getHash(), there might be collisions. For something like my quickCheck() it's fine, but with the uniqueness provided by _classAttributeTable I can possibly also remove the recheck in LVCssSelectorRule::check(). On the other hand, your approach saves the time, space, and complexity of _classAttributeTable, so it could be a gain over my implementation. I'll need more experimenting and profiling, though.

@NiLuJe (Member) commented Oct 31, 2023

IIRC, MuPDF uses perfect hashing via gperf for those ;).

@poire-z (Contributor) commented Oct 31, 2023

^ You mentioned that previously in #348.
We also ship a crengine/include/xxhash.h, which is used only in the following, and which looks like what would be needed for our topic:

static lUInt32 calcHash(const lUInt8 * s, int len)
{
    return XXH32(s, len, 0);
}

and which we could reuse, I guess, if we're not sure about our lString8::getHash() and collisions.

@bbshelper (Contributor Author)

IIRC, MuPDF uses perfect hashing via gperf for those ;).

MuPDF uses that for their CSS properties, which are known at compile time. In our case, the class names are dynamic, and I'm not sure gperf supports that.

@bbshelper (Contributor Author)

So I followed @poire-z's input and implemented a version with lString32's own hash. It's slightly slower (typically less than 1% for LoadDocument()) but much simpler. I'd like to go with this version.

crengine/src/lvstsheet.cpp Outdated Show resolved Hide resolved
crengine/include/lvstsheet.h Outdated Show resolved Hide resolved
crengine/include/lvstsheet.h Outdated Show resolved Hide resolved
crengine/src/lvstring.cpp Show resolved Hide resolved
@poire-z (Contributor) commented Nov 1, 2023

It's slightly slower (typically less than 1% for LoadDocument())

Any idea why?
It feels like computing these hashes would be less intensive than fetching things from your previous _classAttrValue LVHashTable, which would need some allocations on creation and iterations when looking up/fetching stuff.

but much simpler. I'd like to go with this version.

Sure, I like it better for this reason.

@@ -6022,14 +6049,25 @@ void LVStyleSheet::apply( const ldomNode * node, css_style_rec_t * style )
    LVCssSelector * selector_0 = _selectors[0];
    LVCssSelector * selector_id = id>0 && id<_selectors.length() ? _selectors[id] : NULL;

    LVArray<lUInt32> class_hash_array;
Contributor:

(Do you think there's any difference in speed between creating an LVArray<lUInt32>, adding into it, and iterating it, vs. creating a single lString32 and using ::pos() ?)

Member:

I would dearly hope that an array would be significantly faster ;).

Contributor Author:

My wild guess is that they are about the same speed. lString32 is basically an LVArray<lUInt32> plus a lot of string-specific methods (and reference counting).

LVCssSelector now holds an LVRef to rules. This simplifies copying
and destruction.
@bbshelper (Contributor Author)

It's slightly slower (typically less than 1% for LoadDocument())

Any idea why? It feels like computing these hashes would be less intensive than fetching things from your previous _classAttrValue LVHashTable, which would need some allocations on creation and iterations when looking up/fetching stuff.

Well, to answer your question, I reran the profiling and found that, for the whole load and render process, the new version is actually 0.18% faster (median) than the old one, and only 9% of the books are slower. The old profiling data got overwritten, so I can't really tell what went wrong.

Memory allocation changes can ripple through the whole program under callgrind profiling. I did some manual comparison but didn't reach any conclusion. However, when a node has only a single class name, the old version doesn't calculate the attribute hash or allocate an LVArray. This might help explain those few cases where the new version is slower.

@poire-z (Contributor) commented Nov 3, 2023

but much simpler. I'd like to go with this version.

I'm now realizing that this simple solution won't allow for more similar optimisations :/

We would only be optimising the class check for the current node being checked, when it has a class= attribute - so only .foo or div.bar - because of the work that you only do at the entry point (around the _selector_0 we were talking about).
We won't be able to reuse much of your code (your initial idea or the simpler follow-up) if we want to optimize:
div.container p (which is really common in most books with large stylesheets I've seen), which goes on to check other LVCssSelectorRules of the same LVCssSelector, but against parent nodes of our initial node, i.e. in all the cssrt_parent, cssrt_ancessor... cases.

If we want to optimize that too, we'd need to move the quick class hash checking in there:

case cssrt_class: // E.class
{
    const lString32 &val = node->getAttributeValue(attr_class);
    if ( val.empty() )
        return false;
    // val.lowercase(); // className should be case sensitive
    // we have appended a space when there was some inner space, meaning
    // this class attribute contains multiple class names, which needs
    // more complex checks
    if ( val[val.length()-1] == ' ' ) {
        int start = 0;
        int pos;
        while ((pos = val.pos(_value, start)) >= 0) {
            if ((pos == 0 || val[pos - 1] == ' ') && val[pos + _value.length()] == ' ')
                return true;
            start += _value.length();
        }
        return false;
    }
    return val == _value;
}

actually replacing the string comparison here with the "hash present in an array of hashes" test (so, no need for a rarely used quickCheck() - just a quick regular check() :)

So, maybe by reusing/tweaking some ideas from your initial implementation:

  • we need to cache the hashes - we can't compute them each time we meet the node, because the cssrt_ancessor rules will check many child > parent > parent paths and will meet these same nodes and their class names many times.
  • we don't need a mapping of node => hash or array of hashes: the value of the class attribute ("foo bar" in <div class="foo bar">) is stored only once, in the lString32HashedCollection _attrValueTable, so many nodes share the same item. Its reference is the attributeValueIndex you had been using in your initial implementation, I think. So it's a mapping of attributeValueIndex => hashes that we would need, and it may end up being really small! (After checking your "force pushed" diff, I realize that's what you might have been doing already in your initial implementation! But you might not need to do it in lvtinydom.cpp when parsing attributes; you could just have it done by calling a lvtinydom.cpp method here in LVCssSelectorRule::check(), which would look cleaner and more localized - see the sketch after this list.)
  • this cache would not need to be stored in the document cache: it would be filled when needed and kept in RAM until the document is closed. Small to iterate, quick to fill and use.
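
A rough sketch of what such a lvtinydom.cpp method could look like (a sketch only: the names below - getClassHashes(), _classHashCache, splitClassHashes() - are assumptions for illustration, not code from this PR or from a real diff):

// Sketch only: lazily map an attributeValueIndex to the hashes of its class
// names, so every node sharing the same class string ("foo bar") reuses one
// cached entry. Kept in RAM only, never written to the document cache.
const LVArray<lUInt32> * lxmlDocBase::getClassHashes( lUInt16 valueIndex )
{
    LVArray<lUInt32> * hashes = NULL;
    if ( !_classHashCache.get( valueIndex, hashes ) ) {           // assumed LVHashTable-style lookup
        hashes = new LVArray<lUInt32>();
        // split the attribute value ("foo bar") into one hash per class name
        splitClassHashes( getAttrValue( valueIndex ), *hashes );
        _classHashCache.set( valueIndex, hashes );
    }
    return hashes;
}
// LVCssSelectorRule::check() would then look up its precomputed class-name hash
// in the returned array instead of doing string comparisons (optionally
// confirming with a string compare to guard against hash collisions).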

What do you think ?
Shouldn't be too hard to implement and try - and I guess that if you already had such % speedup numbers by only doing the optimisation for the tail end, you'll get even more impressive ones if you can get it done for all .class checks on ancestors.

@bbshelper (Contributor Author)

I've been experimenting with the cache thing for a while. It is actually more involved than I expected. The patch is still far from reviewable, but I can share my two cents now since you asked:

  1. Selectors like div.foo p are quite common and cannot be optimized by this commit.

  2. Other than class selectors, there are excessive calls to some ldomNode methods, such as getNodeId(), likely caused by ancestor rules (other combinators may contribute too, but ancestor rules are more common). They are not slow per se (if not persisted), but can add up to several % of total load and render time.

My idea is to introduce a cache class (FatNode) which behaves just like ldomNode, but caches some results, like getNodeId() or isRoot().

In both LoadDocument() and render(), we apply styles in a depth-first way. It's easy to maintain a list of FatNodes from the root node to the current one (AncestorCache). With this cache, methods like FatNode::getParentNode() can easily return a FatNode *. This should help a lot with ancestor rules.

Just as you said, the cache doesn't need to be stored in the document cache. It is constructed on the fly during style calculation and discarded afterwards.

I don't think the current commit would interfere with this design. We just need to cache the hash values in FatNode.
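
Roughly, the shape I have in mind (just a sketch; all names and fields here are hypothetical, nothing is implemented yet):

// Sketch only: a thin wrapper that memoizes a few frequently queried ldomNode
// properties while styles are applied depth-first. The traversal would keep a
// stack of these (the AncestorCache) from the root down to the current node,
// so ancestor rules walk cached parents instead of re-querying the DOM.
struct FatNode {
    const ldomNode *  node;         // the underlying DOM node
    FatNode *         parent;       // previous entry on the ancestor stack (NULL at root)
    lUInt16           nodeId;       // cached node->getNodeId()
    LVArray<lUInt32>  classHashes;  // cached hashes of the node's class names

    FatNode( const ldomNode * n, FatNode * p )
        : node(n), parent(p), nodeId(n->getNodeId()) { }

    FatNode * getParentNode() const { return parent; } // cheap, no DOM walk
};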

Implementation challenges:

  1. sibling combinators: FatNode::getUnboxedPrevSibling() can only return an ldomNode, as its siblings are not in the AncestorCache. They probably shouldn't be. This makes FatNode not a drop-in replacement for ldomNode in style calculation, so there need to be 2 versions of most functions in lvstsheet.cpp. Templates help, but still.

  2. the call stack is complex: between LoadDocument()/render() (which should own the AncestorCache) and LVStyleSheet::apply(), there are initNodeStyleRecursive(), updateStyleDataRecursive(), initNodeStyle(), setNodeStyle(), to name a few. The AncestorCache has to be passed around and maintained along the way - a lot of work, and not very elegant. Also, initNodeStyle() has more users than I thought, even odx_fb2TitleHandler::makeSection(), which I know nothing about.

  3. for small books, or books with really simple CSS, the added complexity could make things slower (but they should already be fast to open and render, so the user won't notice anyway).

@poire-z (Contributor) commented Nov 4, 2023

likely caused by ancestor rules (other combinators may contribute too, but ancestor rules are more common).
They are not slow per se (if not persisted), but can add up to several % of total load and render time.

Well, it's expected that they are slow, given the multiple parent paths we need to check.

I can somehow follow the rest of your ideas - but I don't really want to get much involved in these other complex node & style topics for now :) It's already complex, and I'm not sure I'd want more kludge unless it is elegant and not too intrusive.

I was just focused on the scope of this PR: avoiding string comparisons in classname checks.
And it feels like the idea I suggested above could be really unintrusive, keeping as much as possible in lvstsheet.cpp and hopefully just adding a method in lvtinydom.cpp.

@bbshelper (Contributor Author)

Well.. misunderstanding again.. maybe I was too preoccupied by the whole FatNode thing :p

I'd say your idea makes sense. Actually I had a failed attempt to completely remove the class string check. I'll do some experiments and come back to you.

Class name matching is slow. It's extremely slow when there are many
class selectors. Mitigate this problem by:

1. when parsing css, store class name's hash value
2. during css check, split class names and calculate their hashes
3. for class selectors, compare integers first to avoid expensive
   class name matching for most rules
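
(Step 3 boils down to something like the following - a sketch only; quickClassCheck() is the name used later in the discussion, but the signature shown here is an assumption:)

// Sketch only: gate the expensive selector check behind integer comparisons.
// The full string-based check only runs for selectors whose class-name hash
// appears among the node's precomputed class-name hashes; hash collisions are
// then caught by the regular string comparison.
static bool quickClassCheck( const LVArray<lUInt32> & nodeClassHashes, lUInt32 ruleClassHash )
{
    for ( int i = 0; i < nodeClassHashes.length(); i++ ) {
        if ( nodeClassHashes[i] == ruleClassHash )
            return true;    // possible match: go on with the full check
    }
    return false;           // no hash matches: this selector cannot apply, skip it
}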
@bbshelper (Contributor Author)

Tweaking my initial implementation didn't deliver a better result than the simpler approach. It's still slower with my test set. So please consider merging my latest commit. Applying the quick check to selector_id gives a measurable speedup.

@poire-z (Contributor) commented Nov 18, 2023

So please consider merging my latest commit.

I did some poor man's timing checks on 2 documents (one with some ancestor class selectors, timing around 1.6s - the other without much CSS but many nodes, timing around 10s), and I do observe (sometimes, not always - I don't often get consistent timings... dunno why) some noticeable speedup, so I could be fine with that.

Tweaking my initial implementation didn't deliver a better result than the simpler approach. It's still slower with my test set.

I'm not sure you got what I had in mind, or whether you ended up testing what I imagined, so I went and coded it, picking stuff from your initial implementation.
Here are my changes over the current state of this PR: pr545_classHashes.diff.txt

Somehow, I don't notice the speedup I got with your current PR :/ I'm not sure why.
On some document, I get 83 misses (so, class hashes computed and put in our LVHashTable of size 128) and 3,011,599 hits (so, class hashes fetched from that cache). The checks & fetches shouldn't take many loops if we have 83 items in the 128-modulo-hashes slots...

Can you check how this patch does with your benchmarking tools?
And if you don't observe any speedup, any idea why it brings no gain, and where the bottleneck is?

@bbshelper (Contributor Author)

I ran your patch through my test epubs. It was slower. I've attached the profiling result of one epub here: pr545.tgz. It's chosen because it has the median performance change among the 100 epubs. You may check it with KCacheGrind.

  1. It's probably not a good idea to remove quickClassCheck(). It can eliminate 90% of the calls to LVCssSelector::check(), and it is much faster. There are several checks before the switch statement inside LVCssSelectorRule::check(), which take time.
  2. The change inside the cssrt_class case doesn't look like a clear win over plain string comparison: getClassHashes() already takes 69 cycles on average.
  3. Different strings can have the same hash value (collision). I don't think it's functionally correct to return a result based solely on hash comparison, though I don't see any regressions in my test epubs.

I can share my experiment with my initial implementation (and you're right: it's indeed not what you had in mind): bbshelper@e270699. Unfortunately, it's also slightly slower than the current PR commit.

I have also prototyped the FatNode thing, and with that cache in place, the above commit is only slightly faster. So I decided not to pursue it further for the moment.

@poire-z (Contributor) commented Nov 19, 2023

OK, thanks for your tests and thoughts.
So, let's go with that, and call it a pause :)
