Speedup PreProcessXmlString() #526

bbshelper · 2023-09-25T13:28:23Z

There are more than necessary comparions made on every char. By

do a quick check before proceeding to more complicated paths
use pointers instead of indices
we see ~50% fewer time spent in the function.

This change is

NiLuJe

Not familiar with the logic, so I'll let @poire-z comment, but I'm not seeing anything too egregious, and the reasoning is definitely sound ;).

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @bbshelper)

crengine/src/lvxml.cpp line 5436 at r1 (raw file):

static void decode_character(const lChar32 *end, const lChar32 *enc_table,
        const lChar32 *&src, lChar32 *&dst) {

Style nit: there's no real house style because it's a mess already, but I'm not a fan of the indent there ;).

crengine/src/lvxml.cpp line 5437 at r1 (raw file):

static void decode_character(const lChar32 *end, const lChar32 *enc_table,
        const lChar32 *&src, lChar32 *&dst) {
    int state = 1, ch;

Shouldn't ch be a lChar32?

(Linters probably won't like this type of declarations anyway).

crengine/src/lvxml.cpp line 5536 at r1 (raw file):

    for (const lChar32 *src = str; src < end; ++src) {
        lChar32 ch = *src;
        if (ch <= '&') [[unlikely]] {

Can't rely on C++20 features, we've settled on GNU++17.

You'll want to go with compilers builtins instead, and that's... slightly trickier than expected in C++, e.g.,

// Likely/Unlikely branch tagging  
#if defined(__cplusplus)  
// c.f., https://stackoverflow.com/a/43870188/1067003  
#       define likely(x)   __builtin_expect(static_cast<bool>((x)), 1)  
#       define unlikely(x) __builtin_expect(static_cast<bool>((x)), 0)  
#else  
#       define likely(x)   __builtin_expect(!!(x), 1)  
#       define unlikely(x) __builtin_expect(!!(x), 0)  
#endif

NiLuJe

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @bbshelper)

bbshelper · 2023-09-26T02:20:38Z

Ah I thought I replied to your comments but it seems it wasn't sent.

Regarding [[unlikely]] attribute, it's actually supported since GCC 9. Can I be lazy here? :p

NiLuJe · 2023-09-26T04:03:50Z

Regarding [[unlikely]] attribute, it's actually supported since GCC 9. Can I be lazy here? :p

My understanding is that this is conditional on std being c++20 or gnu++20, not gnu++17 (which is what we enforce).

(I haven't double-checked if that's actually what happens in practice, but that's what I get out of the GCC C++ support table).

NiLuJe · 2023-09-26T04:23:25Z

I haven't double-checked

__has_cpp_attribute seems to imply yes, at least on current Clang & GCC versions.

So, if it so holds true with the NDK's clang, good enough for me ;).

bbshelper · 2023-09-27T13:00:22Z

So, if it so holds true with the NDK's clang, good enough for me ;).

Haven't tried Android build recently. What's the NDK version?

NiLuJe · 2023-09-27T16:41:13Z

What's the NDK version

r23c, IIRC.

poire-z · 2023-09-29T00:47:05Z

Your code is obviously clearer and easier to read.

But comparing with diff or side by side, it's not obvious to see if it's strictly equivalent, so we'll have to trust you :)
We have to be 100% sure it's strictly equivalent, otherwise if chars indices shifts can happen, users' past highlights/xpointers may be shifted too and no longer point to the same char - and this would be a big issue :/

Btw, about "unlikely": wouldn't it just be a matter of inverting the tests ? if (the likely stuff) { ...} else { the unlikely stuff } ? Will the compiler ensure our test as-is if we don't use likely/unlikely ?

bbshelper · 2023-09-29T16:39:14Z

What's the NDK version

r23c, IIRC.

Just tried that and it's supported. Hooray!

bbshelper · 2023-09-29T17:06:16Z

But comparing with diff or side by side, it's not obvious to see if it's strictly equivalent, so we'll have to trust you :)

I think you can safely skip the & part, because I only refactored that to a function to make the code clearer. There is intentionally no change besides index to pointer conversion.

We have to be 100% sure it's strictly equivalent, otherwise if chars indices shifts can happen, users' past highlights/xpointers may be shifted too and no longer point to the same char - and this would be a big issue :/

Well that's tricky :) Before commit, I've tested against 100 largest epubs from Project Gutenburg and the result is the same between the old and the new, but they might be too "regular" to be considered enough. I've also written some simple tests but definitely haven't covered all corner cases. I can try fuzzing if you feel the need.

Btw, about "unlikely": wouldn't it just be a matter of inverting the tests ? if (the likely stuff) { ...} else { the unlikely stuff } ? Will the compiler ensure our test as-is if we don't use likely/unlikely ?

Assembly code is architecture dependent and doesn't have a strict mapping to C expressions/statements. With optimizations on, compilers are free to rearrange the assembly. [[unlikely]] is just a hint to compilers to favor the likely branch. I added it here because it did show some (minor) improvement in my profiling.

poire-z · 2023-09-29T17:30:30Z

Well that's tricky :) Before commit, I've tested against 100 largest epubs from Project Gutenburg and the result is the same between the old and the new, but they might be too "regular" to be considered enough. I've also written some simple tests but definitely haven't covered all corner cases. I can try fuzzing if you feel the need.

I would trust the developper's own feeling about it :) How he feels, whether he rewrote it from scratch, or if he tried to follow each little branch and condition to have it strictly equivalent while rewriting it :)
If you don't have a good feeling about the strict equivalence, yes, more tests are welcome (even more if you have experience with fuzzing and it does not cost you too much time), it's something quite serious if we break highlights.
(Also dunno about PRE, it may be more rare in EPUBs, but it is as serious. The attribute codepath too.)

If you don't feel confident, the other solution (which would be the one I would have taken :) not being confident by nature :) would be to keep the old PreProcessXmlString as-is, add a new PreProcessXMLStringV2, and make the use of one or the other depending on a new bump of DOM_VERSION_CURRENT - see top of lvtinydom.cpp for when I needed to do that.

Do a quick check before comparing `ch` to '\n', '\r' and '\t' in most cases.

Don't write to `lch` if it's not pre.

bbshelper · 2023-10-11T14:37:18Z

Well, fuzzing quickly showed many discrepancies, and it turned out that my previous commit is beyond fix. I experimented a lot and decided to commit a series of small changes, hopefully both easy to review and to reason about functional equivalence. Feel free to squash them if you prefer that way.

The original code has quirks in handling malformed input (i.e. random combinations of & \n\r). They are kept as is for compatibility.

poire-z · 2023-10-13T14:25:12Z

The original code has quirks in handling malformed input (i.e. random combinations of & \n\r). They are kept as is for compatibility.

Yes, we should support bad input, just not assume we'll always get clean and properly encoded stuff :)

I experimented a lot and decided to commit a series of small changes, hopefully both easy to review and to reason about functional equivalence.

Thank you for taking the time to do that :)

About the first commit replace indices with pointers, I'm not friend with pointer arithmetic, so I'll let @NiLuJe who eats that for breakfast :) review its equivalence.

Fine with the other commits.
(Dunno about goto: next, @NiLuJe : are we fine with using goto ? :)

I guess the point of the template<bool pre, bool ampersand> stuff is to actually creates 2x2 = 4 compiled functions, with each the pre and ampersand branches kept or removed depending on the pre/ampersand given, right?
Feels a bit overkill :) and a bit less easy to grasp may be (although the old code was really hard, so this might be easier).
Does this trick actually results in noticable improvements in your benchmarks ?

No need to also include cdata in the template?

Feel free to squash them if you prefer that way.

I guess I'll prefer if we keep them.

Well, fuzzing quickly showed many discrepancies

So, I guess you did run fuzzing with this new code, and it's all 100% equivalent - and your author feeling is confident?
No need to additionally put that in a PreProcessXMLStringV2 depending on a dom_version like I mentionned?

NiLuJe · 2023-10-13T15:39:01Z

About the first commit replace indices with pointers, I'm not friend with pointer arithmetic, so I'll let @NiLuJe who eats that for breakfast :) review its equivalence.

Looks sound at a glance, yeah ;).

(Dunno about goto: next, @NiLuJe : are we fine with using goto ? :)

Yeah! \m/

bbshelper · 2023-10-14T16:55:23Z

I guess the point of the template<bool pre, bool ampersand> stuff is to actually creates 2x2 = 4 compiled functions, with each the pre and ampersand branches kept or removed depending on the pre/ampersand given, right?

Yes

Feels a bit overkill :) and a bit less easy to grasp may be (although the old code was really hard, so this might be easier). Does this trick actually results in noticable improvements in your benchmarks ?

Yes, in my test, they are estimated to use ~6% and ~13% fewer cycles than their respective parent commits.

No need to also include cdata in the template?

cdata is only checked when ch == '&' so I expect it to be of little impact.

So, I guess you did run fuzzing with this new code, and it's all 100% equivalent - and your author feeling is confident? No need to additionally put that in a PreProcessXMLStringV2 depending on a dom_version like I mentionned?

Yes. I believe every change on this branch can be proven to be equivalent. Fuzzing really helped during the process, as some corner cases are hard to reason about, and even harder to invent.

poire-z · 2023-10-20T07:18:18Z

Gave you one week for afterthoughts :)
Still confident ? No need for PreProcessXMLStringV1/PreProcessXMLStringV2 ?

If ok, can you add in the message on the first commit First commit of a few aimed at optimizing PreProcessXMLString. so we know it and its buddies are not standalone, and update the timestamps so they look chronologically ordered in git log (I usually use git rebase --force-rebase --exec "sleep 2" --exec "git commit --amend --no-edit --date=now" master for that.)

bbshelper · 2023-10-21T17:32:10Z

Gave you one week for afterthoughts :) Still confident ? No need for PreProcessXMLStringV1/PreProcessXMLStringV2 ?

Yes. I tried really hard, just to eliminate that need :)

If ok, can you add in the message on the first commit First commit of a few aimed at optimizing PreProcessXMLString. so we know it and its buddies are not standalone, and update the timestamps so they look chronologically ordered in git log (I usually use git rebase --force-rebase --exec "sleep 2" --exec "git commit --amend --no-edit --date=now" master for that.)

How about a merge commit?

poire-z · 2023-10-21T19:07:36Z

How about a merge commit?

Wouldn't that just add a 7th commit, while keeping your 6 associated commits with their original timestamps?

(I could Squash and merge and get a single commit, and just link to here for the details. But dunno, it might be nice to be able to see each of them when working with git, and checkout to an individual commit.)

NiLuJe reviewed Sep 25, 2023

View reviewed changes

bbshelper force-pushed the perf-preprocess-xml branch from e5281de to ae73f23 Compare September 26, 2023 00:47

NiLuJe reviewed Sep 26, 2023

View reviewed changes

bbshelper added 6 commits October 11, 2023 18:53

PreProcessXmlString(): replace indices with pointers

6ed121e

PreProcessXmlString(): reduce comparisons

bb83767

Do a quick check before comparing `ch` to '\n', '\r' and '\t' in most cases.

PreProcessXmlString(): refactor the loop to its own function

b2e3b4b

PreProcessXmlString(): reduce writes

ffec599

Don't write to `lch` if it's not pre.

PreProcessXmlString(): move space check into '&' check

5ef063e

PreProcessXmlString(): reduce state checks

c908af8

bbshelper force-pushed the perf-preprocess-xml branch from ae73f23 to c908af8 Compare October 11, 2023 14:22

poire-z merged commit ceb9603 into koreader:master Oct 22, 2023
1 of 2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speedup PreProcessXmlString() #526

Speedup PreProcessXmlString() #526

bbshelper commented Sep 25, 2023 •

edited by Frenzie

Loading

NiLuJe left a comment

NiLuJe left a comment

bbshelper commented Sep 26, 2023

NiLuJe commented Sep 26, 2023 •

edited

Loading

NiLuJe commented Sep 26, 2023 •

edited

Loading

bbshelper commented Sep 27, 2023

NiLuJe commented Sep 27, 2023

poire-z commented Sep 29, 2023

bbshelper commented Sep 29, 2023 •

edited

Loading

bbshelper commented Sep 29, 2023

poire-z commented Sep 29, 2023

bbshelper commented Oct 11, 2023

poire-z commented Oct 13, 2023

NiLuJe commented Oct 13, 2023

bbshelper commented Oct 14, 2023

poire-z commented Oct 20, 2023

bbshelper commented Oct 21, 2023

poire-z commented Oct 21, 2023

Speedup PreProcessXmlString() #526

Speedup PreProcessXmlString() #526

Conversation

bbshelper commented Sep 25, 2023 • edited by Frenzie Loading

NiLuJe left a comment

Choose a reason for hiding this comment

NiLuJe left a comment

Choose a reason for hiding this comment

bbshelper commented Sep 26, 2023

NiLuJe commented Sep 26, 2023 • edited Loading

NiLuJe commented Sep 26, 2023 • edited Loading

bbshelper commented Sep 27, 2023

NiLuJe commented Sep 27, 2023

poire-z commented Sep 29, 2023

bbshelper commented Sep 29, 2023 • edited Loading

bbshelper commented Sep 29, 2023

poire-z commented Sep 29, 2023

bbshelper commented Oct 11, 2023

poire-z commented Oct 13, 2023

NiLuJe commented Oct 13, 2023

bbshelper commented Oct 14, 2023

poire-z commented Oct 20, 2023

bbshelper commented Oct 21, 2023

poire-z commented Oct 21, 2023

bbshelper commented Sep 25, 2023 •

edited by Frenzie

Loading

NiLuJe commented Sep 26, 2023 •

edited

Loading

NiLuJe commented Sep 26, 2023 •

edited

Loading

bbshelper commented Sep 29, 2023 •

edited

Loading