Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speedup PreProcessXmlString() #526

Merged
merged 6 commits into from
Oct 22, 2023
Merged

Conversation

bbshelper
Copy link
Contributor

@bbshelper bbshelper commented Sep 25, 2023

There are more than necessary comparions made on every char. By

  1. do a quick check before proceeding to more complicated paths
  2. use pointers instead of indices
    we see ~50% fewer time spent in the function.

This change is Reviewable

Copy link
Member

@NiLuJe NiLuJe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not familiar with the logic, so I'll let @poire-z comment, but I'm not seeing anything too egregious, and the reasoning is definitely sound ;).

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @bbshelper)


crengine/src/lvxml.cpp line 5436 at r1 (raw file):

static void decode_character(const lChar32 *end, const lChar32 *enc_table,
        const lChar32 *&src, lChar32 *&dst) {

Style nit: there's no real house style because it's a mess already, but I'm not a fan of the indent there ;).


crengine/src/lvxml.cpp line 5437 at r1 (raw file):

static void decode_character(const lChar32 *end, const lChar32 *enc_table,
        const lChar32 *&src, lChar32 *&dst) {
    int state = 1, ch;

Shouldn't ch be a lChar32?

(Linters probably won't like this type of declarations anyway).


crengine/src/lvxml.cpp line 5536 at r1 (raw file):

    for (const lChar32 *src = str; src < end; ++src) {
        lChar32 ch = *src;
        if (ch <= '&') [[unlikely]] {

Can't rely on C++20 features, we've settled on GNU++17.

You'll want to go with compilers builtins instead, and that's... slightly trickier than expected in C++, e.g.,

// Likely/Unlikely branch tagging  
#if defined(__cplusplus)  
// c.f., https://stackoverflow.com/a/43870188/1067003  
#       define likely(x)   __builtin_expect(static_cast<bool>((x)), 1)  
#       define unlikely(x) __builtin_expect(static_cast<bool>((x)), 0)  
#else  
#       define likely(x)   __builtin_expect(!!(x), 1)  
#       define unlikely(x) __builtin_expect(!!(x), 0)  
#endif

Copy link
Member

@NiLuJe NiLuJe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @bbshelper)

@bbshelper
Copy link
Contributor Author

Ah I thought I replied to your comments but it seems it wasn't sent.

Regarding [[unlikely]] attribute, it's actually supported since GCC 9. Can I be lazy here? :p

@NiLuJe
Copy link
Member

NiLuJe commented Sep 26, 2023

Regarding [[unlikely]] attribute, it's actually supported since GCC 9. Can I be lazy here? :p

My understanding is that this is conditional on std being c++20 or gnu++20, not gnu++17 (which is what we enforce).

(I haven't double-checked if that's actually what happens in practice, but that's what I get out of the GCC C++ support table).

@NiLuJe
Copy link
Member

NiLuJe commented Sep 26, 2023

I haven't double-checked

__has_cpp_attribute seems to imply yes, at least on current Clang & GCC versions.

So, if it so holds true with the NDK's clang, good enough for me ;).

@bbshelper
Copy link
Contributor Author

So, if it so holds true with the NDK's clang, good enough for me ;).

Haven't tried Android build recently. What's the NDK version?

@NiLuJe
Copy link
Member

NiLuJe commented Sep 27, 2023

What's the NDK version

r23c, IIRC.

@poire-z
Copy link
Contributor

poire-z commented Sep 29, 2023

Your code is obviously clearer and easier to read.

But comparing with diff or side by side, it's not obvious to see if it's strictly equivalent, so we'll have to trust you :)
We have to be 100% sure it's strictly equivalent, otherwise if chars indices shifts can happen, users' past highlights/xpointers may be shifted too and no longer point to the same char - and this would be a big issue :/

Btw, about "unlikely": wouldn't it just be a matter of inverting the tests ? if (the likely stuff) { ...} else { the unlikely stuff } ? Will the compiler ensure our test as-is if we don't use likely/unlikely ?

@bbshelper
Copy link
Contributor Author

bbshelper commented Sep 29, 2023

What's the NDK version

r23c, IIRC.

Just tried that and it's supported. Hooray!

@bbshelper
Copy link
Contributor Author

But comparing with diff or side by side, it's not obvious to see if it's strictly equivalent, so we'll have to trust you :)

I think you can safely skip the & part, because I only refactored that to a function to make the code clearer. There is intentionally no change besides index to pointer conversion.

We have to be 100% sure it's strictly equivalent, otherwise if chars indices shifts can happen, users' past highlights/xpointers may be shifted too and no longer point to the same char - and this would be a big issue :/

Well that's tricky :) Before commit, I've tested against 100 largest epubs from Project Gutenburg and the result is the same between the old and the new, but they might be too "regular" to be considered enough. I've also written some simple tests but definitely haven't covered all corner cases. I can try fuzzing if you feel the need.

Btw, about "unlikely": wouldn't it just be a matter of inverting the tests ? if (the likely stuff) { ...} else { the unlikely stuff } ? Will the compiler ensure our test as-is if we don't use likely/unlikely ?

Assembly code is architecture dependent and doesn't have a strict mapping to C expressions/statements. With optimizations on, compilers are free to rearrange the assembly. [[unlikely]] is just a hint to compilers to favor the likely branch. I added it here because it did show some (minor) improvement in my profiling.

@poire-z
Copy link
Contributor

poire-z commented Sep 29, 2023

Well that's tricky :) Before commit, I've tested against 100 largest epubs from Project Gutenburg and the result is the same between the old and the new, but they might be too "regular" to be considered enough. I've also written some simple tests but definitely haven't covered all corner cases. I can try fuzzing if you feel the need.

I would trust the developper's own feeling about it :) How he feels, whether he rewrote it from scratch, or if he tried to follow each little branch and condition to have it strictly equivalent while rewriting it :)
If you don't have a good feeling about the strict equivalence, yes, more tests are welcome (even more if you have experience with fuzzing and it does not cost you too much time), it's something quite serious if we break highlights.
(Also dunno about PRE, it may be more rare in EPUBs, but it is as serious. The attribute codepath too.)

If you don't feel confident, the other solution (which would be the one I would have taken :) not being confident by nature :) would be to keep the old PreProcessXmlString as-is, add a new PreProcessXMLStringV2, and make the use of one or the other depending on a new bump of DOM_VERSION_CURRENT - see top of lvtinydom.cpp for when I needed to do that.

@bbshelper
Copy link
Contributor Author

Well, fuzzing quickly showed many discrepancies, and it turned out that my previous commit is beyond fix. I experimented a lot and decided to commit a series of small changes, hopefully both easy to review and to reason about functional equivalence. Feel free to squash them if you prefer that way.

The original code has quirks in handling malformed input (i.e. random combinations of & \n\r). They are kept as is for compatibility.

@poire-z
Copy link
Contributor

poire-z commented Oct 13, 2023

The original code has quirks in handling malformed input (i.e. random combinations of & \n\r). They are kept as is for compatibility.

Yes, we should support bad input, just not assume we'll always get clean and properly encoded stuff :)

I experimented a lot and decided to commit a series of small changes, hopefully both easy to review and to reason about functional equivalence.

Thank you for taking the time to do that :)

About the first commit replace indices with pointers, I'm not friend with pointer arithmetic, so I'll let @NiLuJe who eats that for breakfast :) review its equivalence.

Fine with the other commits.
(Dunno about goto: next, @NiLuJe : are we fine with using goto ? :)

I guess the point of the template<bool pre, bool ampersand> stuff is to actually creates 2x2 = 4 compiled functions, with each the pre and ampersand branches kept or removed depending on the pre/ampersand given, right?
Feels a bit overkill :) and a bit less easy to grasp may be (although the old code was really hard, so this might be easier).
Does this trick actually results in noticable improvements in your benchmarks ?

No need to also include cdata in the template?

Feel free to squash them if you prefer that way.

I guess I'll prefer if we keep them.

Well, fuzzing quickly showed many discrepancies

So, I guess you did run fuzzing with this new code, and it's all 100% equivalent - and your author feeling is confident?
No need to additionally put that in a PreProcessXMLStringV2 depending on a dom_version like I mentionned?

@NiLuJe
Copy link
Member

NiLuJe commented Oct 13, 2023

About the first commit replace indices with pointers, I'm not friend with pointer arithmetic, so I'll let @NiLuJe who eats that for breakfast :) review its equivalence.

Looks sound at a glance, yeah ;).

(Dunno about goto: next, @NiLuJe : are we fine with using goto ? :)

Yeah! \m/

@bbshelper
Copy link
Contributor Author

I guess the point of the template<bool pre, bool ampersand> stuff is to actually creates 2x2 = 4 compiled functions, with each the pre and ampersand branches kept or removed depending on the pre/ampersand given, right?

Yes

Feels a bit overkill :) and a bit less easy to grasp may be (although the old code was really hard, so this might be easier). Does this trick actually results in noticable improvements in your benchmarks ?

Yes, in my test, they are estimated to use ~6% and ~13% fewer cycles than their respective parent commits.

No need to also include cdata in the template?

cdata is only checked when ch == '&' so I expect it to be of little impact.

So, I guess you did run fuzzing with this new code, and it's all 100% equivalent - and your author feeling is confident? No need to additionally put that in a PreProcessXMLStringV2 depending on a dom_version like I mentionned?

Yes. I believe every change on this branch can be proven to be equivalent. Fuzzing really helped during the process, as some corner cases are hard to reason about, and even harder to invent.

@poire-z
Copy link
Contributor

poire-z commented Oct 20, 2023

Gave you one week for afterthoughts :)
Still confident ? No need for PreProcessXMLStringV1/PreProcessXMLStringV2 ?

If ok, can you add in the message on the first commit First commit of a few aimed at optimizing PreProcessXMLString. so we know it and its buddies are not standalone, and update the timestamps so they look chronologically ordered in git log (I usually use git rebase --force-rebase --exec "sleep 2" --exec "git commit --amend --no-edit --date=now" master for that.)

@bbshelper
Copy link
Contributor Author

Gave you one week for afterthoughts :) Still confident ? No need for PreProcessXMLStringV1/PreProcessXMLStringV2 ?

Yes. I tried really hard, just to eliminate that need :)

If ok, can you add in the message on the first commit First commit of a few aimed at optimizing PreProcessXMLString. so we know it and its buddies are not standalone, and update the timestamps so they look chronologically ordered in git log (I usually use git rebase --force-rebase --exec "sleep 2" --exec "git commit --amend --no-edit --date=now" master for that.)

How about a merge commit?

@poire-z
Copy link
Contributor

poire-z commented Oct 21, 2023

How about a merge commit?

Wouldn't that just add a 7th commit, while keeping your 6 associated commits with their original timestamps?

(I could Squash and merge and get a single commit, and just link to here for the details. But dunno, it might be nice to be able to see each of them when working with git, and checkout to an individual commit.)

@poire-z poire-z merged commit ceb9603 into koreader:master Oct 22, 2023
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants