Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EPUB: fix possible buffer overflow #572

Merged
merged 5 commits into from
Jun 27, 2024

Conversation

benoit-pierre
Copy link
Contributor

@benoit-pierre benoit-pierre commented Jun 23, 2024

Handle obfuscated font files smaller than 1040 bytes.


This change is Reviewable

@poire-z
Copy link
Contributor

poire-z commented Jun 23, 2024

Can we assume the same issue can happen with the function just above, AdobeDemanglingStream(), from which I copied most of the code to make IdpfDemanglingStream() and that a similar fix can be made here (even if we can't really test it) ?

@benoit-pierre benoit-pierre marked this pull request as draft June 23, 2024 07:54
@benoit-pierre
Copy link
Contributor Author

Yep. Will amend.

crengine/src/epubfmt.cpp Outdated Show resolved Hide resolved
@benoit-pierre
Copy link
Contributor Author

benoit-pierre commented Jun 23, 2024

And my code is buggy, I'm surprised it works… We want the start of the read, so _base->GetPos() must be called before _base->Read(…).

@benoit-pierre
Copy link
Contributor Author

And was the previous code for AdobeDemanglingStream correct? Do you have a file to test it @poire-z?

@benoit-pierre
Copy link
Contributor Author

I think AdobeDemanglingStream is correct and IdpfDemanglingStream is buggy:

                int keyPos = (i + pos) & 15;
                ((lUInt8*)buf)[i] ^= _key[keyPos];

vs

                int keyPos = (i + pos) % 20;
                ((lUInt8*)buf)[i+pos] ^= _key[keyPos];

We should be writing at buf[i] in both cases, no?

@benoit-pierre benoit-pierre force-pushed the pr/fix_crash branch 3 times, most recently from bb1a4ae to 23fd335 Compare June 23, 2024 09:25
@poire-z
Copy link
Contributor

poire-z commented Jun 23, 2024

Here are the few EPUBs with some META-INF/encryption.xml I may have used for testing:
[removed]
(Tell me when downloaded, so I can remove it.)

@poire-z
Copy link
Contributor

poire-z commented Jun 23, 2024

We should be writing at buf[i] in both cases, no?

Looks like so.
Dunno why I changed it. If it worked, it's because we most often (always?) get here with getPos() = 0 - or Way after 1024 or 1040 because each call reads a bigger chunk than that?

@benoit-pierre
Copy link
Contributor Author

benoit-pierre commented Jun 23, 2024

Here are the few EPUBs with some META-INF/encryption.xml I may have used for testing: [removed] (Tell me when downloaded, so I can remove it.)

Thanks.

(Because right now I can #if 0 that whole de-obfuscate loop and still open my file with no issue…)

@benoit-pierre
Copy link
Contributor Author

I get "unknown file format" errors from FT_New_Memory_Face when attempting to load the obfuscated fonts from your last 2 files and the EPUB from koreader/koreader#12071.

@poire-z
Copy link
Contributor

poire-z commented Jun 23, 2024

Before and after this PR ?
If already before, well, let's not worry :) If it works for all the other EPUBs, I guess the code is correct, and these 2 EPUBs have it wrong (have you tried to deobfuscate them to see if they are correct font files?)

@benoit-pierre
Copy link
Contributor Author

Before and after this PR ?

Yes.

Hexdumps of the first 4 bytes of the key and each obfuscated font:

  • file 1:

    • f953bee5 [IDPF key: “urn:uuid:b165936f-c791-4f75-a5f4-092a9d99dee6”]
    • 140bc589 [“OEBPS/font/TeXGyreSchola-Bold.otf”]
    • 14efb9bd [“OEBPS/font/TeXGyreSchola-Italic.otf”]
    • 1deeb9b9 [“OEBPS/font/TeXGyreSchola-Regular.otf”]
  • file 2:

    • 7b4517c5 [Adobe key: “urn:uuid:7b4517c5-ed5f-48d7-80dd-14c3e979a209”]
    • 96396e99 [“OEBPS/Fonts/Calibri-400-7b4517c5-ed5f-48d7-80dd-14c3e979a209.otf”]
    • 96d91ebd [“OEBPS/Fonts/Calibri-700-7b4517c5-ed5f-48d7-80dd-14c3e979a209.otf”]
    • b7f81eb9 [“OEBPS/Fonts/Zawgyi-One-400-7b4517c5-ed5f-48d7-80dd-14c3e979a209.otf”]
  • file 3:

    • 8ae82122 [IDPF key: “uuid:e24e71ff-d32e-40ab-b254-0e09f8a5e34d”]
    • 27932a5e [“OEBPS/Fonts/TimesNewRomanPSMT.ttf”]
    • 37d12a7a [“OEBPS/Fonts/Times-Italic.ttf”]
    • 3fbc6c49 [“OEBPS/Fonts/FZHTK--GBK1-0.TTF”]
    • 3fbd6e4e [“OEBPS/Fonts/FZXH1K--GBK1-0.TTF”]
    • 3fbe7a4e [“OEBPS/Fonts/FZSYK--GBK1-0.TTF”]
    • f655269e [“OEBPS/Fonts/FZXDXK--GBK1-0.ttf”]
    • fe55269e [“OEBPS/Fonts/FZFSK--GBK1-0.TTF”]
    • fe55269e [“OEBPS/Fonts/FZKTK--GBK1-0.TTF”]

The 4 fist bytes of each font should be the sfntVersion field: 0x00010000, or 0x4F54544F. But even with the same key, we somehow get a lot of variations.

@poire-z
Copy link
Contributor

poire-z commented Jun 23, 2024

Any idea what other reading softwares that supports obfuscated fonts (dunno which does, calibre?) do?

Necessary for cases where we can't be sure of the full size:
e.g. when the data is compressed with no information about
the unpacked size.
Fix `AdobeDemanglingStream` & `IdpfDemanglingStream` implementations.
The unobfuscated data of some obfuscated (possibly compressed)
fonts can sometime itself be a raw DEFLATE compressed stream.
@benoit-pierre
Copy link
Contributor Author

I looked at what happens with calibre, and the deDRM plugin has some extra code for obfuscated fonts: it turns out that the unobfuscated data can sometime itself be a raw DEFLATE compressed stream!

After implementing, all fonts for those 3 ebooks now load fine.

Before/after screenshots:

  • file 1:
    11_before
    12_after
  • file 2:
    21_before
    22_after
  • file 3:
    31_before
    32_after

@benoit-pierre benoit-pierre marked this pull request as ready for review June 24, 2024 16:04
Copy link
Contributor

@poire-z poire-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done!
(Trusting you on the uncompress code, not familiar with zlib.)

Copy link
Member

@Frenzie Frenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No comment, I'm not that familiar with zlib either; I assume this is close to some sample code (and if not I'm fine with it ;-).

@poire-z poire-z merged commit 84901aa into koreader:master Jun 27, 2024
1 check passed
@poire-z
Copy link
Contributor

poire-z commented Jun 27, 2024

@benoit-pierre : any idea why I don't get the embedded font for the text in blue ?
image
Didn't you get that ? Any other setting you had to change to get it better?

I need to remove font-weight: bold; in this snippet to get the Zawgyi-One font to be used, which you seem to not have needed to:

.Emphasis4 {
        font-style: italic;
        color: #2f5496;
        font-family: Zawgyi-One,sans-serif;
        font-weight: bold;
        font-size: 1.091em;
}

@benoit-pierre benoit-pierre deleted the pr/fix_crash branch June 27, 2024 20:34
@benoit-pierre
Copy link
Contributor Author

I don't have to do anything to get it: clean build, no custom settings, removed .sdr entry: blue Zawgyi-One.

@poire-z
Copy link
Contributor

poire-z commented Jun 27, 2024

OK.
So, it happens when I have font-family font for sans-serif checked (so, I guess you don't have it checked, right?)
image
and it's ok when I unchecked it:
image
If I recheck it, it's still ok... until the reload, and then it's back to not ok.

Feels like some hell I've already met...

@benoit-pierre
Copy link
Contributor Author

That font selection could use another sub-menu… It's annoying to use, since the page for the selected font is focused, I have to go back manually to the first one to get access to the 2 other submenus. As a new user, you would not even now they are available.

@benoit-pierre
Copy link
Contributor Author

Why is my system installed DejaVu Sans not available in the font family serif sub-menu?

@benoit-pierre
Copy link
Contributor Author

By default, everything in font family is unchecked.

@benoit-pierre
Copy link
Contributor Author

[…] since the page for the selected font is focused, I have to go back manually to the first one […]

And you can't do it fast because the page height changes, so the back button position changes…

@benoit-pierre
Copy link
Contributor Author

[…] since the page for the selected font is focused, I have to go back manually to the first one […]

And you can't do it fast because the page height changes, so the back button position changes…

Page up to the rescue! \o/

@benoit-pierre
Copy link
Contributor Author

Or page down, it's a wrap! Even faster.

@benoit-pierre
Copy link
Contributor Author

I still can't get into your state.

@Frenzie
Copy link
Member

Frenzie commented Jun 27, 2024

(Presumably Home/End should be make to work there.)

@poire-z
Copy link
Contributor

poire-z commented Jun 27, 2024

By default, everything in font family is unchecked.

Ok, so you're saying you had it unchecked, right? And so that's the reason it worked for you.

Why is my system installed DejaVu Sans not available in the font family serif sub-menu?

No idea, it should be there in Sans> and in Serif> as we don't distinguish sans from serif.

That font selection could use another sub-menu…

Well, it's not super essential. It's just mimicking the main font menu & reusing the code, that's good enough.

I still can't get into your state.

Even if you let it reload - or quit, rm cache, and restart?

@benoit-pierre
Copy link
Contributor Author

  • pristine state (no sdr), load ebook: correct blue font
  • quit, remove koreader-emulator-x86_64-pc-linux-gnu-debug/koreader/cache/, reload: still correct
  • toggle Sans-serif: FreeSans in font family: still correct
  • quit, remove cache, reload: still correct

@benoit-pierre
Copy link
Contributor Author

(Presumably Home/End should be make to work there.)

Nope.

@poire-z
Copy link
Contributor

poire-z commented Jun 27, 2024

toggle Sans-serif: FreeSans in font family: still correct
quit, remove cache, reload: still correct

Correct also for me. (Also correct with FreeSerif).
But not correct if I choose another font, DejaVu Sans or Gentium.

May be because the Free fonts have support for that script, and it's them that are used - and our western eyes are not educated enough to tell :)

@poire-z
Copy link
Contributor

poire-z commented Jun 28, 2024

I need to remove font-weight: bold; in this snippet to get the Zawgyi-One font to be used, which you seem to not have needed to:

.Emphasis4 {
        font-style: italic;
        color: #2f5496;
        font-family: Zawgyi-One,sans-serif;
        font-weight: bold;
        font-size: 1.091em;
}

So, not confirmed by printfs, but I believe that when the font you have associated to sans-serif has a bold variant (which is the case of my DejaVuSans, but not of our Free fonts), and because that Zawgyi-One is provided only in regular, that our found sans-serif bold ttf gets a bigger score than the Zawgyi-One (which would need to be synthecized bold), so it is used instead... :/

@poire-z
Copy link
Contributor

poire-z commented Jun 29, 2024

Possible fix for the above issue:

--- a/crengine/src/lvfntman.cpp
+++ b/crengine/src/lvfntman.cpp
@@ -7137,5 +7137,5 @@ int LVFontDef::CalcMatch( const LVFontDef & def, bool useBias ) const
         + (features_match * 1000)
         + (family_match   * 100)
-        + (typeface_match * 1000);
+        + (typeface_match * 10000);

 //    printf("### %s (%d) vs %s (%d): size=%d weight=%d italic=%d family=%d typeface=%d bias=%d => %d\n",
@@ -7387,6 +7387,9 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
         else
             def.setTypeFace(lString8::empty_str);
+        int typeface_match = false;
         for (i=0; i<_instance_list.length(); i++) {
             int match = _instance_list[i]->_def.CalcMatch( def, useBias );
+            if ( match >= 2560000 )
+                typeface_match = true;
             match = match * 256 + ordering_weight;
             if (match > best_instance_match) {
@@ -7397,4 +7400,6 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
         for (i=0; i<_registered_list.length(); i++) {
             int match = _registered_list[i]->_def.CalcMatch( def, useBias );
+            if ( match >= 2560000 )
+                typeface_match = true;
             match = match * 256 + ordering_weight;
             if (match > best_match) {
@@ -7403,7 +7408,15 @@ LVFontCacheItem * LVFontCache::find( const LVFontDef * fntdef, bool useBias )
             }
         }
+        if ( typeface_match ) {
+            // No need to check next font names (which may get a better score
+            // if the first fonts do not have the requested italic or weight
+            // variants, so let's avoid that too).
+            break;
+        }
     }
     if (best_index<0)
         return NULL;
+    // if (best_instance_match >= best_match) printf("Find '%s' best instance: %s\n", fntdef->getTypeFace().c_str(), _instance_list[best_instance_index]->_def.getTypeFace().c_str());
+    // else printf("Find '%s' best registered: %s\n", fntdef->getTypeFace().c_str(), _registered_list[best_index]->_def.getTypeFace().c_str());
     if (best_instance_match >= best_match)
         return _instance_list[best_instance_index];

I'll keep that for later this summer, as it's a bit risky.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants