
Fix accidentally quadratic JPEG parsing #215

Merged 9 commits into Twinside:master on Dec 4, 2022

Conversation

@nh2 (Contributor) commented Dec 4, 2022

Makes JPEG parsing 1000x faster for large pictures (I tried with a 110 MB one).

  • I found this speedup accidentally while fixing various incorrect uses of the binary package's Get parser API, which caused the Get-based functions not to maintain the parser offset (how many bytes were consumed) correctly.
  • Also adds an incremental parsing API.

Please see the individual commit messages, especially the one of the commit titled Jpg: Fix quadratic JPEG parsing, for full details.

Other preparatory refactoring commits are also included.
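As a minimal illustration (not code from the PR): binary's Get tracks the parser offset via bytesRead, and consuming input properly is what keeps that offset, and hence the error positions reported by runGetOrFail, accurate.

```haskell
-- Minimal illustration (not from the PR): a Get parser's offset is
-- whatever `bytesRead` reports after consumption. Parsers that consume
-- input through the Get API keep this accurate; bypassing it (as the
-- old code did) produces misleading offsets.
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (runGet, bytesRead, skip)
import Data.Int (Int64)

-- Skip three bytes, then ask the parser where it is.
offsetAfterSkip3 :: Int64
offsetAfterSkip3 = runGet (skip 3 >> bytesRead) (L.pack [0 .. 9])

main :: IO ()
main = print offsetAfterSkip3  -- 3
```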

Copying the main commit's initial message here for easy reading:

  • Data.ByteString.Lazy's index is O(chunks), not O(1).
    The default chunk size is 32 KB.
    Thus, calling that L.index in a +1 loop, as extractScanContent did, caused accidentally quadratic runtime.
  • This could be easily observed by putting a trace of n into the loop, and observing how printouts get slower over time.
  • Further, the (ab)use of getRemainingLazyByteString from binary's Data.Binary.Get module resulted in weird semi-lazy behaviour that was both incorrect (not setting the parser offset correctly, generating misleading error offsets and messages) and slow (encouraging this loop over lazy ByteStrings instead of just using the Get parser as intended).
  • I thoroughly documented the quirks of the existing parseFrames function in its haddocks, and subsequently renamed it to parseFramesSemiLazy.
  • This commit fixes the above issues by replacing extractScanContent by a normal Get based parser.
  • Two variants of that fixed parser are added: parseECS and parseECS_simple. (ECS is the proper name for the "scan content" bytes according to the spec.)
    The simple one is the straightforward translation, the other one a higher-performance implementation that is only ~20% slower than a non-lazy ByteString based loop.
    Both variants are faster than the original implementation because they are linear, not accidentally quadratic.
  • parseFrames is replaced by a strict implementation that uses the new, correct parseECS.
    Compared to the previous parseFrames (and current parseFramesSemiLazy) it also fixes the remaining issues I found:
    • It did not correctly check the 0xFF value of the frame marker, instead just skipping 1 byte.
    • It ignored any unparseable garbage at the end of the JPEG, even including truncated (and thus not properly framed) data.
      I added both as TODOs for the retained legacy implementation. parseFrames was previously unexported from this .Internal module, so this rename has no backwards incompatibility implications.
  • Backwards compatibility is maintained because the instance Binary JpgImage continues to use parseFramesSemiLazy, for which I've taken care to preserve its existing quirky laziness semantics.
    I kept it this way for now because, due to a lack of comments, it is unclear to me whether this quirky lazy behaviour was intended or a complete accident.
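The ECS termination condition described above can be sketched as a plain linear scan over a strict ByteString (findEcsEnd is a hypothetical helper for illustration, not the PR's parseECS): stop at a 0xFF that is followed by a byte which is neither 0x00 (byte stuffing) nor an RST marker (0xD0–0xD7).

```haskell
-- Sketch, not the PR's parseECS: linear-time search for the end of an
-- entropy-coded segment (ECS). `findEcsEnd` is a hypothetical name.
import qualified Data.ByteString as B
import Data.Word (Word8)

findEcsEnd :: B.ByteString -> Maybe Int
findEcsEnd bs = go 0
  where
    len = B.length bs
    go n
      | n + 1 >= len = Nothing  -- ran out of input without finding a marker
      | v == 0xFF && vNext /= 0x00 && not (isRst vNext) = Just n
      | otherwise = go (n + 1)
      where
        v     = B.index bs n        -- O(1) on a strict ByteString
        vNext = B.index bs (n + 1)

    -- RST markers (0xFFD0-0xFFD7) may appear inside the ECS.
    isRst :: Word8 -> Bool
    isRst w = 0xD0 <= w && w <= 0xD7

main :: IO ()
main = print (findEcsEnd (B.pack [0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD9]))  -- Just 4
```

Because each B.index is O(1), the whole scan is linear, unlike the original L.index loop whose per-lookup cost grew with the number of chunks already traversed.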

Beyond that:

  • Further exports are added.
    This allows users to write their own equivalent of parseFrames, for example one that searches for the first scan header to determine the image dimensions without parsing the whole JPG.
  • These fixes make the library compatible with sane IO approaches, such as streaming parsing with binary-conduit or other uses of binary's incremental parser input interface.
  • An equivalence test for parseECS with parseECS_simple is added.

nh2 added 9 commits November 14, 2022 17:31
They are useful for applications that just want to search for
these signatures, or implement PNG logic slightly differently.
The cabal file requires a much larger `binary` version than
this backwards compat code is for.
* remove `go` boilerplate
  (`f ... = go where go = ...` without any recursive call to `go`)
* simplify `parse*` functions by returning `Maybe`,
  instead of taking a list to which the result is prepended
  (that conflated the concerns of parsing with using the parsed results)
* make it more explicit that for some parsing errors, JuicyPixels just skips
  the corresponding `JpgFrame`s
* improve some variable names
Cabal 3.2 complains:

    Warning: 'ghc-prof-options: -prof' is not necessary and will lead to problems
    when used on a library. Use the configure flag --enable-library-profiling
Switch to Generics for some existing NFData instances.
Also add `Eq` for result comparisons.
This makes reading large JPEGs (e.g. 110 MB) 1000x faster.

@Twinside (Owner) commented Dec 4, 2022

I'll trust you on that, I don't have the tooling to check it

@Twinside merged commit 69e7aef into Twinside:master on Dec 4, 2022
@nh2 (Contributor, Author) commented Dec 4, 2022

@Twinside Here's an easy way to measure, e.g. in GHCi:

import qualified Data.ByteString.Lazy as L
import Data.Binary
import Data.Binary.Get
import qualified Codec.Picture.Jpg.Internal.Types as JPG
:set -XTypeApplications
:set +s

L.readFile "large110MB.jpg" >>= \bs -> return $ case runGetOrFail (get @JPG.JpgImage) bs of { Left (_rest, offset, err) -> Left ("ERROR", offset, err) ; Right (_rest, offset, jpgImage) -> Right (offset, jpgImage `deepseq` ()) }

The previous implementation prints (by means of :set +s which enables timing):

(806.22 secs, 121,282,712 bytes)

Doing the same with the new implementation (parseECS) from this PR prints:

(0.40 secs, 122,191,224 bytes)

So for this case, it is 2000x faster.

For parseECS_simple, I get:

(0.88 secs, 4,729,102,080 bytes)

This is still quite fast, but 2.5x slower than parseECS and doing 20x more allocation.


For the claim

only ~20% slower than a non-lazy ByteString based loop

I simply made a copy of the existing extractScanContent and switched the types from .Lazy to normal ByteString, like this:

extractScanContentStrict :: L.ByteString -> (L.ByteString, L.ByteString)
extractScanContentStrict str_lazy = aux 0
  where !maxi = fromIntegral $ B.length str - 1
        !str = L.toStrict str_lazy

        aux !n | n >= maxi = (L.fromStrict str, L.empty)
               | v == 0xFF && vNext /= 0 && not isReset = (let (a, b) = B.splitAt n str in (L.fromStrict a, L.fromStrict b))
               | otherwise = aux (n + 1)
             where v = {- (if n `mod` 1000000 == 0 then trace (" n = " ++ show n) else id) -} str `B.index` n
                   vNext = str `B.index` (n + 1)
                   isReset = 0xD0 <= vNext && vNext <= 0xD7

For that I obtained:

(0.35 secs, 235,527,808 bytes)

@nh2 (Contributor, Author) commented Dec 4, 2022

In the commit I made a claim that getRemainingLazyByteString does not work well with binary's incremental input interface. That can be checked with these commands, using binary-conduit as an example:

import Conduit
import Data.Conduit.Serialization.Binary -- from `binary-conduit`

-- Only reads a small part of the file:
runConduitRes $ sourceFile "bigfile.bin" .| sinkGet ((\a rest -> a) <$> getWord8 <*> getWord8)

-- This reads the entire file via the conduit (which is not lazy IO):
runConduitRes $ sourceFile "bigfile.bin" .| sinkGet ((\a rest -> a) <$> getWord8 <*> getRemainingLazyByteString)
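The same behaviour can be observed without conduit, using binary's incremental Decoder directly (a sketch; isDone and isPartial are local helpers, not library functions): after one chunk, the getWord8-only parser is finished, while getRemainingLazyByteString still demands more input, which is why it pulls the entire file through a stream.

```haskell
-- Sketch: observing the incremental-input behaviour with binary's
-- Decoder API. `isDone`/`isPartial` are local helpers for this demo.
import qualified Data.ByteString as B
import Data.Binary.Get
  (Decoder (..), getRemainingLazyByteString, getWord8, pushChunk, runGetIncremental)

isDone :: Decoder a -> Bool
isDone (Done {}) = True
isDone _         = False

isPartial :: Decoder a -> Bool
isPartial (Partial {}) = True
isPartial _            = False

chunk :: B.ByteString
chunk = B.pack [1, 2, 3, 4]

main :: IO ()
main = do
  -- getWord8 needs one byte; one chunk is plenty, so the decoder is Done.
  print (isDone (runGetIncremental getWord8 `pushChunk` chunk))  -- True
  -- getRemainingLazyByteString cannot finish until end-of-input is
  -- signalled, so after one chunk it is still Partial (demanding more).
  print (isPartial (runGetIncremental getRemainingLazyByteString `pushChunk` chunk))  -- True
```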

@nh2 (Contributor, Author) commented Dec 4, 2022

@Twinside

There are other usages of L.index and Lb.index in JuicyPixels that might also be quadratic and that I didn't fix.

For example:

git grep '\.index' | grep -v '\bB\.index\b'
src/Codec/Picture/HDR.hs:  where at n = L.index str . fromIntegral $ idx + n
src/Codec/Picture/HDR.hs:            | otherwise = pure $ L.index inputData (fromIntegral idx)
src/Codec/Picture/Png.hs:      PixelRGBA8 r g b $ Lb.index transpBuffer (fromIntegral ix)
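If any of those turn out to be hot loops, the usual remedy (a sketch of the general pattern, not a change made in this PR) is to convert the lazy ByteString to a strict one once, then index in O(1):

```haskell
-- Sketch of the general remedy for repeated lazy indexing, not code
-- from JuicyPixels: `slowAt`/`fastLookups` are hypothetical names.
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import Data.Word (Word8)

-- Each call walks the chunk list from the start: O(chunks) per lookup,
-- so calling it in a tight loop is accidentally quadratic.
slowAt :: L.ByteString -> Int -> Word8
slowAt str n = L.index str (fromIntegral n)

-- Convert once (O(n) copy), then every lookup is O(1).
fastLookups :: L.ByteString -> [Int] -> [Word8]
fastLookups str ns = map (B.index strict) ns
  where strict = L.toStrict str

main :: IO ()
main = print (fastLookups (L.pack [10, 20, 30]) [0, 2])  -- [10,30]
```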

@nh2 (Contributor, Author) commented Dec 4, 2022

@Twinside It would be nice if you could tell me if the semi-lazy behaviour of JPEG parsing was accidental or intentional.

If it was accidental, we could consider it a bug, and perhaps switch the implementation of the JPEG parser away from the quirky to the strict one (which so far I haven't done).

@NorfairKing commented

This PR seems to introduce a bug.

This code:

  -- Load the image
  dynamicImage <- decodeImage contents
  pure $ imageToJpg 100 dynamicImage

Turns this image: [image: marvin]
Into this image: [image: signal-2022-12-06-110116_002]

@nh2 (Contributor, Author) commented Dec 6, 2022

I will investigate.

nh2 added a commit to nh2/Juicy.Pixels that referenced this pull request Dec 7, 2022
…n value.

Fixes regression reported in
Twinside#215 (comment)

Also fix follow-up error that `parseFrames`, `parseFramesSemiLazy` and `parseToFirstFrameHeader`
incorrectly tried to parse beyond the `JpgEndOfImage`.

Jpg: Fix `parseToFirstFrameHeader`
nh2 added a commit to nh2/Juicy.Pixels that referenced this pull request Dec 7, 2022
@nh2 nh2 mentioned this pull request Dec 7, 2022
@nh2 (Contributor, Author) commented Dec 7, 2022

This should fix it: PR #216

@Twinside (Owner) commented Dec 7, 2022

merged! I'll push an update on hackage "soon©"
