Skip to content

Exploring novel algorithms to decode the PNG `sub` filter 32 bytes at a time using AVX2 and vectorized prefix-sum

License

Notifications You must be signed in to change notification settings

connorskees/simd-png

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This project explores an implementation of the PNG filters using AVX2. Existing implementations of these filters in PNG decoders like libpng are only capable of decoding up to 4 bytes at a time for 8-bit RGBA pixels.

The up filter is simple enough that it is trivially auto-vectorized by modern compilers; however, the sub, average, and paeth filters are sufficiently complex as to make it very unlikely for auto-vectorization to occur.

By using a modified implementation of vectorized prefix-sum, we are able to decode the sub filter 16 bytes at a time for 8-bit RGBA pixels -- using the full width of the MMX registers, rather than being bound by the bytes-per-pixel.

Based on benchmarks done on a dedicated Linux server, this implementation of the sub filter is around 20-25% faster than existing SSE4.1 implementations.

Below are the timings for a buffer of size 2**20 (1,048,576 bytes):

RUSTFLAGS='-C target-cpu=native' cargo bench

running 3 tests
test tests::bench_baseline_memcpy               ... bench:      62,683 ns/iter (+/- 18,025)
test tests::bench_sub_sse2                      ... bench:      86,519 ns/iter (+/- 2,527)
test tests::bench_sub_sse_prefix_sum_no_extract ... bench:      69,864 ns/iter (+/- 1,696)

A similar performance win of 20-25% can be seen with buffers as small as 2**9 (512) and for values that are not powers of 2.

These benchmarks live within this repository and can be reproduced by running the above command.

The algorithm itself and the intuition behind it are described in the associated blog post.

About

Exploring novel algorithms to decode the PNG `sub` filter 32 bytes at a time using AVX2 and vectorized prefix-sum

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published