Small additional speedup for deflate_quick
Before and after benchmarks on Intel x86_64, compiled with
`RUSTFLAGS="-Ctarget-cpu=native -Cllvm-args=-enable-dfa-jump-thread" cargo build --release`:

```
Benchmark 1 (60 runs): ./compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          84.1ms ± 1.93ms    81.4ms … 93.9ms          6 (10%)        0%
  peak_rss           26.7MB ± 73.6KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          303M  ± 1.12M      302M  …  309M           3 ( 5%)        0%
  instructions        655M  ±  265       655M  …  655M           1 ( 2%)        0%
  cache_references    404K  ± 12.6K      396K  …  468K           7 (12%)        0%
  cache_misses        302K  ± 6.45K      284K  …  321K           6 (10%)        0%
  branch_misses      3.15M  ± 6.83K     3.14M  … 3.17M           0 ( 0%)        0%
Benchmark 2 (62 runs): ./target/release/examples/compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.9ms ±  930us    80.2ms … 84.3ms          0 ( 0%)        ⚡-  2.6% ±  0.6%
  peak_rss           26.7MB ± 65.8KB    26.6MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          298M  ±  656K      297M  …  300M           0 ( 0%)        ⚡-  1.7% ±  0.1%
  instructions        645M  ±  255       645M  …  645M           0 ( 0%)        ⚡-  1.5% ±  0.0%
  cache_references    400K  ± 3.70K      397K  …  417K           4 ( 6%)          -  0.9% ±  0.8%
  cache_misses        300K  ± 6.50K      282K  …  309K           4 ( 6%)          -  0.8% ±  0.8%
  branch_misses      3.06M  ± 8.81K     3.05M  … 3.08M           0 ( 0%)        ⚡-  2.9% ±  0.1%
```

Running the benchmark at higher compression levels showed no change in
performance.
brianpane authored and folkertdev committed Dec 12, 2024
1 parent afcf420 commit bc43129
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion zlib-rs/src/deflate/algorithm/quick.rs
@@ -102,7 +102,7 @@ pub fn deflate_quick(stream: &mut DeflateStream, flush: DeflateFlush) -> BlockSt

 macro_rules! first_two_bytes {
     ($slice:expr, $offset:expr) => {
-        $slice[$offset] as u16 | ($slice[$offset + 1] as u16) << 8
+        u16::from_le_bytes($slice[$offset..$offset+2].try_into().unwrap())
     }
 }
 if first_two_bytes!(str_start, 0) == first_two_bytes!(match_start, 0) {
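
For readers comparing the two forms of `first_two_bytes!`, here is a minimal sketch that writes each one as a standalone function so they can be checked against each other. The function names `first_two_bytes_shift` and `first_two_bytes_le` are hypothetical and exist only for this illustration; the commit itself keeps the macro. The sketch shows only that both forms produce the same little-endian `u16`. It does not attempt to reproduce the benchmark, and the commit message does not say where the saved instructions come from.

```rust
// Needed on editions before 2021; redundant but harmless on 2021 and later.
use std::convert::TryInto;

/// Old form (hypothetical function wrapper): widen each byte to u16,
/// then combine them with a shift and a bitwise OR.
/// Indexes the slice twice, so it panics if `offset + 1` is out of bounds.
fn first_two_bytes_shift(slice: &[u8], offset: usize) -> u16 {
    slice[offset] as u16 | (slice[offset + 1] as u16) << 8
}

/// New form (hypothetical function wrapper): take a two-byte subslice,
/// convert it to `[u8; 2]`, and decode it as a little-endian u16.
/// The range index panics if `offset + 2` exceeds the slice length;
/// the `unwrap` cannot fail because the subslice always has length 2.
fn first_two_bytes_le(slice: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes(slice[offset..offset + 2].try_into().unwrap())
}
```

As a usage example, for the bytes `[0x34, 0x12]` at offset 0 both functions return `0x1234`: the first byte is the low half and the second byte is the high half.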