Small additional speedup for deflate_quick #262

brian-pane · 2024-12-12T18:43:53Z

Before and after benchmarks on Intel x86_64, compiled with `RUSTFLAGS="-Ctarget-cpu=native -Cllvm-args=-enable-dfa-jump-thread" cargo build --release`:

Benchmark 1 (60 runs): ./compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          84.1ms ± 1.93ms    81.4ms … 93.9ms          6 (10%)        0%
  peak_rss           26.7MB ± 73.6KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          303M  ± 1.12M      302M  …  309M           3 ( 5%)        0%
  instructions        655M  ±  265       655M  …  655M           1 ( 2%)        0%
  cache_references    404K  ± 12.6K      396K  …  468K           7 (12%)        0%
  cache_misses        302K  ± 6.45K      284K  …  321K           6 (10%)        0%
  branch_misses      3.15M  ± 6.83K     3.14M  … 3.17M           0 ( 0%)        0%
Benchmark 2 (62 runs): ./target/release/examples/compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.9ms ±  930us    80.2ms … 84.3ms          0 ( 0%)        ⚡-  2.6% ±  0.6%
  peak_rss           26.7MB ± 65.8KB    26.6MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          298M  ±  656K      297M  …  300M           0 ( 0%)        ⚡-  1.7% ±  0.1%
  instructions        645M  ±  255       645M  …  645M           0 ( 0%)        ⚡-  1.5% ±  0.0%
  cache_references    400K  ± 3.70K      397K  …  417K           4 ( 6%)          -  0.9% ±  0.8%
  cache_misses        300K  ± 6.50K      282K  …  309K           4 ( 6%)          -  0.8% ±  0.8%
  branch_misses      3.06M  ± 8.81K     3.05M  … 3.08M           0 ( 0%)        ⚡-  2.9% ±  0.1%

No change in performance appeared when running the benchmark at higher compression levels.
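
For readers who want to sanity-check the delta column, here is a minimal sketch that recomputes the relative change from the rounded means printed in the tables. The helper name `relative_delta_percent` is hypothetical and not part of this PR. The benchmark tool presumably derives its deltas from the unrounded samples, so values recomputed from rounded means can differ from the table in the last digit.

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
/// Negative values mean the candidate used less of the measured resource.
/// (Hypothetical helper for illustration; not code from this PR.)
fn relative_delta_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // (measurement, baseline mean, candidate mean), copied from the
    // rounded means in the two benchmark tables.
    let rows = [
        ("wall_time (ms)", 84.1, 81.9),
        ("cpu_cycles (M)", 303.0, 298.0),
        ("instructions (M)", 655.0, 645.0),
        ("branch_misses (M)", 3.15, 3.06),
    ];

    for (name, baseline, candidate) in rows {
        // Print each delta with an explicit sign and one decimal place.
        println!(
            "{:<18} {:+.1}%",
            name,
            relative_delta_percent(baseline, candidate)
        );
    }
}
```

Only the four measurements flagged as significant in the second table are listed; the cache counters move by less than their reported uncertainty.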

folkertdev

nice, thanks!

brian-pane mentioned this pull request Dec 12, 2024

Refactoring to allow inlining of quick_insert_string #261

Closed

folkertdev approved these changes Dec 12, 2024

folkertdev merged commit bc43129 into trifectatechfoundation:main Dec 12, 2024
20 checks passed