
Small additional speedup for deflate_quick #262

Merged

brian-pane

Before and after benchmarks on Intel x86_64, compiled with
`RUSTFLAGS="-Ctarget-cpu=native -Cllvm-args=-enable-dfa-jump-thread" cargo build --release`:

```
Benchmark 1 (60 runs): ./compress-baseline 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          84.1ms ± 1.93ms    81.4ms … 93.9ms          6 (10%)        0%
  peak_rss           26.7MB ± 73.6KB    26.5MB … 26.7MB          0 ( 0%)        0%
  cpu_cycles          303M  ± 1.12M      302M  …  309M           3 ( 5%)        0%
  instructions        655M  ±  265       655M  …  655M           1 ( 2%)        0%
  cache_references    404K  ± 12.6K      396K  …  468K           7 (12%)        0%
  cache_misses        302K  ± 6.45K      284K  …  321K           6 (10%)        0%
  branch_misses      3.15M  ± 6.83K     3.14M  … 3.17M           0 ( 0%)        0%
Benchmark 2 (62 runs): ./target/release/examples/compress 1 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          81.9ms ±  930us    80.2ms … 84.3ms          0 ( 0%)        ⚡-  2.6% ±  0.6%
  peak_rss           26.7MB ± 65.8KB    26.6MB … 26.7MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          298M  ±  656K      297M  …  300M           0 ( 0%)        ⚡-  1.7% ±  0.1%
  instructions        645M  ±  255       645M  …  645M           0 ( 0%)        ⚡-  1.5% ±  0.0%
  cache_references    400K  ± 3.70K      397K  …  417K           4 ( 6%)          -  0.9% ±  0.8%
  cache_misses        300K  ± 6.50K      282K  …  309K           4 ( 6%)          -  0.8% ±  0.8%
  branch_misses      3.06M  ± 8.81K     3.05M  … 3.08M           0 ( 0%)        ⚡-  2.9% ±  0.1%
```
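The delta column is the relative change of each "after" mean against the baseline mean. As a quick sanity check, the short Rust sketch below recomputes the point estimates from the rounded means shown in the table. It is only an approximation: the benchmark tool works from the raw per-run samples, and this sketch does not try to reproduce the ± uncertainty interval. The `percent_delta` helper is hypothetical and illustrative, not part of zlib-rs.

```rust
/// Hypothetical helper: relative change of the "after" mean versus the
/// "before" mean, in percent. Negative values mean the patched build used
/// less of the measured resource.
fn percent_delta(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (metric, baseline mean, patched mean), copied from the rounded means
    // in the benchmark table, so results are approximate.
    let rows = [
        ("wall_time (ms)", 84.1, 81.9),
        ("cpu_cycles (M)", 303.0, 298.0),
        ("instructions (M)", 655.0, 645.0),
        ("branch_misses (M)", 3.15, 3.06),
    ];
    for (name, before, after) in rows {
        // Print the signed percentage with one decimal, like the delta column.
        println!("{name:<18} {:+.1}%", percent_delta(before, after));
    }
}
```

Running it prints -2.6% for wall_time, -1.7% for cpu_cycles, -1.5% for instructions and -2.9% for branch_misses, matching the rows marked as improvements.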

No performance change was observed when running the benchmark at
higher compression levels.

folkertdev (Collaborator) left a comment:

nice, thanks!

folkertdev merged commit bc43129 into trifectatechfoundation:main on Dec 12, 2024 (20 checks passed).