Request: exclude tinybench overhead from the benchmark results #189

Closed
yifanwww opened this issue Nov 17, 2024 · 4 comments

Comments

@yifanwww

yifanwww commented Nov 17, 2024

Hi, thank you for creating such an amazing benchmarking tool! However, the benchmarking results are not exactly what I want.

I have read these issues

I think I'm requesting a different feature, so I'm writing this issue.


For example, say we write this code to benchmark:

function noop() {}

function fibonacci(n) {
    if (n === 1 || n === 2) return 1;
    let a = 1;
    let b = 1;
    let c = 2;
    for (let i = 4; i <= n; i++) {
        a = b;
        b = c;
        c = a + b;
    }
    return c;
}

import { Bench } from 'tinybench';

const bench = new Bench({ time: 500 });
bench
    .add('noop', () => noop())
    .add('fibonacci 4', () => fibonacci(4))
    .add('fibonacci 20', () => fibonacci(20));
await bench.run();
console.table(bench.table());

and the result is

┌─────────┬────────────────┬──────────────────────┬─────────────────────┬────────────────────────────┬───────────────────────────┬──────────┐
│ (index) │ Task name      │ Latency average (ns) │ Latency median (ns) │ Throughput average (ops/s) │ Throughput median (ops/s) │ Samples  │
├─────────┼────────────────┼──────────────────────┼─────────────────────┼────────────────────────────┼───────────────────────────┼──────────┤
│ 0       │ 'noop'         │ '44.29 ± 0.22%'      │ '0.00'              │ '17078062 ± 0.02%'         │ '22579347'                │ 11289676 │
│ 1       │ 'fibonacci 4'  │ '44.47 ± 0.34%'      │ '0.00'              │ '17019036 ± 0.02%'         │ '22488750'                │ 11244377 │
│ 2       │ 'fibonacci 20' │ '48.05 ± 0.35%'      │ '0.00'              │ '15692678 ± 0.02%'         │ '20812354'                │ 10406179 │
└─────────┴────────────────┴──────────────────────┴─────────────────────┴────────────────────────────┴───────────────────────────┴──────────┘

Refer to this example to reproduce the result.

Hmm, I don't think this simple fibonacci algorithm would take that long to run, and even the noop function takes 44 ns. A noop function should take zero time, or less than 1 ns for a direct function call if it's not inlined.

Let's assume the tinybench overhead is 44.29 ns. Excluding those 44.29 ns, the benchmark results would be:

  • noop: 0 ns
  • fibonacci 4: 0.18 ns
  • fibonacci 20: 3.76 ns

I cannot say these results are correct, because I don't know whether we can simply treat the noop benchmark result as the tinybench overhead. But at least it shows how we can get closer to the correct result.
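The subtraction above can be sketched as follows (a hypothetical calculation, assuming the noop latency approximates tinybench's fixed per-call overhead; the values are the averages from the table above):

```javascript
// Average latencies (ns) taken from the tinybench table above.
const measured = {
  noop: 44.29,
  'fibonacci 4': 44.47,
  'fibonacci 20': 48.05,
};

// Treat the noop result as the harness overhead and subtract it.
// This is only an approximation: the noop latency mixes timer and
// call overhead, so corrected values near zero are unreliable.
const overhead = measured.noop;
const corrected = Object.fromEntries(
  Object.entries(measured).map(([name, ns]) => [name, Math.max(0, ns - overhead)])
);

console.log(corrected);
// noop → 0, fibonacci 4 → ~0.18, fibonacci 20 → ~3.76
```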


I tried other benchmarking tools, and here are their benchmarking results:

  • C# BenchmarkDotNet (refer to this example to reproduce the result)
    • noop: 0.0002 ns
    • fibonacci 4: 0.1536 ns
    • fibonacci 20: 8.0300 ns
  • go built-in benchmark (refer to this example to reproduce the result)
    • noop: 0.2455 ns
    • fibonacci 4: 0.9764 ns
    • fibonacci 20: 8.760 ns
  • Rust criterion (refer to this example to reproduce the result)
    • noop: zero time
    • fibonacci 4: 1.7134 ns
    • fibonacci 20: 2.4494 ns

Those results are significantly different from the tinybench results.

If we look into the BenchmarkDotNet logs, we will see "OverheadActual", "WorkloadActual", for example:

L286: OverheadActual  15: 53370432 op, 72714400.00 ns, 1.3624 ns/op
L311: WorkloadActual  15: 53370432 op, 500232200.00 ns, 9.3728 ns/op

If we subtract them we can get WorkloadActual - OverheadActual = 8.0104 ns/op, it's pretty close to the average result 8.0300 ns/op.
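The per-op figures in those two log lines can be recomputed directly from the raw counts quoted above, just to check the arithmetic:

```javascript
// Raw counts from the two BenchmarkDotNet log lines quoted above.
const ops = 53370432;
const overheadTotalNs = 72714400.0;  // OverheadActual total time
const workloadTotalNs = 500232200.0; // WorkloadActual total time

const overheadPerOp = overheadTotalNs / ops; // ≈ 1.3624 ns/op
const workloadPerOp = workloadTotalNs / ops; // ≈ 9.3728 ns/op
const result = workloadPerOp - overheadPerOp; // ≈ 8.0104 ns/op

console.log(result.toFixed(4));
```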

What BenchmarkDotNet actually does is slightly different from that. You can read How it works. It says BenchmarkDotNet gets the result by calculating Result = ActualWorkload - <MedianOverhead>.

@jerome-benoit
Collaborator

jerome-benoit commented Nov 17, 2024

Comparing the timing of an algo implemented in different languages with the same measurement tool just tells you which language runs it faster.
Comparing the timing of an algo implemented in different languages with different measurement tools tells you ... absolutely nothing. The measurement methodology is completely wrong.

And measuring the overhead of timestamping a block's execution time in an interpreted language has nothing to do with measuring the execution of noop: in your example the latency median of the noop is zero, with a zero median absolute deviation. Furthermore, a benchmarking tool for an interpreted language that does not include the interpreter overhead in its measurement is just meaningless.

The measurement methodology used in BenchmarkDotNet is utterly wrong: #143 (comment)

@yifanwww
Author

Before we go any further, I have a question:

What's the difference between "bench 1" and "bench 2"? Is "bench 2" correct? Or is there a way to benchmark super-fast code with tinybench?

function fn() {
  // a small fn that only runs for a few nanoseconds
}

function bigFn() {
  // a big fn that runs for a few milliseconds
  for (let i = 0; i < 100_000_000; i++) {
    fn()
  }
}

const bench = new Bench({ time: 500 });
bench
    .add('bench 1', () => fn())
    .add('bench 2', () => bigFn());
await bench.run();
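For reference, looping inside the task like "bench 2" does amortizes the fixed per-invocation timing cost across all the inner calls. This is not a tinybench feature; here is a plain-Node sketch of the idea (all names hypothetical, and the JIT may inline or eliminate the call, so the number is only a rough upper bound):

```javascript
// Hypothetical sketch: amortize the fixed timestamping overhead by
// timing many calls in one loop and dividing by the call count.
function fn() {
  // a small fn that only runs for a few nanoseconds
  return 1 + 1;
}

const ITERATIONS = 10_000_000;

const start = process.hrtime.bigint();
for (let i = 0; i < ITERATIONS; i++) fn();
const elapsedNs = Number(process.hrtime.bigint() - start);

// The two timestamps cost the same regardless of ITERATIONS, so their
// contribution per call shrinks as ITERATIONS grows. The loop counter
// overhead is still included in this estimate.
const perCallNs = elapsedNs / ITERATIONS;
console.log(`~${perCallNs.toFixed(2)} ns per call (loop overhead included)`);
```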

@jerome-benoit
Collaborator

It's two different experiments that have nothing in common.
I'm not going to explain again here what I have already explained in the link given. Please read it.

@yifanwww
Author

OK I understand what you mean now.

@yifanwww yifanwww closed this as not planned Nov 18, 2024