
Can engineers force memory reporting into one consistent scale unit (K or M)? #237

Closed
vanvoorden opened this issue Mar 28, 2024 · 10 comments


@vanvoorden
Contributor

Hi! I just started using the package and I like it a lot! I am running two parallel benchmarks with grouping metric to compare the memory usage between the two. I am seeing results like this:

Memory (resident peak)
╒════════════════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                                   │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞════════════════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ First (K)                                              │    8913 │   10273 │   10273 │   10273 │   10273 │   10273 │   10273 │     100 │
├────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Second (M)                                             │      10 │      10 │      10 │      10 │      10 │      10 │      10 │     100 │
╘════════════════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

It's a little weird because one metric is reporting as 10273K and the other metric is reporting as 10M… do I have any option to pass a flag (similar to timeUnits) that can force that reporting to one unit so I can compare them a little easier? Thanks!

@hassila
Contributor

hassila commented Mar 29, 2024

Hmm, I seem to remember we wanted to force the units to be the same and actually had an option for that, but I couldn't find it now.

It would make sense to have the option. In the meantime, you can try exploring some of the other output options here: https://swiftpackageindex.com/ordo-one/package-benchmark/1.22.4/documentation/benchmark/exportingbenchmarks

I seem to remember that you could compare two runs in jmh?
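
For example (a sketch, assuming the `--format` export option documented on that page includes the JMH JSON format), you could export both runs and compare them in external JMH tooling:

```bash
# Assumed invocation; check the exporting documentation for the exact format names
swift package benchmark --format jmh
```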

I'm away for a couple of weeks and don't have access to a machine at the moment, so apologies for the somewhat fuzzy answers.

@vanvoorden
Contributor Author

@hassila Thanks!

@dabrahams

dabrahams commented Feb 2, 2025

Please do this for all measurements by default in the textual format. Otherwise it's far too easy to misinterpret the results by 3 or more orders of magnitude! I have literally spent hours optimizing the wrong things because I overlooked a difference in units.

If you are not inclined to make it the default everywhere, please consider doing it when grouping by metric.

Thanks

@hassila
Contributor

hassila commented Feb 3, 2025

I would be happy to change the default when grouping per metric; that makes sense (as that is typically used for comparisons anyway).

@dabrahams

That's wonderful and I don't want to push my luck, but… wouldn't consistency across the formats be better? People are still liable to compare regardless of grouping. How much value is there in using different units in the other format?

@hassila
Contributor

hassila commented Feb 4, 2025

> That's wonderful and I don't want to push my luck, but… wouldn't consistency across the formats be better? People are still liable to compare regardless of grouping. How much value is there in using different units in the other format?

Just to clarify, are you talking about memory metrics only?

@dabrahams

dabrahams commented Feb 4, 2025

All metrics. My particular interest wasn't in memory use; in fact, I was comparing time.

@hassila
Contributor

hassila commented Feb 5, 2025

Ok, then it sounds like you measure variations of the same thing in your benchmarks within the same suite, which is a valid use case, and for that it would make sense.

But we also extensively measure different things in the same suite (related to a specific subsystem), e.g.:

  • Benchmark for measuring the cost of opening a database
  • Benchmark for traversing a specific dataset with a bulk cursor
  • Benchmark for writing a specific dataset
  • Benchmark for reading a single item with direct indexing
  • Benchmark for extracting database metadata and statistics

These have wildly different values for their metrics, are not comparable, and it makes no sense to compare between them. Forcing everything to the same unit would make the measurements unusable for comparison.

This is one reason why we have grouping per benchmark (where comparisons make no sense) and grouping per metric (for when the benchmarks should be compared).

It makes much sense to enforce the same units for the second use case (grouping per metric), as that is typically used for benchmarks that are variations of the same work that you want to compare.

It makes less sense for the per-benchmark grouping of metrics, as it would remove all useful information for the faster-running benchmarks, and all values would revert to the magnitude of the worst-performing benchmark.

I would suggest the following changes:

  • Normalize the display unit when grouping by metric, to make comparisons robust for the use case where you compare benchmarks that are variations of the same workload
  • Keep the output adapted to the magnitude of the values when grouping per benchmark (which would be for different workloads)
  • Add an option to override and choose a specific magnitude for displaying memory usage across all benchmarks (basically disabling the automatic choice)
  • Add an option to override and choose a specific magnitude for displaying time usage
  • And finally, an option for overriding the magnitude for counts as well (ARC, custom metrics, ...)

Do you agree and see the rationale?

@dabrahams

dabrahams commented Feb 5, 2025

SGTM, and thank you for your patient explanation!

hassila added a commit that referenced this issue Feb 11, 2025
…ric (#309)

## Description

Swift 6 benchmark targets should now be possible with:

```swift
let benchmarks: @Sendable () -> Void = {
...
}
```
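
For context, a minimal sketch of what a complete benchmark target could look like with this closure form (the benchmark name and workload below are illustrative, not taken from this PR):

```swift
import Benchmark

let benchmarks: @Sendable () -> Void = {
    Benchmark("Allocate array") { benchmark in
        // Run the measured work once per scaled iteration; blackHole keeps
        // the optimizer from eliminating the otherwise unused result.
        for _ in benchmark.scaledIterations {
            blackHole(Array(repeating: 0, count: 1_000))
        }
    }
}
```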

Make it possible to specify output units for the text output of a given metric by setting them in the configuration.

E.g.
```swift
    Benchmark.defaultConfiguration.units = [.peakMemoryResident: .mega, .peakMemoryVirtual: .giga]
```

Also add the ability to override the time units from the command line
using `--time-units`.

E.g.
```bash
swift package benchmark --time-units microseconds
```

This update also displays overflowing numeric values in scientific
notation in the text output:
```
╒═══════════════════════════════════════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Test                                                                      │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞═══════════════════════════════════════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Samples:All metrics, full concurrency, async (ns) *                       │  8.08e+07 │  8.34e+07 │  8.38e+07 │  8.40e+07 │  8.41e+07 │  8.46e+07 │  8.49e+07 │       183 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Counter, custom metric thresholds (ns) *                          │      2375 │      2417 │      2417 │      2459 │      2501 │      2709 │      4334 │      4753 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Extended + custom metrics (ns) *                                  │      1875 │      2000 │      2000 │      2041 │      2167 │      2667 │      8750 │       707 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Extended metrics (ns) *                                           │      1750 │      1875 │      1875 │      1917 │      1958 │      2209 │      3250 │       705 │
╘═══════════════════════════════════════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
```

Also updates documentation.

Addresses:
#306 
and
#237
#293
#277
#258

---------

Co-authored-by: Axel Andersson <[email protected]>
Co-authored-by: dimlio <[email protected]>
@hassila
Contributor

hassila commented Feb 11, 2025

Slightly different solution, but you can now override all units per metric individually as part of:
https://github.com/ordo-one/package-benchmark/releases/tag/1.28.0

It is also possible to force time units specifically from the command line.

Please give it a try and open new issues if needed.
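
As a concrete sketch for the original question (the metric and unit names follow the configuration example in the PR description above; the benchmark bodies are placeholders), you could force resident peak memory to report in mega for both benchmarks:

```swift
import Benchmark

let benchmarks: @Sendable () -> Void = {
    // Report resident peak memory in mega for every benchmark in this target,
    // so the grouped-by-metric table uses one consistent unit.
    Benchmark.defaultConfiguration.metrics = [.peakMemoryResident]
    Benchmark.defaultConfiguration.units = [.peakMemoryResident: .mega]

    Benchmark("First") { benchmark in
        // ... first workload
    }

    Benchmark("Second") { benchmark in
        // ... second workload
    }
}
```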

@hassila hassila closed this as completed Feb 11, 2025