Can engineers force memory reporting into one consistent scale unit (K or M)? #237
Comments
Hmm, I seem to remember we wanted to force the units to be the same and actually had an option for that, but I can't see it now. It would make sense to have the option. In the meantime you can try exploring some of the other output options here: https://swiftpackageindex.com/ordo-one/package-benchmark/1.22.4/documentation/benchmark/exportingbenchmarks. I seem to remember that you could compare two runs in JMH? I'm away for a couple of weeks now and don't have access to a machine at the moment, so apologies for the slightly fuzzy answers. |
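If JMH-style comparison of two runs is what you're after, the export formats may help. A sketch of the idea (the `jmh` format name is an assumption based on the exporting documentation linked above, so check the docs for the exact spelling):

```bash
# Export results in a JMH-compatible JSON format (format name assumed),
# which can then be loaded into the JMH visualizer to compare two runs.
swift package benchmark --format jmh
```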
@hassila Thanks! |
Please do this for all the measurements by default in the textual format. Otherwise it's far too easy to misinterpret the results by 3 or more orders of magnitude! I have literally spent hours optimizing the wrong things because I overlooked a difference in units. If you are not inclined to make it the default everywhere, please consider doing it when grouping by metric. Thanks |
I would be happy to change the default when grouping per metric, that makes sense (as that is typically for comparisons anyway). |
That's wonderful and I don't want to push my luck, but… wouldn't consistency across the formats be better? People are still liable to compare regardless of grouping. How much value is there in using different units in the other format? |
Just to clarify, are you talking about memory metrics only? |
All metrics. My particular interest wasn't in the memory use, in fact I was comparing time. |
Ok, then it sounds like you are just measuring variations of the same thing in your benchmarks in the same suite, which is a valid use case, and then it would make sense. But we also extensively have different things being measured in the same suite (related to a specific subsystem), e.g.:
These have wildly different values for their metrics, are not comparable, and it makes no sense to compare between them. Forcing everything to the same unit would make the measurements unusable for comparison. This is one reason why we have grouping per benchmark (where comparisons make no sense) and grouping per metric (when the benchmarks should be compared). It makes much sense to enforce the same units for the second use case (grouping per metric), as that is typically used for benchmarks which are variations of the same work that you want to compare. It makes less sense for the per-benchmark grouping of metrics, as it would remove all useful information for the faster-running benchmarks, and all values would revert to the magnitude of the worst-performing benchmark. I would suggest the following changes:
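As a sketch of the two modes being discussed (the `--grouping` option name is an assumption based on the "grouping metric" usage mentioned later in this thread):

```bash
# Group output per benchmark: each benchmark's metrics shown together,
# with units chosen independently for readability.
swift package benchmark --grouping benchmark

# Group output per metric: all benchmarks shown side by side for one metric,
# where a shared unit makes direct comparison meaningful.
swift package benchmark --grouping metric
```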
Do you agree and see the rationale? |
SGTM, and thank you for your patient explanation! |
…ric (#309)

## Description

Swift 6 benchmark targets should now be possible with:

```swift
let benchmarks: @Sendable () -> Void = {
    ...
}
```

Make it possible to specify output units for the text output for a given metric by specifying it in the configuration, e.g.

```swift
Benchmark.defaultConfiguration.units = [.peakMemoryResident: .mega,
                                        .peakMemoryVirtual: .giga]
```

Also add the ability to override the time units from the command line using `--time-units`, e.g.

```bash
swift package benchmark --time-units microseconds
```

This update also displays overflowing numeric values in scientific notation in the text output:

```
╒═══════════════════════════════════════════════════════════════════════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╤═══════════╕
│ Test                                                                      │        p0 │       p25 │       p50 │       p75 │       p90 │       p99 │      p100 │   Samples │
╞═══════════════════════════════════════════════════════════════════════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ Samples:All metrics, full concurrency, async (ns) *                       │  8.08e+07 │  8.34e+07 │  8.38e+07 │  8.40e+07 │  8.41e+07 │  8.46e+07 │  8.49e+07 │       183 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Counter, custom metric thresholds (ns) *                          │      2375 │      2417 │      2417 │      2459 │      2501 │      2709 │      4334 │      4753 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Extended + custom metrics (ns) *                                  │      1875 │      2000 │      2000 │      2041 │      2167 │      2667 │      8750 │       707 │
├───────────────────────────────────────────────────────────────────────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┼───────────┤
│ Samples:Extended metrics (ns) *                                           │      1750 │      1875 │      1875 │      1917 │      1958 │      2209 │      3250 │       705 │
╘═══════════════════════════════════════════════════════════════════════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╧═══════════╛
```

Also updates documentation.

Addresses: #306, #237, #293, #277, #258

---------

Co-authored-by: Axel Andersson <[email protected]>
Co-authored-by: dimlio <[email protected]>
Slightly different solution, but you can now override all units per metric individually as part of: it is also possible to specifically force time units from the command line. Please give it a try and open new issues if needed. |
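For reference, here is a minimal sketch of the configuration-based override described in the PR above. The dictionary keys and unit names are taken from the PR description; treat the exact API spelling and the benchmark body as assumptions:

```swift
import Benchmark

let benchmarks: @Sendable () -> Void = {
    // Force resident memory to be reported in megabytes and virtual
    // memory in gigabytes for all benchmarks in this suite
    // (hypothetical choice of units; adjust per metric as needed).
    Benchmark.defaultConfiguration.units = [
        .peakMemoryResident: .mega,
        .peakMemoryVirtual: .giga
    ]

    Benchmark("My benchmark") { benchmark in
        for _ in benchmark.scaledIterations {
            // work under test
        }
    }
}
```

This can be combined with `swift package benchmark --time-units microseconds` on the command line when you also want uniform time units.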
Hi! I just started using the package and I like it a lot! I am running two parallel benchmarks with `grouping metric` to compare the memory usage between the two. I am seeing results like this:

It's a little weird because one metric is reporting as `10273K` and the other metric is reporting as `10M`… do I have any option to pass a flag (similar to `timeUnits`) that can force that reporting to one unit so I can compare them a little easier? Thanks!