Use our own Discard for testing #35

Merged on Oct 20, 2024

@nirs nirs commented Oct 19, 2024

It turns out that io.Discard implements ReadFrom using a small buffer (8192 bytes), which skews our benchmarks. We use copyBuffer with a 1 MiB buffer, but io.Discard uses its own 8 KiB buffer and issues a huge number of tiny reads. These tiny reads are extremely slow for compressed clusters, since we have to read and decompress the same cluster multiple times.

With this change, qcow2 zlib performance is about 4 times better. It is still slow, but it better matches real performance.

Before:

% go test -bench Read
BenchmarkRead0p/qcow2-12          14      78238414 ns/op     3430.99 MB/s      1051160 B/op        39 allocs/op
BenchmarkRead0p/qcow2_zlib-12     14      78577923 ns/op     3416.17 MB/s      1051733 B/op        39 allocs/op
BenchmarkRead50p/qcow2-12         21      54889353 ns/op     4890.48 MB/s      1183231 B/op        45 allocs/op
BenchmarkRead50p/qcow2_zlib-12     1    3466799292 ns/op       77.43 MB/s    736076536 B/op    178764 allocs/op
BenchmarkRead100p/qcow2-12        38      30562127 ns/op     8783.27 MB/s      1182901 B/op        45 allocs/op
BenchmarkRead100p/qcow2_zlib-12    1    6834526167 ns/op       39.28 MB/s   1471530256 B/op    357570 allocs/op

After:

% go test -bench Read
BenchmarkRead0p/qcow2-12          14      77515735 ns/op     3462.98 MB/s      1050518 B/op        39 allocs/op
BenchmarkRead0p/qcow2_zlib-12     14      77823402 ns/op     3449.29 MB/s      1050504 B/op        39 allocs/op
BenchmarkRead50p/qcow2-12         24      48812158 ns/op     5499.36 MB/s      1181856 B/op        45 allocs/op
BenchmarkRead50p/qcow2_zlib-12     2     899659187 ns/op      298.37 MB/s    184996316 B/op     43247 allocs/op
BenchmarkRead100p/qcow2-12        61      19306020 ns/op    13904.24 MB/s      1181854 B/op        45 allocs/op
BenchmarkRead100p/qcow2_zlib-12    1    1732168542 ns/op      154.97 MB/s    368850952 B/op     86460 allocs/op

Signed-off-by: Nir Soffer <[email protected]>

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit b119fa3 into lima-vm:master Oct 20, 2024
2 checks passed
@nirs nirs deleted the fix-discard branch November 18, 2024 00:16