Optimize zero reads #34

Merged
merged 1 commit into from
Oct 20, 2024

@nirs nirs commented Oct 17, 2024

Use the optimized range loop[1], which the compiler turns into a single
memclr call. This speeds up reads of images with 0% utilization by about
28x; see the table below.
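
The range loop idiom referenced in [1] can be sketched as below. This is a minimal illustration, not the code changed in this PR: `zeroRange` is a hypothetical name, and whether the loop is actually lowered to a single memclr call depends on the Go compiler version and on the loop keeping exactly this shape.

```go
package main

import "fmt"

// zeroRange clears buf using the range-clear idiom. The loop body must
// only assign the zero value to buf[i] for the compiler to recognize it.
// (Hypothetical helper name, used for illustration only.)
func zeroRange(buf []byte) {
	for i := range buf {
		buf[i] = 0
	}
}

func main() {
	buf := []byte{1, 2, 3, 4}
	zeroRange(buf)
	fmt.Println(buf) // prints [0 0 0 0]
}
```

On Go 1.21 or later, the built-in `clear(buf)` expresses the same intent for slices.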

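| format | compression | utilization | speedup |
|--------|-------------|-------------|---------|
| qcow2  | -           |          0% |   28.72 |
| qcow2  | zlib        |          0% |   28.04 |
| qcow2  | -           |         50% |    4.54 |
| qcow2  | zlib        |         50% |    1.03 |
| qcow2  | -           |        100% |    1.01 |
| qcow2  | zlib        |        100% |    1.00 |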

Before:

    % go test -bench Read
    BenchmarkRead0p/qcow2-12           14      77515735 ns/op     3462.98 MB/s      1050518 B/op        39 allocs/op
    BenchmarkRead0p/qcow2_zlib-12      14      77823402 ns/op     3449.29 MB/s      1050504 B/op        39 allocs/op
    BenchmarkRead50p/qcow2-12          24      48812158 ns/op     5499.36 MB/s      1181856 B/op        45 allocs/op
    BenchmarkRead50p/qcow2_zlib-12      2     899659187 ns/op      298.37 MB/s    184996316 B/op     43247 allocs/op
    BenchmarkRead100p/qcow2-12         61      19306020 ns/op    13904.24 MB/s      1181854 B/op        45 allocs/op
    BenchmarkRead100p/qcow2_zlib-12     1    1732168542 ns/op      154.97 MB/s    368850952 B/op     86460 allocs/op

After:

    % go test -bench Read
    BenchmarkRead0p/qcow2-12          471       2698377 ns/op    99480.34 MB/s      1050514 B/op        39 allocs/op
    BenchmarkRead0p/qcow2_zlib-12     468       2774952 ns/op    96735.15 MB/s      1050511 B/op        39 allocs/op
    BenchmarkRead50p/qcow2-12         100      10735870 ns/op    25003.61 MB/s      1181854 B/op        45 allocs/op
    BenchmarkRead50p/qcow2_zlib-12      2     868310583 ns/op      309.15 MB/s    185038456 B/op     43263 allocs/op
    BenchmarkRead100p/qcow2-12         63      18977718 ns/op    14144.77 MB/s      1181851 B/op        45 allocs/op
    BenchmarkRead100p/qcow2_zlib-12     1    1727832917 ns/op      155.36 MB/s    368886656 B/op     86471 allocs/op
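
The speedup column appears to be the ratio of the before and after ns/op values. A small sketch of that arithmetic, assuming this is how the column was derived; `speedup` is a hypothetical helper, and the inputs are copied from the BenchmarkRead0p/qcow2 rows:

```go
package main

import "fmt"

// speedup returns how many times faster the "after" run is than the
// "before" run, given ns/op for each. (Hypothetical helper.)
func speedup(beforeNs, afterNs float64) float64 {
	return beforeNs / afterNs
}

func main() {
	// ns/op for BenchmarkRead0p/qcow2-12, before and after the change.
	fmt.Printf("%.1fx\n", speedup(77515735, 2698377)) // prints 28.7x
}
```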

Comparing with qemu-img shows that we roughly match its performance for
the uncompressed version of the Lima default image:

    % time ./go-qcow2reader-example /tmp/test.qcow2 > /tmp/tmp.img
    ./go-qcow2reader-example /tmp/test.qcow2 > /tmp/tmp.img  0.06s user 0.73s system 93% cpu 0.854 total

    % time qemu-img convert -O raw /tmp/test.qcow2 /tmp/tmp.img
    qemu-img convert -O raw /tmp/test.qcow2 /tmp/tmp.img  0.04s user 0.70s system 98% cpu 0.756 total

[1] https://go-review.googlesource.com/c/go/+/2520

Part-of #32

Based on #35 for more correct benchmarks.


nirs commented Oct 19, 2024

@AkihiroSuda the current version is simpler, needs no temporary buffer, and is much faster.

Signed-off-by: Nir Soffer <[email protected]>

@AkihiroSuda AkihiroSuda left a comment

Thanks

@AkihiroSuda AkihiroSuda merged commit 5f33c4a into lima-vm:master Oct 20, 2024
2 checks passed
@nirs nirs deleted the zero-reads branch October 20, 2024 21:38