Add parallel convert #36
Conversation
With this, converting the default qcow2 image to raw is 5.6 times faster.

Before:
% limactl create --tty=false
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 329.75 MiB/s

After:
% limactl create --tty=false
3.50 GiB / 3.50 GiB [---------------------------------------] 100.00% 1.67 GiB/s

Depends on lima-vm/go-qcow2reader#36

Signed-off-by: Nir Soffer <[email protected]>
With this, converting the default Ubuntu 24.10 qcow2 compressed image to raw is 5.4 times faster.

Before:
% limactl create --tty=false
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 317.58 MiB/s

After:
% limactl create --tty=false
3.50 GiB / 3.50 GiB [-------------------------------------] 100.00% 1.67 GiB/s

Depends on lima-vm/go-qcow2reader#36

Signed-off-by: Nir Soffer <[email protected]>
		bufferSize: opts.BufferSize,
		workers:    opts.Workers,
	}
}
This should validate opts (e.g. negative integers) and return err
Right, will add validation.
Addressed in the current version.
Validation was extracted to Options.Validate(), which can also be used by the caller to validate options before creating a converter.
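For readers following along, a minimal sketch of what such a method could look like, assuming only the Options fields visible in the diff (BufferSize, Workers); the actual checks and messages in the package may differ:

```go
package convert

import "errors"

// Options for creating a converter, as sketched from the diff fragments
// in this PR; the real struct may have more fields.
type Options struct {
	BufferSize int // size of the copy buffer used by each worker
	Workers    int // number of parallel workers
}

// Validate returns an error for invalid options, so a caller can check
// the options before creating a converter. Illustrative only.
func (o *Options) Validate() error {
	if o.BufferSize < 0 {
		return errors.New("buffer size must not be negative")
	}
	if o.Workers < 0 {
		return errors.New("workers must not be negative")
	}
	return nil
}
```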
convert/convert.go
	c.offset = 0
}

// Convert copy and decompress guest data from source image and write it to the |
Nit: s/copy/copies/
What is “guest data”?
> What is “guest data”?

I tried to make it clear that we copy the content of the image as seen by the guest running this image, and not the host data (compressed, unordered). But this is not relevant here; it is handled by the io.ReaderAt. I'll remove this.
Comments updated.
const SegmentSize = 32 * 1024 * 1024
const BufferSize = 1024 * 1024
const Workers = 8
Should this default to GOMAXPROCS?
I'm not sure this is a good way. For I/O we want to have multiple requests in flight, regardless of the number of cores; this helps to get better throughput on NVMe devices. For decompression, using the number of cores is optimal, but I'm not sure that using all cores is a good approach. Since we have a buffer per core, using more cores also uses more memory for the buffers. Using too much memory typically slows down the copy since you no longer fit into the L2/L3 cache.
I started with the qemu-img approach, defaulting to 8 coroutines and 8 threads in the thread pool, regardless of the number of cores. I tested 1, 2, 4, 8, and 12 workers on an M2 Max (8 performance cores, 4 efficiency cores) and an M1 Pro (8 performance cores, 2 efficiency cores). 8 workers looks good on both machines.
I'll do more testing to evaluate this.
Thanks for the explanation, could you add it as a code comment too?
Sure, good idea.
I added more benchmark results showing that 8 workers gives almost the best result in all cases. I think this is a good default value, and users who want maximum performance can tweak the options.
For compressed images, using the number of cores is 20% faster, but I think the way to improve decompression is using a faster library like go-libdeflate. I did a quick test and got 650 MiB/s with a single thread instead of 150 MiB/s.
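A sketch of how that explanation could live next to the defaults as a code comment; the constant values are the ones from the diff above, the wording is only a suggestion:

```go
// Default option values, chosen from benchmarks on M1 Pro and M2 Max machines.
//
// Workers is intentionally not derived from GOMAXPROCS: for I/O we want
// several requests in flight regardless of the core count (better NVMe
// throughput), and since every worker owns a buffer, more workers also
// means more memory and worse L2/L3 cache behavior. 8 workers gave close
// to the best result in all measured cases.
const SegmentSize = 32 * 1024 * 1024
const BufferSize = 1024 * 1024
const Workers = 8
```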
The convert package implements efficient conversion of a qcow2 image to a raw sparse image using multiple threads. This will be useful for users of this library that need to work with raw images.

Signed-off-by: Nir Soffer <[email protected]>
Using the same benchmark infrastructure we can compare the serial read (using the io.Reader interface) and the parallel copy using the convert package. It may be useful to change BenchmarkRead() to actually copy the data to a file instead of discarding it. We can do this later.

% go test -bench Benchmark/
Benchmark0p/qcow2/read-12                 421    2569832 ns/op   104456.41 MB/s    1050513 B/op     39 allocs/op
Benchmark0p/qcow2/convert-12              550    2188487 ns/op   122657.99 MB/s    9440266 B/op     60 allocs/op
Benchmark0p/qcow2_zlib/read-12            534    2230269 ns/op   120360.10 MB/s    1050511 B/op     39 allocs/op
Benchmark0p/qcow2_zlib/read#01-12         570    2126408 ns/op   126238.91 MB/s    9440130 B/op     60 allocs/op
Benchmark50p/qcow2/read-12                100   10936881 ns/op    24544.06 MB/s    1181852 B/op     45 allocs/op
Benchmark50p/qcow2/convert-12              28   60437272 ns/op     4441.55 MB/s   10157185 B/op     79 allocs/op
Benchmark50p/qcow2_zlib/read-12             2  892929271 ns/op      300.62 MB/s  185073236 B/op  43275 allocs/op
Benchmark50p/qcow2_zlib/convert-12          6  187644889 ns/op     1430.55 MB/s  194612194 B/op  43346 allocs/op
Benchmark100p/qcow2/read-12                60   19555156 ns/op    13727.09 MB/s    1181857 B/op     45 allocs/op
Benchmark100p/qcow2/convert-12             22   66297214 ns/op     4048.97 MB/s   10233635 B/op     83 allocs/op
Benchmark100p/qcow2_zlib/read-12            1 1775486625 ns/op      151.19 MB/s  368587320 B/op  86425 allocs/op
Benchmark100p/qcow2_zlib/convert-12         3  338774583 ns/op      792.37 MB/s  378895709 B/op  86617 allocs/op

Signed-off-by: Nir Soffer <[email protected]>
Benchmarking number of workers for converting qcow2 zlib image

Best result was with 24 workers on a 12 core machine. 8 workers gives 81% of the maximum performance.
Benchmarking number of workers for qcow2 image

Using the number of cores is best, but 8 workers gives 99% of the performance.
Benchmarking number of workers for 100 GiB empty image

Using the number of cores is best; 8 workers gives 89% of the performance.
Comparing to qemu-img convert

For real images we provide similar performance (faster for qcow2, slower for qcow2 zlib). For an empty image qemu-img is 58 times faster since it has an efficient block status interface and it does not do any memset() or zero detection.

Ubuntu 24.04 qcow2 zlib image
Ubuntu 24.04 qcow2 image
100 GiB empty qcow2 image
To make room for a new convert sub command using parallel convert. This is also a better example, having only the relevant arguments and a separate file for every example command.

Example usage:

% ./go-qcow2reader-example
Usage: ./go-qcow2reader-example COMMAND [OPTIONS...]

Available commands:
  info    show image information
  read    read image data and print to stdout

% ./go-qcow2reader-example info /tmp/images/test.zlib.qcow2
{
  "type": "qcow2",
  "size": 3758096384,
  ...

% time ./go-qcow2reader-example read /tmp/images/test.zlib.qcow2 >/dev/null
./go-qcow2reader-example read /tmp/images/test.zlib.qcow2 > /dev/null  10.05s user 0.35s system 101% cpu 10.279 total

Signed-off-by: Nir Soffer <[email protected]>
Add convert subcommand using the new convert package and include the new command in the functional tests. The new example is also a good way to benchmark the library with real images, as we can see below.

Testing shows a significant speedup for compressed images and for unallocated or zero clusters, and a smaller speedup for uncompressed images.

| image        | size      | compression | throughput   | speedup |
|--------------|-----------|-------------|--------------|---------|
| Ubuntu 24.04 | 3.5 GiB   | -           | 6.04 GiB/s   | 1.51    |
| Ubuntu 24.04 | 3.5 GiB   | zlib        | 1.62 GiB/s   | 5.42    |
| Empty image  | 100.0 GiB | -           | 240.15 GiB/s | 7.32    |

Ubuntu 24.04 image in qcow2 format:

% hyperfine -w3 "./go-qcow2reader-example read /tmp/images/test.qcow2 >/tmp/tmp.img" \
    "./go-qcow2reader-example convert /tmp/images/test.qcow2 /tmp/tmp.img"
Benchmark 1: ./go-qcow2reader-example read /tmp/images/test.qcow2 >/tmp/tmp.img
  Time (mean ± σ):     874.8 ms ±  41.3 ms    [User: 64.3 ms, System: 717.5 ms]
  Range (min … max):   851.9 ms … 985.3 ms    10 runs

Benchmark 2: ./go-qcow2reader-example convert /tmp/images/test.qcow2 /tmp/tmp.img
  Time (mean ± σ):     579.4 ms ±  22.8 ms    [User: 90.5 ms, System: 681.2 ms]
  Range (min … max):   556.0 ms … 631.3 ms    10 runs

Summary
  './go-qcow2reader-example convert /tmp/images/test.qcow2 /tmp/tmp.img' ran
    1.51 ± 0.09 times faster than './go-qcow2reader-example read /tmp/images/test.qcow2 >/tmp/tmp.img'

Ubuntu 24.04 image in qcow2 compressed format:

% hyperfine -w3 -r3 "./go-qcow2reader-example read /tmp/images/test.zlib.qcow2 >/tmp/tmp.img" \
    "./go-qcow2reader-example convert /tmp/images/test.zlib.qcow2 /tmp/tmp.img"
Benchmark 1: ./go-qcow2reader-example read /tmp/images/test.zlib.qcow2 >/tmp/tmp.img
  Time (mean ± σ):     11.702 s ±  0.200 s    [User: 10.423 s, System: 1.121 s]
  Range (min … max):   11.533 s … 11.923 s    3 runs

Benchmark 2: ./go-qcow2reader-example convert /tmp/images/test.zlib.qcow2 /tmp/tmp.img
  Time (mean ± σ):      2.161 s ±  0.027 s    [User: 10.980 s, System: 1.032 s]
  Range (min … max):    2.139 s …  2.191 s    3 runs

Summary
  './go-qcow2reader-example convert /tmp/images/test.zlib.qcow2 /tmp/tmp.img' ran
    5.42 ± 0.11 times faster than './go-qcow2reader-example read /tmp/images/test.zlib.qcow2 >/tmp/tmp.img'

100 GiB empty sparse image in qcow2 format. Comparing to the read command is not useful since it writes 100 GiB of zeros, so I'm comparing 1 and 8 workers.

% hyperfine -w3 "./go-qcow2reader-example convert -workers 1 /tmp/images/test.0p.qcow2 /tmp/tmp.img" \
    "./go-qcow2reader-example convert -workers 8 /tmp/images/test.0p.qcow2 /tmp/tmp.img"
Benchmark 1: ./go-qcow2reader-example convert -workers 1 /tmp/images/test.0p.qcow2 /tmp/tmp.img
  Time (mean ± σ):      3.050 s ±  0.023 s    [User: 2.991 s, System: 0.054 s]
  Range (min … max):    3.036 s …  3.107 s    10 runs

Benchmark 2: ./go-qcow2reader-example convert -workers 8 /tmp/images/test.0p.qcow2 /tmp/tmp.img
  Time (mean ± σ):     416.4 ms ±   3.4 ms    [User: 3252.7 ms, System: 16.3 ms]
  Range (min … max):   412.0 ms … 421.9 ms    10 runs

Summary
  './go-qcow2reader-example convert -workers 8 /tmp/images/test.0p.qcow2 /tmp/tmp.img' ran
    7.32 ± 0.08 times faster than './go-qcow2reader-example convert -workers 1 /tmp/images/test.0p.qcow2 /tmp/tmp.img'

Signed-off-by: Nir Soffer <[email protected]>
Thanks, will release v0.3.0
Converting images efficiently requires parallel reads and writes, keeping multiple I/O requests in flight. For compressed images we want to decompress the compressed clusters in parallel. It is not hard to implement this using the io.ReaderAt interface, but providing an implementation in this library will make it much more useful for users.
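The general pattern described here can be sketched independently of this library: workers take fixed-size segments off a channel, read them through io.ReaderAt (which hides cluster lookup and decompression), and write them with WriteAt. This is only an illustration of the idea, not the convert package's code; it skips zero detection, sparse output, and progress reporting.

```go
package sketch

import (
	"io"

	"golang.org/x/sync/errgroup"
)

// parallelCopy copies size bytes from src to dst using a pool of workers,
// each handling one segment at a time.
func parallelCopy(dst io.WriterAt, src io.ReaderAt, size int64, workers int, segmentSize int64) error {
	// Queue all segment offsets up front so nothing blocks on the channel
	// if a worker stops early on error.
	offsets := make(chan int64, int((size+segmentSize-1)/segmentSize))
	for off := int64(0); off < size; off += segmentSize {
		offsets <- off
	}
	close(offsets)

	var g errgroup.Group
	for i := 0; i < workers; i++ {
		g.Go(func() error {
			buf := make([]byte, segmentSize) // one buffer per worker
			for off := range offsets {
				n := segmentSize
				if off+n > size {
					n = size - off
				}
				// ReadAt returns guest-visible data; decompression and
				// cluster mapping happen inside the io.ReaderAt.
				if _, err := src.ReadAt(buf[:n], off); err != nil {
					return err
				}
				if _, err := dst.WriteAt(buf[:n], off); err != nil {
					return err
				}
			}
			return nil
		})
	}
	return g.Wait()
}
```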
This change adds a convert package and adds it to the example program. Testing shows a significant speedup for compressed images and for unallocated or zero clusters, and a smaller speedup for uncompressed images.
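For orientation, usage from a caller might look roughly like the sketch below. Only Options, Validate, Workers, and BufferSize appear in the diff fragments above; qcow2reader.Open, convert.New, img.Size(), and the Convert call signature are assumptions made for illustration and may not match the released API.

```go
package main

import (
	"log"
	"os"

	"github.com/lima-vm/go-qcow2reader"
	"github.com/lima-vm/go-qcow2reader/convert"
)

func main() {
	src, err := os.Open("disk.qcow2")
	if err != nil {
		log.Fatal(err)
	}
	defer src.Close()

	img, err := qcow2reader.Open(src) // assumed API for opening the image
	if err != nil {
		log.Fatal(err)
	}

	dst, err := os.Create("disk.img")
	if err != nil {
		log.Fatal(err)
	}
	defer dst.Close()

	opts := convert.Options{Workers: 8, BufferSize: 1024 * 1024}
	if err := opts.Validate(); err != nil { // Validate is shown in this PR
		log.Fatal(err)
	}

	c, err := convert.New(opts) // hypothetical constructor name
	if err != nil {
		log.Fatal(err)
	}
	if err := c.Convert(dst, img, img.Size()); err != nil { // hypothetical signature
		log.Fatal(err)
	}
}
```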
Example usage in lima: lima-vm/lima#2798
Fixes #32