feat: track compressed size & compare to parquet(zstd)? & canonical #882

Merged
merged 2 commits into from
Sep 20, 2024

Member

@danking danking commented Sep 19, 2024

We now track these six values:

1. Compression time (s).
2. Compression throughput (bytes/s).
3. Compressed size (bytes).
4. Compressed size as fraction of a Vortex Canonical array.
5. Compressed Layout size as fraction of Parquet without block compression.
6. Compressed Layout size as fraction of Parquet with Zstd.
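
As a rough illustration of how the derived values in this list relate to the raw measurements, here is a minimal Rust sketch. The function names are hypothetical and are not taken from the Vortex benchmark code; the sketch also assumes throughput is measured against the uncompressed input size.

```rust
/// Size of one representation expressed as a fraction of another,
/// e.g. compressed bytes over canonical bytes, or Vortex layout bytes
/// over Parquet bytes. Values below 1.0 mean the numerator is smaller.
pub fn size_fraction(part_bytes: u64, whole_bytes: u64) -> f64 {
    part_bytes as f64 / whole_bytes as f64
}

/// Compression throughput in bytes per second.
/// Assumption: `input_bytes` is the uncompressed input size.
pub fn throughput_bytes_per_sec(input_bytes: u64, elapsed_secs: f64) -> f64 {
    input_bytes as f64 / elapsed_secs
}
```

With the taxi figures reported by the benchmark bot on this PR (50771544 compressed bytes, and assuming the 470808924 bytes in the throughput row is the uncompressed input size), `size_fraction(50_771_544, 470_808_924)` comes to roughly 0.1078, which agrees with the reported compression ratio.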

It's a bit janky: I just unconditionally compute these values for several datasets. I couldn't
figure out how to ask Criterion which benchmark regex is currently in use so, for example,
`cargo bench taxi` will still run all the size benchmarks for every other dataset.

I also had to do some janky jq parsing to convert from Criterion's JSON output to the style expected
by the benchmark-action GitHub action that we use.
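
For readers who find the jq filter hard to follow, the core of the conversion can be sketched in Rust as below. This is an illustration, not the code in this PR: the `Mean` and `Entry` types and both function names are hypothetical, and only the field names (`estimate`, `lower_bound`, `upper_bound`, `name`, `value`, `unit`, `range`) come from the jq filter in the workflow.

```rust
/// Mean estimate from a Criterion JSON message (field names follow the
/// jq filter used in the workflow).
pub struct Mean {
    pub estimate: f64,
    pub lower_bound: f64,
    pub upper_bound: f64,
}

/// One record in the array handed to the benchmark action.
#[derive(Debug, PartialEq)]
pub struct Entry {
    pub name: String,
    pub value: f64,
    pub unit: String,
    pub range: f64,
}

/// Map a benchmark id and its mean to an entry; the range is half the
/// width of the confidence interval, as in the jq filter.
pub fn mean_to_entry(id: &str, unit: &str, mean: &Mean) -> Entry {
    Entry {
        name: id.to_string(),
        value: mean.estimate,
        unit: unit.to_string(),
        range: (mean.upper_bound - mean.lower_bound) / 2.0,
    }
}

/// Serialize entries as a compact JSON array. Hand-rolled to stay
/// dependency-free: `{:?}` only matches JSON string escaping for plain
/// ASCII names, and non-finite floats are not handled.
pub fn to_json(entries: &[Entry]) -> String {
    let items: Vec<String> = entries
        .iter()
        .map(|e| {
            format!(
                "{{\"name\":{:?},\"value\":{},\"unit\":{:?},\"range\":{}}}",
                e.name, e.value, e.unit, e.range
            )
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

A real implementation would more likely lean on a JSON library than on hand-written formatting; the point here is only the shape of the mapping.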

Nevertheless, now, for each commit to `develop`, we should get all six numbers for the Taxi, Airline
Sentiment, Arade, Bimbo, CMSprovider, Euro2016, Food, HashTags, and TPC-H l_comment
datasets. They'll be displayed under [Vortex
Compression](https://spiraldb.github.io/vortex/dev/bench/#Vortex_Compression) at the benchmarks
site.

I might need to delete some old data from the gh-pages-bench branch since I changed some benchmark
names, but after a few commits, those plots should become useful measures of our compression
performance in space and time.
@danking danking added the benchmark Run benchmarks on this branch label Sep 19, 2024
@github-actions github-actions bot removed the benchmark Run benchmarks on this branch label Sep 19, 2024
Contributor

@github-actions github-actions bot left a comment

Vortex bytes_at

| Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
|---|---|---|---|
| bytes_at/array_data | 609.7744546778674 ns (0.13420378286656387) | 613 ns/iter (± 8) | 0.99 |
| bytes_at/array_data #2 | 1039.372483865207 ns (0.5307530144946213) | 1043 ns/iter (± 4) | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@github-actions github-actions bot left a comment

Vortex random_access

| Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
|---|---|---|---|
| vortex/tokio local disk | 1245366.8620128417 ns (4668.183183866553) | 1308917 ns/iter (± 29650) | 0.95 |
| vortex/localfs | 1403735.5290472142 ns (4471.246426050318) | 1457592 ns/iter (± 32225) | 0.96 |
| parquet/tokio local disk | 194141199.46666664 ns (2304201.5900000036) | 178158170 ns/iter (± 2466099) | 1.09 |

Contributor

@github-actions github-actions bot left a comment

Vortex DataFusion

| Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
|---|---|---|---|
| arrow/planning | 816320.3626479161 ns (2024.9349698948208) | 813880 ns/iter (± 4517) | 1.00 |
| arrow/exec | 1771804.0215798502 ns (10301.68340335507) | 1774262 ns/iter (± 18680) | 1.00 |
| vortex-pushdown-compressed/planning | 515887.4864556427 ns (1043.680268734548) | 516095 ns/iter (± 1831) | 1.00 |
| vortex-pushdown-compressed/exec | 3078940.8429411757 ns (2820.7001838223077) | 3209669 ns/iter (± 141970) | 0.96 |
| vortex-pushdown-uncompressed/planning | 521224.8686081361 ns (5009.368536001508) | 514579 ns/iter (± 1971) | 1.01 |
| vortex-pushdown-uncompressed/exec | 2937165.6294444446 ns (2031.9434375003912) | 3336867 ns/iter (± 9294) | 0.88 |
| vortex-nopushdown-compressed/planning | 716020.948887611 ns (384.9729955360526) | 710291 ns/iter (± 5322) | 1.01 |
| vortex-nopushdown-compressed/exec | 8489314.503333332 ns (73138.21887499839) | 14988542 ns/iter (± 251952) | 0.57 |
| vortex-nopushdown-uncompressed/planning | 715991.6986297732 ns (370.5863519538543) | 715546 ns/iter (± 3807) | 1.00 |
| vortex-nopushdown-uncompressed/exec | 2007280.3079999995 ns (1379.8270149999298) | 2001661 ns/iter (± 82038) | 1.00 |

@lwwmanning lwwmanning enabled auto-merge (squash) September 19, 2024 21:48
Comment on lines +55 to +72
cargo criterion --bench ${{ matrix.benchmark.id }} --message-format=json 2>&1 | tee out.json

cat out.json

sudo apt-get update && sudo apt-get install -y jq

jq --raw-input --compact-output '
fromjson?
| [ (if .mean != null then {name: .id, value: .mean.estimate, unit: .unit, range: ((.mean.upper_bound - .mean.lower_bound) / 2) } else {} end),
(if .throughput != null then {name: (.id + " throughput"), value: .throughput[].per_iteration, unit: .throughput[].unit, range: 0} else {} end),
{name, value, unit, range} ]
| .[]
| select(.value != null)
' \
out.json \
| jq --slurp --compact-output '.' >${{ matrix.benchmark.id }}.json

cat ${{ matrix.benchmark.id }}.json
Member

This is a bit excessive. I wonder if this would be simpler if we wrote our own GitHub Action.

Member Author

Agreed, I think my preferred solution is either a CSV or a JSON Lines file that we just append to.
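
The append-only idea could look roughly like the sketch below, which adds one record per line to a JSON Lines file. Nothing like this exists in the PR; the function name and the choice to accept an already-serialized JSON object are assumptions made for illustration.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one pre-serialized JSON object as a single line of a
/// JSON Lines file, creating the file if it does not exist yet.
pub fn append_json_line(path: &Path, json_object: &str) -> io::Result<()> {
    // JSON Lines requires exactly one record per line, so reject
    // anything that would span several lines.
    if json_object.contains('\n') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "record must not contain a newline",
        ));
    }
    // Open in append mode so earlier records are never rewritten.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", json_object)
}
```

Because every run only appends, the history stays easy to diff and to load into a plotting tool, at the cost of needing some other mechanism if old records must be renamed or dropped.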

Contributor

@github-actions github-actions bot left a comment

Vortex Compression

| Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
|---|---|---|---|
| Yellow Taxi Trip Data Compression Time/taxi compression | 2513565430.2 ns (10122599.299999952) | | |
| Yellow Taxi Trip Data Compression Time/taxi compression throughput | 470808924 bytes | | |
| Yellow Taxi Trip Data Vortex-to-ParquetZstd Ratio/taxi | 0.9560604643330857 ratio | | |
| Yellow Taxi Trip Data Vortex-to-ParquetUncompressed Ratio/taxi | 0.6137144059032362 ratio | | |
| Yellow Taxi Trip Data Compression Ratio/taxi | 0.10783895846460209 ratio | | |
| Yellow Taxi Trip Data Compression Size/taxi | 50771544 bytes | | |
| Public BI Compression Time/AirlineSentiment compression | 415039.5464639052 ns (491.88687187436153) | | |
| Public BI Compression Time/AirlineSentiment compression throughput | 2020 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/AirlineSentiment | 6.400830737279335 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/AirlineSentiment | 4.353107344632768 ratio | | |
| Public BI Compression Ratio/AirlineSentiment | 0.6207920792079208 ratio | | |
| Public BI Compression Size/AirlineSentiment | 1254 bytes | | |
| Public BI Compression Time/Arade compression | 3131902697.3 ns (6480990.841249704) | | |
| Public BI Compression Time/Arade compression throughput | 787023760 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/Arade | 0.4927803394425952 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/Arade | 0.4398463104814441 ratio | | |
| Public BI Compression Ratio/Arade | 0.1862664667201407 ratio | | |
| Public BI Compression Size/Arade | 146596135 bytes | | |
| Public BI Compression Time/Bimbo compression | 21855721191.1 ns (20350694.538749695) | | |
| Public BI Compression Time/Bimbo compression throughput | 7121333608 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/Bimbo | 1.293293825007246 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/Bimbo | 0.8768962136437118 ratio | | |
| Public BI Compression Ratio/Bimbo | 0.06423232573827764 ratio | | |
| Public BI Compression Size/Bimbo | 457419820 bytes | | |
| Public BI Compression Time/CMSprovider compression | 12917920336.7 ns (26398202.20625019) | | |
| Public BI Compression Time/CMSprovider compression throughput | 5149123964 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/CMSprovider | 1.2021505846266516 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/CMSprovider | 0.7762200888869946 ratio | | |
| Public BI Compression Ratio/CMSprovider | 0.17574964310958274 ratio | | |
| Public BI Compression Size/CMSprovider | 904956699 bytes | | |
| Public BI Compression Time/Euro2016 compression | 2219852099 ns (15588005.231250286) | | |
| Public BI Compression Time/Euro2016 compression throughput | 393253221 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/Euro2016 | 1.4705138909171633 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/Euro2016 | 0.6239071488283204 ratio | | |
| Public BI Compression Ratio/Euro2016 | 0.43458292742120985 ratio | | |
| Public BI Compression Size/Euro2016 | 170901136 bytes | | |
| Public BI Compression Time/Food compression | 1095478080.3 ns (3527534.875) | | |
| Public BI Compression Time/Food compression throughput | 332718229 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/Food | 1.2297872376838528 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/Food | 0.6953516685794864 ratio | | |
| Public BI Compression Ratio/Food | 0.13031750959458252 ratio | | |
| Public BI Compression Size/Food | 43359011 bytes | | |
| Public BI Compression Time/HashTags compression | 2930012702.6 ns (17763756.576250076) | | |
| Public BI Compression Time/HashTags compression throughput | 804495592 bytes | | |
| Public BI Vortex-to-ParquetZstd Ratio/HashTags | 1.6464093663569246 ratio | | |
| Public BI Vortex-to-ParquetUncompressed Ratio/HashTags | 0.4680774335616459 ratio | | |
| Public BI Compression Ratio/HashTags | 0.2652765038394393 ratio | | |
| Public BI Compression Size/HashTags | 213413778 bytes | | |
| TPC-H l_comment Compression Time/chunked-without-fsst compression | 187786756.78414685 ns (925951.3523437679) | | |
| TPC-H l_comment Compression Time/chunked-without-fsst compression throughput | 183010921 bytes | | |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-without-fsst | 3.2154759555157804 ratio | | |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-without-fsst | 0.9983658315767541 ratio | | |
| TPC-H l_comment Compression Ratio/chunked-without-fsst | 0.999965750677797 ratio | | |
| TPC-H l_comment Compression Size/chunked-without-fsst | 183004653 bytes | | |
| TPC-H l_comment Compression Time/chunked-with-fsst compression | 1134202541.95 ns (2623869.8625000715) | | |
| TPC-H l_comment Compression Time/chunked-with-fsst compression throughput | 183010921 bytes | | |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-with-fsst | 1.504212244020189 ratio | | |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-with-fsst | 0.4670394456823924 ratio | | |
| TPC-H l_comment Compression Ratio/chunked-with-fsst | 0.442999322428414 ratio | | |
| TPC-H l_comment Compression Size/chunked-with-fsst | 81073714 bytes | | |
| TPC-H l_comment Compression Time/canonical-with-fsst compression | 1131178437.95 ns (761932.415624857) | | |
| TPC-H l_comment Compression Time/canonical-with-fsst compression throughput | 183010937 bytes | | |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/canonical-with-fsst | 1.5059821792895995 ratio | | |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/canonical-with-fsst | 0.46759301141944365 ratio | | |
| TPC-H l_comment Compression Ratio/canonical-with-fsst | 0.44354151358724536 ratio | | |
| TPC-H l_comment Compression Size/canonical-with-fsst | 81172948 bytes | | |

@robert3005
Member

Also can we not run benchmarks on every pr? I think label would be enough and then on every develop commit? It seems like a lot to run for every commit

@lwwmanning lwwmanning disabled auto-merge September 19, 2024 22:03
@lwwmanning
Member

> Also can we not run benchmarks on every pr? I think label would be enough and then on every develop commit? It seems like a lot to run for every commit

Am I missing something? This PR doesn't make it run on every PR...?

@robert3005
Member

This PR doesn’t, but they currently do run. My point was mostly that, since we are making benchmark changes anyway, we should change that too.

Contributor

@github-actions github-actions bot left a comment

Vortex benchmarks

| Benchmark suite | Current: 615466f | Previous: a96ff2c | Ratio |
|---|---|---|---|
| tpch_q1/vortex-in-memory-no-pushdown | 464843920.35 ns (2703997.192499995) | 456752113 ns/iter (± 3867547) | 1.02 |
| tpch_q1/vortex-in-memory-pushdown | 532535430.5 ns (1441835.3449999988) | 532735558 ns/iter (± 1607247) | 1.00 |
| tpch_q1/arrow | 448641396.35 ns (983773.7437499762) | 443097274 ns/iter (± 626790) | 1.01 |
| tpch_q1/parquet | 651934200.8 ns (1167224.8499999642) | 653884935 ns/iter (± 2663691) | 1.00 |
| tpch_q1/vortex-file-compressed | 631224996.7 ns (1159530.3299999833) | 625869224 ns/iter (± 2721859) | 1.01 |
| tpch_q1/vortex-file-uncompressed | 636241881.9 ns (2468183.79125005) | 631514808 ns/iter (± 9245882) | 1.01 |
| tpch_q2/vortex-in-memory-no-pushdown | 146135948.23503968 ns (596119.4463889003) | 146858416 ns/iter (± 2531945) | 1.00 |
| tpch_q2/vortex-in-memory-pushdown | 144400001.83011904 ns (438025.6820833236) | 143317300 ns/iter (± 2886213) | 1.01 |
| tpch_q2/arrow | 122211692.76773807 ns (168044.8480624929) | 122568135 ns/iter (± 397231) | 1.00 |
| tpch_q2/parquet | 161668685.44722223 ns (492714.924999997) | 159787707 ns/iter (± 5185870) | 1.01 |
| tpch_q2/vortex-file-compressed | 156831594.42023808 ns (853652.4739612937) | 156466031 ns/iter (± 1277797) | 1.00 |
| tpch_q2/vortex-file-uncompressed | 166610068.98972222 ns (501684.8970833421) | 162139847 ns/iter (± 3160753) | 1.03 |
| tpch_q3/vortex-in-memory-no-pushdown | 152064205.7482143 ns (344621.85196428) | 153229410 ns/iter (± 1562047) | 0.99 |
| tpch_q3/vortex-in-memory-pushdown | 186610418.2 ns (955602.7458333224) | 186029987 ns/iter (± 1669213) | 1.00 |
| tpch_q3/arrow | 148540561.55555555 ns (353743.82150000334) | 147328590 ns/iter (± 1417356) | 1.01 |
| tpch_q3/parquet | 343069328.65 ns (1382273.2912499905) | 336975915 ns/iter (± 2025628) | 1.02 |
| tpch_q3/vortex-file-compressed | 311862426.95 ns (998395.625) | 309258408 ns/iter (± 2757350) | 1.01 |
| tpch_q3/vortex-file-uncompressed | 380565568.2 ns (1064147.7043749988) | 375568417 ns/iter (± 4807910) | 1.01 |
| tpch_q4/vortex-in-memory-no-pushdown | 109463963.10416666 ns (575272.8522916734) | 106376311 ns/iter (± 304925) | 1.03 |
| tpch_q4/vortex-in-memory-pushdown | 144525012.9798016 ns (331599.61648860574) | 141994138 ns/iter (± 1248613) | 1.02 |
| tpch_q4/arrow | 102683885.50023809 ns (381721.1955833286) | 101065390 ns/iter (± 370932) | 1.02 |
| tpch_q4/parquet | 219624434.13333336 ns (916865.9166666567) | 214248841 ns/iter (± 2228222) | 1.03 |
| tpch_q4/vortex-file-compressed | 275844832.4 ns (779808.6381250024) | 262684851 ns/iter (± 1417453) | 1.05 |
| tpch_q4/vortex-file-uncompressed | 322938093.15 ns (2089683.4512500167) | 322371836 ns/iter (± 4220226) | 1.00 |
| tpch_q5/vortex-in-memory-no-pushdown | 296875979.6 ns (1442079.2993750274) | 296691840 ns/iter (± 6168057) | 1.00 |
| tpch_q5/vortex-in-memory-pushdown | 311055426.75 ns (2625022.0974999964) | 321813589 ns/iter (± 5635245) | 0.97 |
| tpch_q5/arrow | 301113685 ns (1192929.6487500072) | 289107585 ns/iter (± 2924141) | 1.04 |
| tpch_q5/parquet | 463548366.3 ns (1005448.7349999845) | 449018047 ns/iter (± 2489275) | 1.03 |
| tpch_q5/vortex-file-compressed | 342553821.15 ns (1811957.981249988) | 341880037 ns/iter (± 8522037) | 1.00 |
| tpch_q5/vortex-file-uncompressed | 361094967.65 ns (1257259.4193750024) | 356647316 ns/iter (± 5958948) | 1.01 |
| tpch_q6/vortex-in-memory-no-pushdown | 38618631.72686508 ns (166164.57365079597) | 40138218 ns/iter (± 630500) | 0.96 |
| tpch_q6/vortex-in-memory-pushdown | 92286594.83333334 ns (140081.3341666609) | 92149267 ns/iter (± 303889) | 1.00 |
| tpch_q6/arrow | 36310182.61406084 ns (165533.02780538797) | 36334469 ns/iter (± 211591) | 1.00 |
| tpch_q6/parquet | 154528287.31761903 ns (505493.8115416467) | 151921473 ns/iter (± 1264234) | 1.02 |
| tpch_q6/vortex-file-compressed | 80680396.58406746 ns (245124.53606721014) | 78859071 ns/iter (± 1115685) | 1.02 |
| tpch_q6/vortex-file-uncompressed | 167328617.76924604 ns (1367708.262455359) | 167141882 ns/iter (± 1751525) | 1.00 |
| tpch_q7/vortex-in-memory-no-pushdown | 568360396.5 ns (1364090.5849999785) | 562119306 ns/iter (± 3476977) | 1.01 |
| tpch_q7/vortex-in-memory-pushdown | 632136235.7 ns (1477559.2749999762) | 611059188 ns/iter (± 6446587) | 1.03 |
| tpch_q7/arrow | 573491795.9 ns (1734030.0337500572) | 553024994 ns/iter (± 2909226) | 1.04 |
| tpch_q7/parquet | 733469206.1 ns (3208002.5250000358) | 710209548 ns/iter (± 5017550) | 1.03 |
| tpch_q7/vortex-file-compressed | 682726773.6 ns (2675624.6862499714) | 672453257 ns/iter (± 5566775) | 1.02 |
| tpch_q7/vortex-file-uncompressed | 759621166.1 ns (3345556.2650000453) | 744071550 ns/iter (± 5659596) | 1.02 |
| tpch_q8/vortex-in-memory-no-pushdown | 217474477.0333333 ns (868808.4662500024) | 216237880 ns/iter (± 504152) | 1.01 |
| tpch_q8/vortex-in-memory-pushdown | 234419944.0333333 ns (589083.5312500149) | 230296027 ns/iter (± 963193) | 1.02 |
| tpch_q8/arrow | 220631285.26666665 ns (352138.8029166907) | 215487494 ns/iter (± 822806) | 1.02 |
| tpch_q8/parquet | 494087190.7 ns (937395.3787499964) | 482558982 ns/iter (± 1927968) | 1.02 |
| tpch_q8/vortex-file-compressed | 264829153.8 ns (518212.8099999875) | 272225347 ns/iter (± 3218905) | 0.97 |
| tpch_q8/vortex-file-uncompressed | 297438551.9 ns (3614358.862499982) | 307092746 ns/iter (± 4647118) | 0.97 |
| tpch_q9/vortex-in-memory-no-pushdown | 412446957.85 ns (953962.8081250191) | 405778945 ns/iter (± 3408198) | 1.02 |
| tpch_q9/vortex-in-memory-pushdown | 414989723.45 ns (1026939.4056250155) | 409784837 ns/iter (± 8637477) | 1.01 |
| tpch_q9/arrow | 403587610.3 ns (1367421.574999988) | 400998246 ns/iter (± 7870465) | 1.01 |
| tpch_q9/parquet | 716911614.5 ns (2406505.4037500024) | 687723525 ns/iter (± 2769586) | 1.04 |
| tpch_q9/vortex-file-compressed | 464833005.2 ns (957100.875) | 449724976 ns/iter (± 6436082) | 1.03 |
| tpch_q9/vortex-file-uncompressed | 490149814.3 ns (1457817.625) | 482884495 ns/iter (± 6141352) | 1.02 |
| tpch_q10/vortex-in-memory-no-pushdown | 228002276.6 ns (483031.4987499863) | 224740155 ns/iter (± 1207852) | 1.01 |
| tpch_q10/vortex-in-memory-pushdown | 266292655.8 ns (608651.8568750024) | 265222009 ns/iter (± 4544285) | 1.00 |
| tpch_q10/arrow | 225509220.8666667 ns (501292.3099999577) | 219076345 ns/iter (± 7024828) | 1.03 |
| tpch_q10/parquet | 478092539.45 ns (1642956.1399999857) | 481426698 ns/iter (± 4462077) | 0.99 |
| tpch_q10/vortex-file-compressed | 473475771.1 ns (706985.8381249905) | 474019593 ns/iter (± 4032038) | 1.00 |
| tpch_q10/vortex-file-uncompressed | 407417004.75 ns (988742.6606250107) | 408859777 ns/iter (± 3938984) | 1.00 |
| tpch_q11/vortex-in-memory-no-pushdown | 224700780.53333336 ns (467540.4262499958) | 219129162 ns/iter (± 1832776) | 1.03 |
| tpch_q11/vortex-in-memory-pushdown | 225753908.73333335 ns (1244039.6466666758) | 220793553 ns/iter (± 918518) | 1.02 |
| tpch_q11/arrow | 177459193.11484125 ns (327176.177959308) | 175455464 ns/iter (± 1125682) | 1.01 |
| tpch_q11/parquet | 191560226.3 ns (932539.5670833439) | 185576140 ns/iter (± 2442270) | 1.03 |
| tpch_q11/vortex-file-compressed | 230677296.2333333 ns (574339.2933333367) | 229509379 ns/iter (± 1644525) | 1.01 |
| tpch_q11/vortex-file-uncompressed | 239172281.3666667 ns (1564077.60041669) | 232738732 ns/iter (± 1873655) | 1.03 |
| tpch_q12/vortex-in-memory-no-pushdown | 181748026.5152381 ns (119161.99357143044) | 179897756 ns/iter (± 1962967) | 1.01 |
| tpch_q12/vortex-in-memory-pushdown | 269784045.6 ns (167057.17499998212) | 268815014 ns/iter (± 1804665) | 1.00 |
| tpch_q12/arrow | 171955794.4640476 ns (166796.68347024918) | 170395809 ns/iter (± 844024) | 1.01 |
| tpch_q12/parquet | 365822822 ns (725210.1762500107) | 365760882 ns/iter (± 5113024) | 1.00 |
| tpch_q12/vortex-file-compressed | 613578776.5 ns (2318134.6500000358) | 611089999 ns/iter (± 3516355) | 1.00 |
| tpch_q12/vortex-file-uncompressed | 366636327.7 ns (477821.15125000477) | 363970552 ns/iter (± 2594091) | 1.01 |
| tpch_q13/vortex-in-memory-no-pushdown | 190910588.36666667 ns (1207546.7833333164) | 171007772 ns/iter (± 4051193) | 1.12 |
| tpch_q13/vortex-in-memory-pushdown | 186951125.5 ns (1458854.6433333158) | 169154998 ns/iter (± 6178477) | 1.11 |
| tpch_q13/arrow | 181922969.06813493 ns (2688358.8900689334) | 179394695 ns/iter (± 11628817) | 1.01 |
| tpch_q13/parquet | 335171461.75 ns (1085733.2525000274) | 343672528 ns/iter (± 12687514) | 0.98 |
| tpch_q13/vortex-file-compressed | 219688230.26666665 ns (485762.50458332896) | 221913395 ns/iter (± 3831446) | 0.99 |
| tpch_q13/vortex-file-uncompressed | 223551243.83333334 ns (1028771.7762500048) | 212642135 ns/iter (± 1772424) | 1.05 |
| tpch_q14/vortex-in-memory-no-pushdown | 39528689.122539684 ns (129273.55500794202) | 37691250 ns/iter (± 577567) | 1.05 |
| tpch_q14/vortex-in-memory-pushdown | 88687423.32335317 ns (166015.09192808717) | 90997812 ns/iter (± 1433438) | 0.97 |
| tpch_q14/arrow | 41340823.23357143 ns (99626.081876982) | 39535216 ns/iter (± 535028) | 1.05 |
| tpch_q14/parquet | 226617225.69999996 ns (592733.4029166698) | 227708765 ns/iter (± 1731670) | 1.00 |
| tpch_q14/vortex-file-compressed | 91236196.1070238 ns (300096.2808660716) | 90305017 ns/iter (± 631935) | 1.01 |
| tpch_q14/vortex-file-uncompressed | 146631346.60845238 ns (509835.0686994046) | 144470755 ns/iter (± 708935) | 1.01 |
| tpch_q15/vortex-in-memory-no-pushdown | 70106102.60710318 ns (455189.1161160767) | 71237883 ns/iter (± 1431115) | 0.98 |
| tpch_q15/vortex-in-memory-pushdown | 122245527.3222619 ns (656851.0220178589) | 124403185 ns/iter (± 854376) | 0.98 |
| tpch_q15/arrow | 68284718.06132935 ns (596973.3149846196) | 66195092 ns/iter (± 1472663) | 1.03 |
| tpch_q15/parquet | 305362649.85 ns (1483126.974999994) | 295437640 ns/iter (± 1150294) | 1.03 |
| tpch_q15/vortex-file-compressed | 166438923.76698413 ns (1290993.2371706367) | 157382540 ns/iter (± 411484) | 1.06 |
| tpch_q15/vortex-file-uncompressed | 281698067.55 ns (1131462.449999988) | 275891348 ns/iter (± 6001720) | 1.02 |
| tpch_q16/vortex-in-memory-no-pushdown | 123305430.06269841 ns (154485.40356349945) | 118867963 ns/iter (± 629106) | 1.04 |
| tpch_q16/vortex-in-memory-pushdown | 128782849.80015874 ns (231569.55085118115) | 124703683 ns/iter (± 1081895) | 1.03 |
| tpch_q16/arrow | 108126961.2945238 ns (299681.0858184621) | 107392480 ns/iter (± 705642) | 1.01 |
| tpch_q16/parquet | 126170624.60535714 ns (165293.12652678043) | 123485091 ns/iter (± 3669376) | 1.02 |
| tpch_q16/vortex-file-compressed | 140641349.5671429 ns (388245.0111131072) | 138265217 ns/iter (± 832715) | 1.02 |
| tpch_q16/vortex-file-uncompressed | 140168289.80829364 ns (179870.77833086252) | 137767991 ns/iter (± 578019) | 1.02 |
| tpch_q17/vortex-in-memory-no-pushdown | 721306336.4 ns (5332481.033749998) | 649086157 ns/iter (± 16184725) | 1.11 |
| tpch_q17/vortex-in-memory-pushdown | 725077462.5 ns (5307768.612499952) | 654515157 ns/iter (± 14489515) | 1.11 |
| tpch_q17/arrow | 653677774.7 ns (5063423.765000045) | 567239351 ns/iter (± 11560937) | 1.15 |
| tpch_q17/parquet | 606152570.1 ns (3143144.972500026) | 595915976 ns/iter (± 6246602) | 1.02 |
| tpch_q17/vortex-file-compressed | 649481190 ns (2199092.0737499595) | 612595783 ns/iter (± 2547793) | 1.06 |
| tpch_q17/vortex-file-uncompressed | 709473433.2 ns (6775567.780000031) | 667861291 ns/iter (± 8580372) | 1.06 |
| tpch_q18/vortex-in-memory-no-pushdown | 1116718640.7 ns (6570566.079999924) | 1034223912 ns/iter (± 23067942) | 1.08 |
| tpch_q18/vortex-in-memory-pushdown | 1119784218.6 ns (9246197.169999957) | 994376340 ns/iter (± 5989420) | 1.13 |
| tpch_q18/arrow | 1106066697.9 ns (3555563.296250105) | 1004004887 ns/iter (± 4588695) | 1.10 |
| tpch_q18/parquet | 1294545462 ns (8926828.143750072) | 1186490542 ns/iter (± 18651939) | 1.09 |
| tpch_q18/vortex-file-compressed | 1128273292.8 ns (4267479.612499952) | 1065012633 ns/iter (± 14258649) | 1.06 |
| tpch_q18/vortex-file-uncompressed | 1167309511.9 ns (7950887.348750114) | 1135720332 ns/iter (± 32401940) | 1.03 |
| tpch_q19/vortex-in-memory-no-pushdown | 166112949.20166668 ns (325413.97500000894) | 165874289 ns/iter (± 732945) | 1.00 |
| tpch_q19/vortex-in-memory-pushdown | 259734381.3 ns (386084.44750000536) | 260523501 ns/iter (± 1589056) | 1.00 |
| tpch_q19/arrow | 153134599.72357142 ns (315480.78404167295) | 153540134 ns/iter (± 536959) | 1.00 |
| tpch_q19/parquet | 479759370.1 ns (673078.1499999762) | 477195361 ns/iter (± 3004902) | 1.01 |
| tpch_q19/vortex-file-compressed | 788425593.3 ns (1567368.550000012) | 757301083 ns/iter (± 6091638) | 1.04 |
| tpch_q19/vortex-file-uncompressed | 369395385.7 ns (860898.25) | 374268649 ns/iter (± 1709655) | 0.99 |
| tpch_q20/vortex-in-memory-no-pushdown | 277640939.25 ns (854839.7300000191) | 267235874 ns/iter (± 6059222) | 1.04 |
| tpch_q20/vortex-in-memory-pushdown | 300427448.5 ns (1794995.0856249928) | 299117478 ns/iter (± 6408411) | 1.00 |
| tpch_q20/arrow | 255485112.5 ns (1111991.8399999887) | 256806267 ns/iter (± 8213627) | 0.99 |
| tpch_q20/parquet | 370123782.65 ns (1381059.2018750012) | 377456624 ns/iter (± 5175006) | 0.98 |
| tpch_q20/vortex-file-compressed | 335169433.7 ns (2389999.5) | 327554400 ns/iter (± 7045439) | 1.02 |
| tpch_q20/vortex-file-uncompressed | 422524469.55 ns (986920.4343750179) | 416259584 ns/iter (± 5453969) | 1.02 |
| tpch_q21/vortex-in-memory-no-pushdown | 867259857.2 ns (1559337.7662500143) | 839899381 ns/iter (± 9947739) | 1.03 |
| tpch_q21/vortex-in-memory-pushdown | 927417861.3 ns (3553385.9662500024) | 904150556 ns/iter (± 17125645) | 1.03 |
| tpch_q21/arrow | 862319475.5 ns (2946472.407499969) | 834172632 ns/iter (± 6626356) | 1.03 |
| tpch_q21/parquet | 1022675321.3 ns (3296504.0487499833) | 987756274 ns/iter (± 14734132) | 1.04 |
| tpch_q21/vortex-file-compressed | 1248097842.4 ns (1881255.3212499619) | 1173609856 ns/iter (± 5160528) | 1.06 |
| tpch_q21/vortex-file-uncompressed | 1373740637.7 ns (4932148.789999843) | 1328981880 ns/iter (± 9120789) | 1.03 |
| tpch_q22/vortex-in-memory-no-pushdown | 97738027.65321428 ns (409296.28778125346) | 97935522 ns/iter (± 453887) | 1.00 |
| tpch_q22/vortex-in-memory-pushdown | 98659317.80075397 ns (1492063.2876845077) | 97710962 ns/iter (± 829935) | 1.01 |
| tpch_q22/arrow | 67136981.28025793 ns (241315.54801810905) | 69526644 ns/iter (± 272226) | 0.97 |
| tpch_q22/parquet | 98115234.07230158 ns (519330.05611111224) | 96659104 ns/iter (± 1019394) | 1.02 |
| tpch_q22/vortex-file-compressed | 103439443.46646826 ns (360589.396852687) | 103294285 ns/iter (± 932885) | 1.00 |
| tpch_q22/vortex-file-uncompressed | 110684712.36460316 ns (458732.90418849885) | 111618839 ns/iter (± 1236391) | 0.99 |

// .with_limit(100_000)
.build()
.unwrap();
let reader = builder.with_batch_size(BATCH_SIZE).build().unwrap();
Contributor

nit: i believe this is already the DEFAULT_BATCH_SIZE for the reader

Member Author

The Parquet crate claims otherwise:

    /// Set the size of [`RecordBatch`] to produce. Defaults to 1024
    /// If the batch_size more than the file row count, use the file row count.
    pub fn with_batch_size(self, batch_size: usize) -> Self {

We define BATCH_SIZE as:

pub const BATCH_SIZE: usize = 65_536;

Contributor

You're right, I confused this with our LayoutReaderBuilder

}

fn parquet_written_size(array: &Array, filepath: &str, compression: Compression) -> usize {
let mut file = std::fs::File::create(Path::new(filepath)).unwrap();
Contributor

Just a thought, but the ArrowWriter just needs something that impls Write, so instead of writing to file you could just give it a Vec<u8> and not worry about pushing random files

Member Author

Man, does everyone come to hate deriving impl's in Rust? /s

Yeah, you're totally right: this is silly, unnecessary pollution of the filesystem, and it causes the tests to blow out their disk. I switched to a Cursor<Vec<u8>>, which tracks how many bytes have been written.
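
The in-memory approach discussed in this thread can be sketched as follows. This is not the code from the PR: `written_size` and `write_payload` are hypothetical names, and `write_payload` is only a stand-in for a real serializer (such as a Parquet or Vortex writer) that accepts anything implementing `Write`.

```rust
use std::io::{self, Cursor, Write};

/// Run `write_fn` against an in-memory buffer and report how many
/// bytes it wrote, without touching the filesystem.
pub fn written_size<F>(write_fn: F) -> io::Result<u64>
where
    F: FnOnce(&mut Cursor<Vec<u8>>) -> io::Result<()>,
{
    let mut buf = Cursor::new(Vec::new());
    write_fn(&mut buf)?;
    // The cursor position advances with every write, so after purely
    // sequential writes it equals the total number of bytes written.
    Ok(buf.position())
}

/// Hypothetical stand-in for a real file writer: emits a 4-byte
/// header followed by the payload.
pub fn write_payload<W: Write>(mut sink: W, payload: &[u8]) -> io::Result<()> {
    sink.write_all(b"HDR1")?;
    sink.write_all(payload)?;
    sink.flush()
}
```

For example, `written_size(|buf| write_payload(buf, b"hello"))` counts the 4 header bytes plus the 5 payload bytes. If the writer seeks backwards at any point, the final position no longer equals the size, and the length of the underlying `Vec` is the safer measure.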

n_bytes
}

fn vortex_written_size(array: &Array, filepath: &str) -> u64 {
Contributor

same thing here: you can use a Vec instead of file if all you wanna do is measure the compressed size

Contributor

though i could understand if you wanna dump them to disk to poke at manually, persisting would make that easier

Member Author

Yeah, I did poke at them while I was iterating but at least for testing and benchmarking it seems best to be fast and not use disk.

@robert3005
Member

Apologies, I was confused. I just didn't realize we had the benchmark label added to quite a lot of PRs.

@danking danking enabled auto-merge (squash) September 20, 2024 14:38
@danking danking merged commit a87c720 into develop Sep 20, 2024
5 checks passed
@danking danking deleted the dk/bench-compression branch September 20, 2024 14:54
danking added a commit that referenced this pull request Oct 3, 2024
Ratio benchmarks are not supported by criterion. Instead, back in #882,
I added some code to generate ratios and print them in the format
expected by our GitHub Action. Unfortunately, this code currently runs
unconditionally which is annoying when you are filtering benchmarks.

Now you can do this:

```
BENCH_VORTEX_RATIOS=AirlineSentiment cargo bench --bench compress_noci -- AirlineSentiment
```

And you'll receive both ratios and compression time benchmarks for
AirlineSentiment and no output for other datasets.

But when you do this:

```
cargo bench --bench compress_noci -- AirlineSentiment
```

You only get compression time benchmarks for AirlineSentiment.
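
One way such an environment-variable gate might be written is sketched below. Only the variable name `BENCH_VORTEX_RATIOS` comes from the commit message; the function names are hypothetical, and the substring match is an assumption (the actual benchmark may use a regex or an exact match).

```rust
use std::env;

/// Decide whether ratio output should be produced for `dataset`,
/// given the optional filter value.
pub fn ratios_enabled(filter: Option<&str>, dataset: &str) -> bool {
    match filter {
        // Assumption: the filter is matched as a plain substring.
        Some(pattern) if !pattern.is_empty() => dataset.contains(pattern),
        // Unset or empty: skip ratio output entirely.
        _ => false,
    }
}

/// Convenience wrapper that reads the filter from the environment.
pub fn ratios_enabled_from_env(dataset: &str) -> bool {
    let filter = env::var("BENCH_VORTEX_RATIOS").ok();
    ratios_enabled(filter.as_deref(), dataset)
}
```

Keeping the decision in a function that takes the filter as an argument makes it testable without mutating the process environment.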