feat: add ChunkedCompressor which compresses chunk n+1 like chunk n #996
Vortex bytes_at
| Benchmark suite | Current: 8bde547 | Previous: 2e227c3 | Ratio |
|---|---|---|---|
| bytes_at/array_data | 590.1076021511925 ns (1.088358376172664) | 589.7671824965236 ns (0.32772535470076036) | 1.00 |
| bytes_at/array_view | 860.6663670956597 ns (0.7447788055580418) | 872.994025821359 ns (2.1701489010094974) | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
DataFusion
| Benchmark suite | Current: 8bde547 | Previous: 48982b8 | Ratio |
|---|---|---|---|
| arrow/planning | 800599.0077997915 ns (1064.2650901224115) | 797594.0942498998 ns (639.0693948336993) | 1.00 |
| arrow/exec | 1741433.6727374913 ns (4969.691193088307) | 1732872.0781913844 ns (3710.234210194787) | 1.00 |
| vortex-pushdown-compressed/planning | 503262.6144018927 ns (2237.3521829429956) | 503102.8830239783 ns (634.6895160177664) | 1.00 |
| vortex-pushdown-compressed/exec | 2437655.7523809513 ns (1728.250976189971) | 2441262.2766666673 ns (3351.3878690474667) | 1.00 |
| vortex-pushdown-uncompressed/planning | 506595.16015602107 ns (2763.4398337128514) | 504164.5095690734 ns (1916.879407954315) | 1.00 |
| vortex-pushdown-uncompressed/exec | 3356240.2475 ns (9659.251773437485) | 3411699.438666667 ns (14241.643891666085) | 0.98 |
| vortex-nopushdown-compressed/planning | 815244.3122822371 ns (776.4202180341235) | 813406.607646144 ns (744.5407101008459) | 1.00 |
| vortex-nopushdown-compressed/exec | 14028352.8 ns (54161.73068750091) | 13242627.1675 ns (45222.493375000544) | 1.06 |
| vortex-nopushdown-uncompressed/planning | 807976.5238915366 ns (627.2652247538208) | 809567.8450173971 ns (665.6596324790735) | 1.00 |
| vortex-nopushdown-uncompressed/exec | 1764410.3465024584 ns (744.2067217733711) | 1757501.5737760141 ns (2017.416460580076) | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Random Access
| Benchmark suite | Current: 8bde547 | Previous: 48982b8 | Ratio |
|---|---|---|---|
| random-access/vortex-tokio-local-disk | 972858.5261898807 ns (3761.6191938016564) | 979460.9233854066 ns (3539.2417611633427) | 0.99 |
| random-access/vortex-local-fs | 1121842.1350636475 ns (8993.60766926629) | 1111160.8693063792 ns (4993.151020677877) | 1.01 |
| random-access/parquet-tokio-local-disk | 199474273.56666666 ns (2782111.415833339) | 193613640.9 ns (1985554.7433333248) | 1.03 |
This comment was automatically generated by workflow using github-action-benchmark.
Vortex Compression
| Benchmark suite | Current: 8bde547 | Previous: 2e227c3 | Ratio |
|---|---|---|---|
| Yellow Taxi Trip Data Compression Time/taxi compression | 1261340341.2 ns (3074006.9850000143) | 2485148812.6 ns (3108514.5662498474) | 0.51 |
| Yellow Taxi Trip Data Compression Time/taxi compression throughput | 470808924 bytes | 470808924 bytes | 1 |
| Yellow Taxi Trip Data Compression Time/taxi decompression | 678197765.3 ns (13092907.293749988) | 705486381.4 ns (18021002.982499957) | 0.96 |
| Yellow Taxi Trip Data Compression Time/taxi decompression throughput | 470808924 bytes | 470808924 bytes | 1 |
| Yellow Taxi Trip Data Vortex-to-ParquetZstd Ratio/taxi | 0.9431026462983648 ratio | 0.9352743107185189 ratio | 1.01 |
| Yellow Taxi Trip Data Vortex-to-ParquetUncompressed Ratio/taxi | 0.6054206802747777 ratio | 0.6003953139789968 ratio | 1.01 |
| Yellow Taxi Trip Data Compression Ratio/taxi | 0.10768370227408859 ratio | 0.10671793893184574 ratio | 1.01 |
| Yellow Taxi Trip Data Compression Size/taxi | 50698448 bytes | 50243758 bytes | 1.01 |
| Public BI Compression Time/AirlineSentiment compression | 340276.6654419865 ns (137.1909899164748) | 332157.12124522065 ns (231.8147614702757) | 1.02 |
| Public BI Compression Time/AirlineSentiment compression throughput | 2020 bytes | 2020 bytes | 1 |
| Public BI Compression Time/AirlineSentiment decompression | 29046.34266082869 ns (23.412117846495676) | 28876.6758236396 ns (31.08676957276839) | 1.01 |
| Public BI Compression Time/AirlineSentiment decompression throughput | 2020 bytes | 2020 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/AirlineSentiment | 5.183040330920372 ratio | 5.183040330920372 ratio | 1 |
| Public BI Vortex-to-ParquetUncompressed Ratio/AirlineSentiment | 3.5295774647887326 ratio | 3.5295774647887326 ratio | 1 |
| Public BI Compression Ratio/AirlineSentiment | 0.6316831683168317 ratio | 0.6316831683168317 ratio | 1 |
| Public BI Compression Size/AirlineSentiment | 1276 bytes | 1276 bytes | 1 |
| Public BI Compression Time/Arade compression | 1966288390.8 ns (1738860.0412501097) | 3418131307 ns (1434399.0675001144) | 0.58 |
| Public BI Compression Time/Arade compression throughput | 787023760 bytes | 787023760 bytes | 1 |
| Public BI Compression Time/Arade decompression | 842063289.5 ns (23017898.431249976) | 1105020022 ns (33557330.98999995) | 0.76 |
| Public BI Compression Time/Arade decompression throughput | 787023760 bytes | 787023760 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/Arade | 0.47880751063090327 ratio | 0.48093732922218235 ratio | 1.00 |
| Public BI Vortex-to-ParquetUncompressed Ratio/Arade | 0.4273757600806135 ratio | 0.4292768013530913 ratio | 1.00 |
| Public BI Compression Ratio/Arade | 0.18234782517874681 ratio | 0.181872502807285 ratio | 1.00 |
| Public BI Compression Size/Arade | 143512071 bytes | 143137981 bytes | 1.00 |
| Public BI Compression Time/Bimbo compression | 10264941244.4 ns (9819225.873750687) | 23901757627.6 ns (31364429.449998856) | 0.43 |
| Public BI Compression Time/Bimbo compression throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| Public BI Compression Time/Bimbo decompression | 8193031738.5 ns (185310175.4499998) | 7031355738.9 ns (191661599.06124973) | 1.17 |
| Public BI Compression Time/Bimbo decompression throughput | 7121333608 bytes | 7121333608 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/Bimbo | 1.184289432537036 ratio | 1.2383901140321767 ratio | 0.96 |
| Public BI Vortex-to-ParquetUncompressed Ratio/Bimbo | 0.8030464268312414 ratio | 0.8397311744699478 ratio | 0.96 |
| Public BI Compression Ratio/Bimbo | 0.05907529349943635 ratio | 0.06210353963802113 ratio | 0.95 |
| Public BI Compression Size/Bimbo | 420694873 bytes | 442260024 bytes | 0.95 |
| Public BI Compression Time/CMSprovider compression | 10821541276 ns (23367884.44999981) | 14116868181.9 ns (5918312.6000003815) | 0.77 |
| Public BI Compression Time/CMSprovider compression throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| Public BI Compression Time/CMSprovider decompression | 13570967305.6 ns (72663203.76500034) | 11518712779.6 ns (21784165.73374939) | 1.18 |
| Public BI Compression Time/CMSprovider decompression throughput | 5149123964 bytes | 5149123964 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/CMSprovider | 1.1279400868720821 ratio | 1.1276190532527797 ratio | 1.00 |
| Public BI Vortex-to-ParquetUncompressed Ratio/CMSprovider | 0.7283176621973946 ratio | 0.7281103688687698 ratio | 1.00 |
| Public BI Compression Ratio/CMSprovider | 0.16423246418465912 ratio | 0.16467576833036604 ratio | 1.00 |
| Public BI Compression Size/CMSprovider | 845653317 bytes | 847935945 bytes | 1.00 |
| Public BI Compression Time/Euro2016 compression | 2257260382.6 ns (3826588.0499999523) | 2277433363.6 ns (12927430.812500238) | 0.99 |
| Public BI Compression Time/Euro2016 compression throughput | 393253221 bytes | 393253221 bytes | 1 |
| Public BI Compression Time/Euro2016 decompression | 552111958.3 ns (3071199.832499981) | 560077686 ns (4033143.1399999857) | 0.99 |
| Public BI Compression Time/Euro2016 decompression throughput | 393253221 bytes | 393253221 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/Euro2016 | 1.4358682833042142 ratio | 1.4128101754384608 ratio | 1.02 |
| Public BI Vortex-to-ParquetUncompressed Ratio/Euro2016 | 0.6092113090837246 ratio | 0.5994281971916199 ratio | 1.02 |
| Public BI Compression Ratio/Euro2016 | 0.4302218162886961 ratio | 0.4231260676692588 ratio | 1.02 |
| Public BI Compression Size/Euro2016 | 169186115 bytes | 166395689 bytes | 1.02 |
| Public BI Compression Time/Food compression | 1055159433.9 ns (663161.6499999762) | 1207370368.9 ns (824466.0887500048) | 0.87 |
| Public BI Compression Time/Food compression throughput | 332718229 bytes | 332718229 bytes | 1 |
| Public BI Compression Time/Food decompression | 746950255.8 ns (2272630.3725000024) | 467131772.3 ns (4595361.173749983) | 1.60 |
| Public BI Compression Time/Food decompression throughput | 332718229 bytes | 332718229 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/Food | 1.236669829514 ratio | 1.2785727678320182 ratio | 0.97 |
| Public BI Vortex-to-ParquetUncompressed Ratio/Food | 0.6992649944053574 ratio | 0.7229586733722262 ratio | 0.97 |
| Public BI Compression Ratio/Food | 0.12943194945895195 ratio | 0.13463195609880454 ratio | 0.96 |
| Public BI Compression Size/Food | 43064369 bytes | 44794506 bytes | 0.96 |
| Public BI Compression Time/HashTags compression | 2373049708.5 ns (7246496.75) | 2997774033.6 ns (2281680.3499999046) | 0.79 |
| Public BI Compression Time/HashTags compression throughput | 804495592 bytes | 804495592 bytes | 1 |
| Public BI Compression Time/HashTags decompression | 1258681251.5 ns (9903010.890000105) | 1071558788.7 ns (2689239.6987499595) | 1.17 |
| Public BI Compression Time/HashTags decompression throughput | 804495592 bytes | 804495592 bytes | 1 |
| Public BI Vortex-to-ParquetZstd Ratio/HashTags | 1.580631604116457 ratio | 1.5580046438524044 ratio | 1.01 |
| Public BI Vortex-to-ParquetUncompressed Ratio/HashTags | 0.44938505986447086 ratio | 0.44295205051154446 ratio | 1.01 |
| Public BI Compression Ratio/HashTags | 0.2601474291235147 ratio | 0.2565506225918513 ratio | 1.01 |
| Public BI Compression Size/HashTags | 209287460 bytes | 206393845 bytes | 1.01 |
| TPC-H l_comment Compression Time/chunked-without-fsst compression | 187048194.26845238 ns (246107.07552826405) | 190880273.59718257 ns (385362.03238095343) | 0.98 |
| TPC-H l_comment Compression Time/chunked-without-fsst compression throughput | 183010921 bytes | 183010921 bytes | 1 |
| TPC-H l_comment Compression Time/chunked-without-fsst decompression | 37466954.81672123 ns (140156.76451494172) | 59977066.21305555 ns (69043.00287013873) | 0.62 |
| TPC-H l_comment Compression Time/chunked-without-fsst decompression throughput | 183010921 bytes | 183010921 bytes | 1 |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-without-fsst | 3.2155813496160457 ratio | 3.215581274971146 ratio | 1.00 |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-without-fsst | 0.9983752622125646 ratio | 0.9983844541826632 ratio | 1.00 |
| TPC-H l_comment Compression Ratio/chunked-without-fsst | 0.999965750677797 ratio | 0.999965750677797 ratio | 1 |
| TPC-H l_comment Compression Size/chunked-without-fsst | 183004653 bytes | 183004653 bytes | 1 |
| TPC-H l_comment Compression Time/chunked-with-fsst compression | 775114728.1 ns (1831565.856249988) | 1173074786.15 ns (1020173.8162499666) | 0.66 |
| TPC-H l_comment Compression Time/chunked-with-fsst compression throughput | 183010921 bytes | 183010921 bytes | 1 |
| TPC-H l_comment Compression Time/chunked-with-fsst decompression | 96815261.37376985 ns (156820.81744047254) | 115640037.75162697 ns (338498.6541284993) | 0.84 |
| TPC-H l_comment Compression Time/chunked-with-fsst decompression throughput | 183010921 bytes | 183010921 bytes | 1 |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/chunked-with-fsst | 1.348305390366293 ratio | 1.3490213532663975 ratio | 1.00 |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/chunked-with-fsst | 0.41862251372066633 ratio | 0.4188486722276101 ratio | 1.00 |
| TPC-H l_comment Compression Ratio/chunked-with-fsst | 0.4174715562466351 ratio | 0.41769830227781873 ratio | 1.00 |
| TPC-H l_comment Compression Size/chunked-with-fsst | 76401854 bytes | 76443351 bytes | 1.00 |
| TPC-H l_comment Compression Time/canonical-with-fsst compression | 776325542.7 ns (776120.8637500405) | 1176725398 ns (1345117.0943750143) | 0.66 |
| TPC-H l_comment Compression Time/canonical-with-fsst compression throughput | 183010937 bytes | 183010937 bytes | 1 |
| TPC-H l_comment Compression Time/canonical-with-fsst decompression | 107811972.56408732 ns (155691.01287698746) | 116820541.63132274 ns (148413.64523676038) | 0.92 |
| TPC-H l_comment Compression Time/canonical-with-fsst decompression throughput | 183010937 bytes | 183010937 bytes | 1 |
| TPC-H l_comment Vortex-to-ParquetZstd Ratio/canonical-with-fsst | 1.348305982420875 ratio | 1.3490190311794283 ratio | 1.00 |
| TPC-H l_comment Vortex-to-ParquetUncompressed Ratio/canonical-with-fsst | 0.41862248632566124 ratio | 0.4188486585225473 ratio | 1.00 |
| TPC-H l_comment Compression Ratio/canonical-with-fsst | 0.4174635639398972 ratio | 0.41769030995125717 ratio | 1.00 |
| TPC-H l_comment Compression Size/canonical-with-fsst | 76400398 bytes | 76441895 bytes | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
TPC-H
| Benchmark suite | Current: 8bde547 | Previous: 48982b8 | Ratio |
|---|---|---|---|
| tpch_q1/vortex-in-memory-no-pushdown | 458661572.95 ns (1581287.449999988) | 475637040 ns (1517252.974999994) | 0.96 |
| tpch_q1/vortex-in-memory-pushdown | 506276111.6 ns (630704.0500000119) | 521426212.5 ns (1729808.375) | 0.97 |
| tpch_q1/arrow | 443220503.7 ns (1303618.6849999726) | 450510118.55 ns (876897.474999994) | 0.98 |
| tpch_q1/parquet | 654706321.3 ns (1265215.9749999642) | 660605886.4 ns (1458236.300000012) | 0.99 |
| tpch_q1/vortex-file-compressed | 646775214.4 ns (2412691.4287499785) | 677695129.4 ns (2460168.9000000358) | 0.95 |
| tpch_q1/vortex-file-uncompressed | 530053163.7 ns (1001407.65625) | 540495825.3 ns (2568529.964999974) | 0.98 |
| tpch_q2/vortex-in-memory-no-pushdown | 122027347.46948414 ns (372707.55017659813) | 129933571.43746033 ns (351019.82209920883) | 0.94 |
| tpch_q2/vortex-in-memory-pushdown | 121143814.67964284 ns (271388.7580952272) | 130644405.99146824 ns (350026.14810565114) | 0.93 |
| tpch_q2/arrow | 119434595.44738097 ns (138003.56400892884) | 123568997.48119049 ns (433027.3109136969) | 0.97 |
| tpch_q2/parquet | 154565265.10325396 ns (357009.600391835) | 160081233.3619444 ns (657426.053562507) | 0.97 |
| tpch_q2/vortex-file-compressed | 153142441.89940473 ns (588772.200297609) | 157597771.2007143 ns (430226.5305148661) | 0.97 |
| tpch_q2/vortex-file-uncompressed | 152957328.1438492 ns (461338.9486527592) | 155789825.81273812 ns (642459.4571428299) | 0.98 |
| tpch_q3/vortex-in-memory-no-pushdown | 152998606.62420636 ns (469900.1927241981) | 157938413.6459921 ns (1591380.145059526) | 0.97 |
| tpch_q3/vortex-in-memory-pushdown | 182745246.86666664 ns (851451.0300000161) | 201503770.86666667 ns (923278.232916668) | 0.91 |
| tpch_q3/arrow | 146315378.64757937 ns (1983486.99350892) | 149360912.58079365 ns (380187.28815476596) | 0.98 |
| tpch_q3/parquet | 333445719.5 ns (1552429.4306250215) | 345195376.15 ns (1475443.75) | 0.97 |
| tpch_q3/vortex-file-compressed | 320334931.4 ns (885575.3056249917) | 334018087.55 ns (630083.4174999893) | 0.96 |
| tpch_q3/vortex-file-uncompressed | 281047458.5 ns (903515.5731250048) | 281323756 ns (796877.9249999821) | 1.00 |
| tpch_q4/vortex-in-memory-no-pushdown | 109940850.85511903 ns (666701.3702202365) | 112499341.55984128 ns (465144.08063491434) | 0.98 |
| tpch_q4/vortex-in-memory-pushdown | 137449277.52083334 ns (333845.4873125106) | 140301869.2661508 ns (755437.5224007815) | 0.98 |
| tpch_q4/arrow | 100342903.2215476 ns (641422.7404761836) | 104418114.41341269 ns (598286.1615872979) | 0.96 |
| tpch_q4/parquet | 212400071.40000004 ns (343130.7729166746) | 221724245.7 ns (457172.26291663945) | 0.96 |
| tpch_q4/vortex-file-compressed | 280776711.1 ns (632521.8731249869) | 300388040.65 ns (687584.0037499964) | 0.93 |
| tpch_q4/vortex-file-uncompressed | 227314694.2 ns (1539251.0333333313) | 230161104.6666667 ns (899370.8174999952) | 0.99 |
| tpch_q5/vortex-in-memory-no-pushdown | 299491474.45 ns (1355037.125) | 334475550.7 ns (2354253.521874994) | 0.90 |
| tpch_q5/vortex-in-memory-pushdown | 304784864.45 ns (1300102.4381249845) | 344987419 ns (1669842.3312499821) | 0.88 |
| tpch_q5/arrow | 286750952.25 ns (1645150.6762500107) | 303436601.5 ns (2075220.046875) | 0.95 |
| tpch_q5/parquet | 448302046.85 ns (3916005.793124974) | 476941171.15 ns (3691502.1168750226) | 0.94 |
| tpch_q5/vortex-file-compressed | 342337952.65 ns (1326007.165625006) | 363391593.65 ns (1584685.1056249738) | 0.94 |
| tpch_q5/vortex-file-uncompressed | 363684551.95 ns (8812639.118124992) | 353914915.6 ns (1053852.5387500226) | 1.03 |
| tpch_q6/vortex-in-memory-no-pushdown | 41603456.42294974 ns (182772.11982903257) | 44778494.074140206 ns (85811.4083483778) | 0.93 |
| tpch_q6/vortex-in-memory-pushdown | 97716476.41551587 ns (187918.62841667235) | 89586284.80684523 ns (443487.71927975863) | 1.09 |
| tpch_q6/arrow | 34615588.17537037 ns (97717.27435185388) | 37005146.847989425 ns (169664.4284649454) | 0.94 |
| tpch_q6/parquet | 151655986.37984127 ns (573351.7103075385) | 155052097.64424604 ns (376248.49499852955) | 0.98 |
| tpch_q6/vortex-file-compressed | 68046295.29882936 ns (303663.2867395878) | 66187226.59113095 ns (196660.5936093703) | 1.03 |
| tpch_q6/vortex-file-uncompressed | 186519198.2 ns (621111.9166666567) | 179031836.4965873 ns (688448.0650793612) | 1.04 |
| tpch_q7/vortex-in-memory-no-pushdown | 553711537.8 ns (1901372.7999999523) | 580161489.9 ns (1612201.949999988) | 0.95 |
| tpch_q7/vortex-in-memory-pushdown | 596317589 ns (3179533.831250012) | 621744961 ns (1415843.89624995) | 0.96 |
| tpch_q7/arrow | 539411364.2 ns (2751725.216250032) | 567546965.4 ns (2812628.199999988) | 0.95 |
| tpch_q7/parquet | 705021551.5 ns (5662838.998749971) | 731253638.6 ns (4841214.297499955) | 0.96 |
| tpch_q7/vortex-file-compressed | 710152592.9 ns (4173051.5299999714) | 735665612.8 ns (6358194.650000036) | 0.97 |
| tpch_q7/vortex-file-uncompressed | 692615813 ns (7463894.910000026) | 698880925.8 ns (6529443) | 0.99 |
| tpch_q8/vortex-in-memory-no-pushdown | 224540604.6666667 ns (837648.8770833164) | 243305453 ns (987143.1820833534) | 0.92 |
| tpch_q8/vortex-in-memory-pushdown | 234711886.8333333 ns (1590926.732916668) | 251473633.85 ns (1027957.9350000024) | 0.93 |
| tpch_q8/arrow | 211288106.26666665 ns (523796.4208333492) | 211835851 ns (389751.28333331645) | 1.00 |
| tpch_q8/parquet | 475504996.65 ns (1827408.7493749857) | 477367042.65 ns (1372305.4249999821) | 1.00 |
| tpch_q8/vortex-file-compressed | 276769617.15 ns (725978.453125) | 280260864.8 ns (540696.890625) | 0.99 |
| tpch_q8/vortex-file-uncompressed | 312526895.5 ns (1461782.553124994) | 269643866.5 ns (1133461.4831249714) | 1.16 |
| tpch_q9/vortex-in-memory-no-pushdown | 407621152.45 ns (1957592.2999999821) | 431026554.55 ns (1833737.4962500036) | 0.95 |
| tpch_q9/vortex-in-memory-pushdown | 410616641.8 ns (1667903.0006250143) | 426227276.9 ns (1292383.6550000012) | 0.96 |
| tpch_q9/arrow | 392653816 ns (1416758.5799999833) | 385214933.55 ns (1005303.2031249702) | 1.02 |
| tpch_q9/parquet | 687487542.6 ns (3918905.801249981) | 681882978.4 ns (2177764.949999988) | 1.01 |
| tpch_q9/vortex-file-compressed | 490003252.55 ns (1297619.5256250203) | 477064290.25 ns (1385514.0962499976) | 1.03 |
| tpch_q9/vortex-file-uncompressed | 491157705.7 ns (9223984.131249994) | 425118433.35 ns (1836270.4306250215) | 1.16 |
| tpch_q10/vortex-in-memory-no-pushdown | 227867225.8666667 ns (715541.8333333284) | 223931342.2333334 ns (741702.0920833349) | 1.02 |
| tpch_q10/vortex-in-memory-pushdown | 261362941.75 ns (2983009.064374998) | 257067734.5 ns (377787.1887500137) | 1.02 |
| tpch_q10/arrow | 218820063.3333333 ns (1258935.745416686) | 215985482.86666664 ns (831520.7520833313) | 1.01 |
| tpch_q10/parquet | 475821709.5 ns (1321776.391874969) | 471104058.9 ns (1256398.5506249964) | 1.01 |
| tpch_q10/vortex-file-compressed | 448838731.85 ns (837384.4050000012) | 457415718.85 ns (1118964.056250006) | 0.98 |
| tpch_q10/vortex-file-uncompressed | 360419231.15 ns (1388645.3256249726) | 351030059.8 ns (958430.275000006) | 1.03 |
| tpch_q11/vortex-in-memory-no-pushdown | 179594004.06972224 ns (384873.2217847258) | 177476560.44349208 ns (621545.1311220229) | 1.01 |
| tpch_q11/vortex-in-memory-pushdown | 179580394.9488889 ns (612704.7365972549) | 176887504.22222224 ns (509638.6070000082) | 1.02 |
| tpch_q11/arrow | 178735344.94884923 ns (540426.9074553698) | 173017429.78448415 ns (836755.5850307643) | 1.03 |
| tpch_q11/parquet | 186316513.7 ns (1078456.3708333373) | 184787052.26666662 ns (690430.183333382) | 1.01 |
| tpch_q11/vortex-file-compressed | 225914565.2666667 ns (891167.8158333153) | 226099285.1 ns (887589.3204166591) | 1.00 |
| tpch_q11/vortex-file-uncompressed | 227205860.6 ns (1919262.7666666508) | 223814317.33333334 ns (1064055.3733333647) | 1.02 |
| tpch_q12/vortex-in-memory-no-pushdown | 199302546.86666664 ns (711640.3129166514) | 195305898.49999997 ns (244922.04291664064) | 1.02 |
| tpch_q12/vortex-in-memory-pushdown | 249151292.3 ns (243294.64999999106) | 234062840.8666667 ns (461240.2487499863) | 1.06 |
| tpch_q12/arrow | 166545971.28333333 ns (325576.8990416825) | 164663743.2840873 ns (440381.75235167146) | 1.01 |
| tpch_q12/parquet | 353498020.2 ns (849258.224999994) | 354188285.6 ns (872954.5799999535) | 1.00 |
| tpch_q12/vortex-file-compressed | 637903620.6 ns (2428782.3937499523) | 656868583.3 ns (890714.2012499571) | 0.97 |
| tpch_q12/vortex-file-uncompressed | 351174999.05 ns (739019.5075000226) | 346862421.55 ns (1062543.5249999762) | 1.01 |
| tpch_q13/vortex-in-memory-no-pushdown | 175377885.08099204 ns (5465737.433413714) | 164371801.7034921 ns (895269.1245238036) | 1.07 |
| tpch_q13/vortex-in-memory-pushdown | 166679356.15781745 ns (2186408.8559240997) | 169881779.38027778 ns (3636627.2586250007) | 0.98 |
| tpch_q13/arrow | 165012782.9222619 ns (2347232.5475237966) | 167784218.5671032 ns (2187023.831388384) | 0.98 |
| tpch_q13/parquet | 313941935 ns (2870156.2631250024) | 317857778.2 ns (3799881.0493749976) | 0.99 |
| tpch_q13/vortex-file-compressed | 206209097.23333335 ns (838473.7558333427) | 208638770.66666666 ns (1607727.550000012) | 0.99 |
| tpch_q13/vortex-file-uncompressed | 194094280.3 ns (990201.1250000298) | 193641610.3 ns (1547785.7333333492) | 1.00 |
| tpch_q14/vortex-in-memory-no-pushdown | 46562845.50263889 ns (184416.51152777672) | 45386539.54400794 ns (216845.576000005) | 1.03 |
| tpch_q14/vortex-in-memory-pushdown | 90035597.62347223 ns (261097.7120295167) | 80187881.41924605 ns (602629.1283606067) | 1.12 |
| tpch_q14/arrow | 37456439.3088492 ns (259786.67374652997) | 38490941.49415344 ns (386777.6300767213) | 0.97 |
| tpch_q14/parquet | 224115278.33333334 ns (2168462.5691666454) | 225398861.5666667 ns (991819.9741666615) | 0.99 |
| tpch_q14/vortex-file-compressed | 124657373.09436509 ns (427436.0742807463) | 123005495.09623018 ns (289599.80898809433) | 1.01 |
| tpch_q14/vortex-file-uncompressed | 159454532.5102778 ns (581623.9065763801) | 153355184.1057143 ns (177446.5630684644) | 1.04 |
| tpch_q15/vortex-in-memory-no-pushdown | 74183038.95841269 ns (208212.83235615492) | 74140631.44396825 ns (956807.9174583331) | 1.00 |
| tpch_q15/vortex-in-memory-pushdown | 126416712.25055556 ns (465003.9616041705) | 104862534.42071429 ns (209020.78954464942) | 1.21 |
| tpch_q15/arrow | 63349733.750972226 ns (206356.43922222778) | 62083618.38583334 ns (119847.99166666344) | 1.02 |
| tpch_q15/parquet | 294324466 ns (1295691.331250012) | 296440986.25 ns (559746.1331249774) | 0.99 |
| tpch_q15/vortex-file-compressed | 228159097.5333333 ns (1513795.9666666687) | 215966946.20000005 ns (257837.98666667938) | 1.06 |
| tpch_q15/vortex-file-uncompressed | 322348954.45 ns (1315276.153124988) | 303869392.1 ns (451749.6999999881) | 1.06 |
| tpch_q16/vortex-in-memory-no-pushdown | 106893224.62630951 ns (542444.4827633947) | 102810830.13087301 ns (201740.30115079135) | 1.04 |
| tpch_q16/vortex-in-memory-pushdown | 123375044.80079365 ns (199240.01614880562) | 120718832.32777777 ns (133460.53209721297) | 1.02 |
| tpch_q16/arrow | 105604173.86603174 ns (206021.0807936564) | 101815612.55956349 ns (66089.36590326577) | 1.04 |
| tpch_q16/parquet | 122527336.29091272 ns (372233.48891071975) | 119217624.70896825 ns (103236.00382243097) | 1.03 |
| tpch_q16/vortex-file-compressed | 135957588.44492063 ns (346797.41950298846) | 134610219.3376984 ns (442894.0186170712) | 1.01 |
| tpch_q16/vortex-file-uncompressed | 133221802.34329367 ns (1132464.9658467248) | 128793331.99599203 ns (349453.52372122556) | 1.03 |
| tpch_q17/vortex-in-memory-no-pushdown | 584686431.2 ns (8669903.080000043) | 551557853.6 ns (3007991.949999988) | 1.06 |
| tpch_q17/vortex-in-memory-pushdown | 728248210 ns (13343961.193750024) | 630440861.1 ns (4312127.960000038) | 1.16 |
| tpch_q17/arrow | 589499254.5 ns (7930660.876250029) | 514949517.4 ns (3271836.3787499964) | 1.14 |
| tpch_q17/parquet | 590765855.4 ns (1541587.3887500167) | 583153129.7 ns (2322976.317499995) | 1.01 |
| tpch_q17/vortex-file-compressed | 735124050.4 ns (3033209.2249999642) | 683693123.5 ns (2372760.5374999642) | 1.08 |
| tpch_q17/vortex-file-uncompressed | 654834332.2 ns (5551130.774999976) | 595890466.8 ns (1193761.2975000143) | 1.10 |
| tpch_q18/vortex-in-memory-no-pushdown | 1045330899.6 ns (15327446.090000033) | 994400166.7 ns (4006744.261250019) | 1.05 |
| tpch_q18/vortex-in-memory-pushdown | 1035943404.9 ns (8714938.77000004) | 1001946644.7 ns (2342776.569999993) | 1.03 |
| tpch_q18/arrow | 1031796352.4 ns (5868771.75) | 971919056.2 ns (3115116.4450000525) | 1.06 |
| tpch_q18/parquet | 1212295635.4 ns (5474788.442499995) | 1149244274.9 ns (2721623.563750148) | 1.05 |
| tpch_q18/vortex-file-compressed | 1156615178.2 ns (3855669.797499895) | 1098965163.1 ns (3107538.589999914) | 1.05 |
| tpch_q18/vortex-file-uncompressed | 1039551076.4 ns (5317063.349999964) | 987141789.1 ns (2366368.807500005) | 1.05 |
| tpch_q19/vortex-in-memory-no-pushdown | 160736092.91357142 ns (600137.599023819) | 156585450.10876986 ns (769088.1493685395) | 1.03 |
| tpch_q19/vortex-in-memory-pushdown | 258407110.6 ns (407397.8118749857) | 236800835.2666667 ns (520555.35249997675) | 1.09 |
| tpch_q19/arrow | 150465092.7822619 ns (477975.92738094926) | 147184344.04031748 ns (870414.6070198417) | 1.02 |
| tpch_q19/parquet | 472223197.2 ns (511820.47499999404) | 468287932.35 ns (576167.2381249964) | 1.01 |
| tpch_q19/vortex-file-compressed | 974279132.4 ns (3229831.4412499666) | 931061925.3 ns (1844214.8612499833) | 1.05 |
| tpch_q19/vortex-file-uncompressed | 331590247.95 ns (804831.5749999881) | 311332555.6 ns (331283.7949999869) | 1.07 |
| tpch_q20/vortex-in-memory-no-pushdown | 247910094.4333333 ns (2671678.03458333) | 234512821.4666667 ns (323341.1741666347) | 1.06 |
| tpch_q20/vortex-in-memory-pushdown | 276279322.15 ns (1287096.193749994) | 249917827.26666665 ns (302344.134583354) | 1.11 |
| tpch_q20/arrow | 250246731.8 ns (3035598.948333308) | 231271449.53333336 ns (197476.71666666865) | 1.08 |
| tpch_q20/parquet | 368941239.35 ns (3561709.7337500155) | 345706455.85 ns (1255735.6424999833) | 1.07 |
| tpch_q20/vortex-file-compressed | 409160771.75 ns (2758693.832499981) | 375926813.65 ns (907056.8824999928) | 1.09 |
| tpch_q20/vortex-file-uncompressed | 401264470.05 ns (3522521.599999994) | 377629282.4 ns (631064.875) | 1.06 |
| tpch_q21/vortex-in-memory-no-pushdown | 856134885.3 ns (9073509.764999986) | 817282143.3 ns (2448491.350000024) | 1.05 |
| tpch_q21/vortex-in-memory-pushdown | 915742677.4 ns (6607757.503750026) | 872439065.5 ns (729677.8912500143) | 1.05 |
| tpch_q21/arrow | 839057934.1 ns (5704750.399999976) | 809272300 ns (1285844.3912499547) | 1.04 |
| tpch_q21/parquet | 975227938.5 ns (4049811.751249969) | 944800972.4 ns (1495333.6087499857) | 1.03 |
| tpch_q21/vortex-file-compressed | 1221140078.9 ns (4768124.185000062) | 1210333787.2 ns (2048530.6812499762) | 1.01 |
| tpch_q21/vortex-file-uncompressed | 1084796720.8 ns (5227527.850000024) | 1058461536.5 ns (3659947.0099999905) | 1.02 |
| tpch_q22/vortex-in-memory-no-pushdown | 68379145.81097223 ns (513375.85478471965) | 66225045.47083334 ns (287177.0767291635) | 1.03 |
| tpch_q22/vortex-in-memory-pushdown | 67733655.44964285 ns (240337.75190030038) | 65703235.51831349 ns (160617.2424503863) | 1.03 |
| tpch_q22/arrow | 67933498.16742063 ns (340934.168194443) | 65189679.325238094 ns (220482.54561904818) | 1.04 |
| tpch_q22/parquet | 94709499.13654761 ns (489956.4975595251) | 93157582.84702381 ns (563673.7252901718) | 1.02 |
| tpch_q22/vortex-file-compressed | 106155791.11083335 ns (734560.3570833206) | 102542645.06178573 ns (570395.9921428636) | 1.04 |
| tpch_q22/vortex-file-uncompressed | 103406283.44623016 ns (306420.66084524244) | 101254610.4936508 ns (283236.2650059536) | 1.02 |
This comment was automatically generated by workflow using github-action-benchmark.
cc: @lwwmanning this is a summary of where throughputs, speeds, and ratios stand with the soup of changes I've been working on. I want to understand why this PR is slowing down Taxi, Bimbo, and Food; however, it does deliver notable reductions in time for string-heavy datasets like CMSprovider, HashTags, and l_comment. Sizes never change by more than 3% in either direction, and only one dataset increases in size (HashTags).
cleaned up summary of compression benchmark
dataset sizes & throughputs
Updated summary for 68faec3. Any ratio outside (0.8, 1.2) is bolded.
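As a rough illustration of how the Ratio column and the bolding rule fit together: the tables above are consistent with ratio = current / previous (for example, taxi compression time gives roughly 0.51). The function names below are hypothetical and only sketch that arithmetic.

```rust
/// Ratio as it appears in the benchmark tables: the current measurement
/// divided by the previous one. For timings and sizes, a value below 1.0
/// means the current commit is faster or smaller.
pub fn ratio(current: f64, previous: f64) -> f64 {
    current / previous
}

/// True when a ratio falls outside the open interval (0.8, 1.2),
/// which is the band the summary uses to decide what to bold.
pub fn is_notable(ratio: f64) -> bool {
    ratio <= 0.8 || ratio >= 1.2
}
```

With the taxi numbers, `ratio(1261340341.2, 2485148812.6)` lands well below 0.8 and would be bolded, while taxi decompression (about 0.96) would not.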
@@ -59,7 +59,7 @@ impl EncodingCompressor for FSSTCompressor {
 // between 2-3x depending on the text quality.
 //
 // It's not worth running a full compression step unless the array is large enough.
-if array.nbytes() < 10 * FSST_SYMTAB_MAX_SIZE {
+if array.nbytes() < 5 * FSST_SYMTAB_MAX_SIZE {
I'm not sure why my changes encountered this, but I was getting samples that were hundreds of bytes too small for FSST, which caused this PR to compress poorly.
We may want to think more broadly about how to estimate FSST compression ratio on a tiny sample, but for now this seems reasonable unless we're directly calling compress on a small array (rather than a sample thereof).
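To make the effect of the multiplier concrete, here is a minimal sketch of the size guard as a standalone function. The constant's value and the function name are assumptions for illustration only; the real constant is defined alongside the FSST encoding.

```rust
/// Assumed stand-in for the symbol table size bound (illustrative value only).
pub const FSST_SYMTAB_MAX_SIZE: usize = 8 * 255 + 255;

/// Hypothetical helper: FSST is only attempted when the input is at least
/// `multiplier` times the symbol table bound, mirroring the guard in the diff.
pub fn worth_trying_fsst(nbytes: usize, multiplier: usize) -> bool {
    nbytes >= multiplier * FSST_SYMTAB_MAX_SIZE
}
```

Under these assumed numbers, a 22,500-byte sample is a few hundred bytes short of the 10x cutoff (22,950 bytes) and would be skipped, but it clears the 5x cutoff (11,475 bytes).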
@a10y any guidance on how you arrived at 10x multiplier or just a guess?
Yea I had chatted 1:1 with Dan about this, it was fairly arbitrary
68faec3 to 8bde547
Will beat me to develop so I rebased. I'll re-run benchmarks.
@danking I wouldn't obsess about benchmarks before merging. Running them to spot check locally is fine and if some new trend develops we can investigate
|
||
let (arrays, trees) = array
    .children()
    .zip(children_trees)
nit: I prefer `zip_eq` from itertools personally, since it asserts that the two things being zipped are of equal length rather than truncating
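To illustrate the nit, here is a minimal std-only sketch of the difference. `zip_truncating` and `zip_checked` are hypothetical helper names, not code from this PR. Note that itertools' `zip_eq` panics on a length mismatch, whereas this sketch returns an error instead.

```rust
/// What plain `Iterator::zip` does with unequal inputs:
/// it stops at the shorter one and silently drops the rest.
fn zip_truncating<A, B>(left: Vec<A>, right: Vec<B>) -> Vec<(A, B)> {
    left.into_iter().zip(right).collect()
}

/// Pair each element with its counterpart, refusing mismatched lengths.
/// This is the check `zip_eq` gives for free (there as a panic, here as an `Err`).
fn zip_checked<A, B>(left: Vec<A>, right: Vec<B>) -> Result<Vec<(A, B)>, String> {
    if left.len() != right.len() {
        return Err(format!(
            "length mismatch: {} on the left vs {} on the right",
            left.len(),
            right.len()
        ));
    }
    Ok(left.into_iter().zip(right).collect())
}
```

With three children and only two trees, `zip_truncating` quietly returns two pairs, while `zip_checked` surfaces the mismatch.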
Done.
let ratio = (compressed_chunk.nbytes() as f32) / (chunk.nbytes() as f32);
let exceeded_target_ratio = previous
    .as_ref()
    .map(|(_, target_ratio)| ratio > target_ratio * 1.2)
this magic constant 1.2 should be a field of `ChunkedCompressor`, and we can have a default `ChunkedCompressor` (look at BitpackedCompressor or RunEndCompressor for examples)
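A rough sketch of what that suggestion could look like. The struct here is a stand-in that models only the threshold; the field name `relative_diff_threshold`, the method `exceeded_target_ratio`, and the `unwrap_or(false)` fallback for a missing previous chunk are assumptions, not the PR's actual code.

```rust
/// Hypothetical stand-in for the PR's compressor; only the threshold is modeled.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ChunkedCompressor {
    /// How far above the previous chunk's ratio the current chunk's ratio
    /// may land before the reused encoding is considered a poor fit.
    relative_diff_threshold: f32,
}

impl Default for ChunkedCompressor {
    fn default() -> Self {
        // The former magic constant becomes the default value.
        Self {
            relative_diff_threshold: 1.2,
        }
    }
}

impl ChunkedCompressor {
    /// True when `ratio` is worse than the previous chunk's ratio by more
    /// than the configured factor. With no previous chunk there is nothing
    /// to exceed, so this returns false (an assumption of this sketch).
    pub fn exceeded_target_ratio(&self, ratio: f32, target_ratio: Option<f32>) -> bool {
        target_ratio
            .map(|target| ratio > target * self.relative_diff_threshold)
            .unwrap_or(false)
    }
}
```

Callers that want a different tolerance can construct the struct with another value, while `ChunkedCompressor::default()` keeps the current behavior.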
Done.
ChunkedArray::try_new(compressed_chunks, array.dtype().clone())?.into_array(),
Some(CompressionTree::new_with_metadata(
    self,
    vec![child],
if child is None, should this just be empty...?
FWIW, this is the diff. I don't find either of these particularly palatable but I think the fix is to sort out how compressors pass information from one invocation of compress to the next.
(docs) # g diff
diff --git a/vortex-sampling-compressor/src/compressors/chunked.rs b/vortex-sampling-compressor/src/compressors/chunked.rs
index c0f5aebe..9adf4ab4 100644
--- a/vortex-sampling-compressor/src/compressors/chunked.rs
+++ b/vortex-sampling-compressor/src/compressors/chunked.rs
@@ -78,16 +78,16 @@ fn like_into_parts(
vortex_bail!("chunked array compression tree must be ChunkedCompressorMetadata")
};
- if children.len() != 1 {
- vortex_bail!("chunked array compression tree must have one child")
+ if (children.len() == 1) != target_ratio.is_some() {
+ vortex_bail!("chunked array compression tree must have a child iff it has a ratio")
}
- let child = children.remove(0);
-
- match (child, target_ratio) {
- (None, None) => Ok(None),
- (Some(child), Some(ratio)) => Ok(Some((child, *ratio))),
- (..) => vortex_bail!("chunked array compression tree must have a child iff it has a ratio"),
+ if children.len() == 0 {
+ return Ok(None);
+ } else if children.len() == 1 {
+ return Ok(Some((children.remove(0).unwrap(), target_ratio.unwrap())));
+ } else {
+ vortex_bail!("chunked array compression tree must have at most one child")
}
}
@@ -141,16 +141,16 @@ impl ChunkedCompressor {
}
}
- let (child, ratio) = match previous {
- Some((child, ratio)) => (Some(child), Some(ratio)),
- None => (None, None),
+ let (children, ratio) = match previous {
+ Some((child, ratio)) => (vec![Some(child)], Some(ratio)),
+ None => (vec![], None),
};
Ok(CompressedArray::new(
ChunkedArray::try_new(compressed_chunks, array.dtype().clone())?.into_array(),
Some(CompressionTree::new_with_metadata(
self,
- vec![child],
+ children,
Arc::new(ChunkedCompressorMetadata(ratio)),
)),
))
}
fn can_compress(&self, array: &Array) -> Option<&dyn EncodingCompressor> {
    ChunkedArray::try_from(array)
array.is_encoding(&Chunked::ID).then_some(self)
done.
fn compress_array(&self, array: &Array) -> VortexResult<CompressedArray<'a>> {
    let mut rng = StdRng::seed_from_u64(self.options.rng_seed);
    if array.encoding().id() == Constant::ID {
array.is_encoding(&Constant::ID)?
The primary idea is that chunk n+1 more often than not has a distribution of values similar to chunk n. We ought to reuse chunk n's compression scheme if the ratio is "good" before attempting a full sampling pass. This has the potential to both increase throughput and also permit us to invest in a more extensive search on the first chunk.
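The reuse loop can be sketched with a toy model. Everything below is illustrative and assumed: the two-variant `Scheme` enum, the byte-size formulas, and `full_search` (standing in for the sampling pass) are not Vortex APIs. The sketch only shows the control flow: try the previous chunk's scheme first, and fall back to a full search when the ratio is worse than the previous ratio times a slack factor.

```rust
/// Toy encodings standing in for real compression trees.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Scheme {
    RunLength,
    Raw,
}

/// Compressed size divided by uncompressed size; lower is better.
fn ratio(chunk: &[u32], scheme: Scheme) -> f32 {
    if chunk.is_empty() {
        return 1.0;
    }
    let raw_bytes = 4 * chunk.len();
    let bytes = match scheme {
        Scheme::Raw => raw_bytes,
        // One (value, run length) pair of 8 bytes per run.
        Scheme::RunLength => 8 * (1 + chunk.windows(2).filter(|w| w[0] != w[1]).count()),
    };
    bytes as f32 / raw_bytes as f32
}

/// Stand-in for the expensive sampling pass: try every scheme, keep the best.
fn full_search(chunk: &[u32]) -> (Scheme, f32) {
    let rl = ratio(chunk, Scheme::RunLength);
    let raw = ratio(chunk, Scheme::Raw);
    if rl < raw {
        (Scheme::RunLength, rl)
    } else {
        (Scheme::Raw, raw)
    }
}

/// Compress chunk n+1 like chunk n when that stays within `slack` of the
/// previous ratio. Returns the chosen schemes and the number of full searches.
fn compress_chunks(chunks: &[Vec<u32>], slack: f32) -> (Vec<Scheme>, usize) {
    let mut previous: Option<(Scheme, f32)> = None;
    let mut schemes = Vec::with_capacity(chunks.len());
    let mut searches = 0;
    for chunk in chunks {
        // Cheap path: reuse the previous scheme if its ratio is still "good".
        let reused = previous.and_then(|(scheme, target)| {
            let r = ratio(chunk, scheme);
            (r <= target * slack).then_some((scheme, r))
        });
        let chosen = reused.unwrap_or_else(|| {
            searches += 1;
            full_search(chunk)
        });
        schemes.push(chosen.0);
        previous = Some(chosen);
    }
    (schemes, searches)
}
```

In this model, two identical constant chunks followed by a chunk of distinct values trigger a full search on the first and third chunks only; the second chunk reuses the first chunk's scheme.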
This PR introduces `ChunkedCompressor` and `StructCompressor`. Their existence means that compression trees now fully represent an array. For example, if I have a `Chunked(Struct(foo=Chunked(U64), ...))`, the `ChunkedCompressor` will attempt to compress all the U64 chunks similarly and then it will pass up the ratio and encoding tree of the last chunk to the `StructCompressor`. Eventually the outer `ChunkedCompressor` can attempt to reuse on the second outer chunk all the encodings from all the fields of the first outer chunk.
This PR looks best with whitespace ignored.
The `CompressionTree` (particularly the metadata) is not so ergonomic, but I focused on throughput improvement rather than a refactor.
benchmarks
Any ratio outside (0.8, 1.2) is bolded.