Release v0.4.0 · kuzudb/kuzu

Changes

Breaking Changes

Renaming of list-like types:
- FIXED-LIST to ARRAY
- VAR-LIST to LIST
Import/Export database
Copy from subquery
Bulk insertion into non-empty database
External database extensions
- DuckDB
- Postgres
Scan from pandas pyarrow backend (#3058)

Usability Improvements

CLI improvements (#2869, #2876, #2930, #2953, #3253)
Functions
- List_reverse (#2927)
- Levenshtein (#2950)
- Array functions for similarity search (#3087)
- initCap & concat (#3161)
- Coalesce and ifnull (#3235)
- Utility hash functions (#3212)
Export query result to Polars (#2985, contributed by @alexander-beedie)
Python API linting improvements (#3023, contributed by @alexander-beedie)
Progress bar (#3051)
Support read after update in the same statement (#3126)
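
For readers unfamiliar with two of the function families listed above, here is a minimal Python sketch of what a Levenshtein function and a similarity-search array function typically compute. It is an independent illustration, not Kùzu's implementation. The names `levenshtein` and `cosine_similarity` are chosen for this example only, and cosine similarity is assumed as a representative measure, since these notes do not say which measures #3087 provides.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (classic two-row dynamic program)."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def cosine_similarity(x: list[float], y: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm
```

The fixed-length check in `cosine_similarity` mirrors why such functions fit a fixed-size ARRAY type: both operands must have the same number of elements.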

Performance Improvements

Python import cache (#2905)
Internal ID compression (#3116)
Avoid busy loop when max threads has been reached (#3233, contributed by @ted-wq-x)

What's Changed

move apis from connection to clientcontext by @hououou in #2951
Allow quotes on struct keys by @acquamarin in #2967
Fix functions for casting string to var-list by @manh9203 in #2970
Implemented start_node and end_node functions by @MSebanc in #2978
Fix issue-2942 by @andyfengHKU in #2977
Import database by @hououou in #2964
Minor CLI Truncation Fix by @MSebanc in #2980
Fixed start and end node tests by @MSebanc in #2981
Refactor: unify many_one and many_many storage by @ray6080 in #2912
Refactor: unify CopyNode and CopyRel operator by @ray6080 in #2955
Support Polars DataFrame export from QueryResult by @alexander-beedie in #2985
Clean up transaction pointer in physical operator by @ray6080 in #2990
More efficient ColumnChunk string dictionary caching by @benjaminwinger in #2994
Fix setting of column chunk capacity by @ray6080 in #2996
Rework CSV_TO_PARQUET testing feature by @manh9203 in #2993
Avoid moving DictionaryChunks by @benjaminwinger in #2999
Fix broken links in README to website due to sub-domain changes by @ray6080 in #3000
Re-write partitioner to use ColumnChunks instead of ValueVectors by @benjaminwinger in #2979
Abstract client config by @andyfengHKU in #3010
Support use of QueryResult as a context manager, and add a get_schema method by @alexander-beedie in #3009
Pass client context to binder by @andyfengHKU in #3015
Refactor cast functions by @andyfengHKU in #3016
Clean up unique_ptr of LogicalType in NodeGroup and BatchInsert by @ray6080 in #3018
Combine append(ValueVector) with appendOne by @ray6080 in #3017
Import cache fix and revert revert by @mxwli in #3025
Fix issue-2984 by @andyfengHKU in #3026
Add multiplatform test report bot by @mewim in #3027
Python API typing, lint, config/makefile by @alexander-beedie in #3023
Fix unicode conversion for pandas dataframe by @mewim in #3029
Update LICENSE by @semihsalihoglu-uw in #3031
Rewrite the Hash Index overflow file to support multiple copies by @benjaminwinger in #3012
Add copy from subquery by @andyfengHKU in #3020
Fix issue-3004 by @andyfengHKU in #3036
Optimise Python unit test runtime (~7x speedup) by @alexander-beedie in #3032
Add more parameter types for Node.js API by @mewim in #3037
Insert into the hash index builder one chunk at a time by @benjaminwinger in #2997
Allow CI workflow to be manually dispatched by @mewim in #3043
Bump extensions version to 0.2.0 by @mewim in #3041
First-pass lint/format for Python shell tests by @alexander-beedie in #3034
Bump master branch version to 0.3.2.1 by @mewim in #3044
Fixed failing shell tests by @MSebanc in #3045
Add shell tests to CI by @mewim in #3039
Fix rel csr sliding out-of-place commit and null strings by @ray6080 in #3055
Refactor: separate insertions and updates in rel table local storage by @ray6080 in #2982
Fix issue 3042 by @ray6080 in #3046
Update Debian version in build workflows by @mewim in #3056
Implement duckdb scanner extension by @acquamarin in #3052
Copy table function instead of passing raw pointer by @andyfengHKU in #3067
Add scalar_func_rewrite_t by @andyfengHKU in #3069
Remove the constraint on HashIndexBuilder's template parameter by @benjaminwinger in #3030
Remove unnecessary components for pip package by @mewim in #3074
Fix Hash index split slot ID when reserving a number of slots which are a power of two by @benjaminwinger in #3066
Implement catalog cache in postgres scanner by @acquamarin in #3071
Rework FIXED_LIST by @manh9203 in #3057
Implemented Progress Bar by @MSebanc in #3051
Replace ValueVector with ColumnChunk in LocalStorage by @ray6080 in #3028
Exclude extension files from the rust crate by @benjaminwinger in #3076
Add include for cstdint by @mewim in #3085
Fix rel insert and add sanityCheck for column chunk by @ray6080 in #3081
Fix node insert by @ray6080 in #3082
Refactor the registration of arithmetic functions by @manh9203 in #3079
Allowed for progress bar to be configurable by CALL by @MSebanc in #3080
Implement array functions by @acquamarin in #3087
Remove underscore from the badges in README by @mewim in #3094
Fix python prepared statement null value by @acquamarin in #3098
Refactor string functions by @manh9203 in #3091
Arrow chunk_size as keyword argument by @prrao87 in #3084
Update rustdoc to show how to enable parallel compilation by @prrao87 in #3099
Improve copy-to-parquet perf by @acquamarin in #3105
Refactor list functions by @manh9203 in #3100
Refactor cast functions by @manh9203 in #3107
QueryResult get_as_pl should always return a single chunk by @alexander-beedie in #3110
Add standard Python module __version__ attr by @alexander-beedie in #3111
Fix DuckDB build for macOS ARM and 32-bit by @mewim in #3115
Pandas pyarrow backend by @mxwli in #3058
Add pull request template by @andyfengHKU in #3118
Added customizable delay before displaying progress bar by @MSebanc in #3092
Hash index cleanup by @benjaminwinger in #3088
Fix launch database using homedir by @acquamarin in #3108
Replace DUMMY_TRANSACTION by @hououou in #3106
fix IMPORT_DATABASE path by @hououou in #3063
Enable compression for INTERNAL_ID by @ray6080 in #3116
Close issue 1646 by @ray6080 in #3122
Refactor Partitioner to use ChunkedNodeGroupCollection by @ray6080 in #3123
Replace with client context by @hououou in #3121
Improve the performance of VAR_LIST storage layout by @hououou in #3093
Fix issue #3127 by @acquamarin in #3130
Fix issue-3129 by @andyfengHKU in #3131
Refactor scalar function registration by @manh9203 in #3119
Support multiple COPY statements on rel tables by @ray6080 in #2989
initialize readfds via FD_ZERO before use by @neeraj9 in #3132
Table states by @ray6080 in #3072
Support read after update by @andyfengHKU in #3126
Factor out benchmark workflow and enable manual trigger for it by @mewim in #3144
Implement postgres-scanner by @acquamarin in #3139
Python List and Map Parameter Support by @mxwli in #3090
Cache DiskArray write header in-memory by @benjaminwinger in #3109
Fix postgres scanner on windows by @acquamarin in #3148
Refactor path functions and RDF functions by @manh9203 in #3134
Refactor aggregate functions by @manh9203 in #3136
Pandas Pyarrow Backend Bugfix and Tests by @mxwli in #3152
List Auxiliary Buffer NullMask Fix by @mxwli in #3156
Add support to compute hash on list of struct by @acquamarin in #3157
Prepare Statement Improvement by @hououou in #3140
Resolve ANY Resolution Occurring at End of Python List Parameter by @mxwli in #3160
Fix export test by @hououou in #3164
Implement initcap/concat functions by @acquamarin in #3161
Support extend from unwind node by @andyfengHKU in #3153
Add Pyarrow Map Scanning by @mxwli in #3158
Fix export database regression by @andyfengHKU in #3171
Fix hash aggregate edge case by @andyfengHKU in #3172
Added progress for in_query_call operators by @MSebanc in #3120
Fixed shell incorrect command seg fault by @MSebanc in #3173
Cache FileInfo when replaying WAL by @benjaminwinger in #3137
Support join hash table on aggregate types by @acquamarin in #3174
Fix scan after delete bug by @andyfengHKU in #3176
Refactor sel vector interface by @andyfengHKU in #3177
Fix issue 3151: disable null on internalID columns by @ray6080 in #3165
Rework DDL operators by @ray6080 in #3178
Refactor table functions by @manh9203 in #3155
Rename VAR_LIST to LIST by @manh9203 in #3170
Remove unused keywords in test runner by @hououou in #3193
Reorder extension tests for CI pipeline by @mewim in #2987
Added progress for aggregate scan and order by scan by @MSebanc in #3192
Fix is null executor bug by @andyfengHKU in #3197
Fix order by radix sort bug by @acquamarin in #3201
Updated shell result truncation by @MSebanc in #3206
Fix broken links in README.md by @prrao87 in #3203
skip empty history file line by @neeraj9 in #3184
Merge duplicate key fix by @acquamarin in #3207
Implemented progress for in memory RDF scan by @MSebanc in #3208
Rework multiple query result by @hououou in #3191
Fix constant compression in-place check for bools by @benjaminwinger in #3211
Replace Slack link with Discord in contributing guideline by @mewim in #3217
Fix pyarrow segfaulting on fedora 39 by @mxwli in #3213
Bump clang-format to v18 and enable auto format by @mewim in #3222
Check for format changes on master branch by @mewim in #3223
CMAKE_CXX_FLAGS handling fails when variable is empty by @zaddach in #3228
Remove extension test from clang-build-test job by @mewim in #3231
Add list look up test by @hououou in #3210
Fix optional match merge by @andyfengHKU in #3216
List offset Column Refactor by @hououou in #3219
change LogicalType.toString for nested types by @mxwli in #3209
Add utility hash functions by @manh9203 in #3212
Add DATE TO DATE and TIMESTAMP TO DATE casting functions. by @mxwli in #3220
Separate shadow pages from wal records and rework wal to use serializer by @ray6080 in #3204
Add distinct aggregate over node and relationships by @andyfengHKU in #3236
Optimize task scheduler by @ted-wq-x in #3233
remove DataTypeInfo and use LogicalType and column names by @russell-liu in #2539
Add Physical Type ARRAY by @manh9203 in #3175
Update CONTRIBUTING.md by @semihsalihoglu-uw in #3241
Add configuration to optimize recursive computation by @andyfengHKU in #3242
Add format for tools by @mewim in #3244
Support allocations of larger-than-256KB memory buffers by @ray6080 in #3243
Fix canCommitInPlace for string dict offsets by @ray6080 in #3249
Fix clang tidy and clangd diagnostics check workflow by @mewim in #3254
Binder copy read rework by @andyfengHKU in #3251
Rewrite transaction copy/ddl tests as end to end tests by @ray6080 in #3255
Windows shell improvements by @MSebanc in #3253
Add back skipped tests on update/copy by @ray6080 in #3256
Remove file system from catalog and statistics by @andyfengHKU in #3258
Fix #3154 by @mewim in #3263
Fix problematic to_arrow tests by @mxwli in #3257
Fix Pyarrow Backend Scanning by @mxwli in #3265
Remove 256K page size limit for ftable by @andyfengHKU in #3266
Fix NullMask setNullRange by @ray6080 in #3267
Update duckdb scanning grammar by @andyfengHKU in #3271
Fix extension ci by @acquamarin in #3272
Support scan duckdb array column by @acquamarin in #3269
fix: #3276 by @phf-1 in #3277
Add sync interface to file system and sync wal file when flushed by @ray6080 in #3261
Handle exceptions when flushing WAL by @benjaminwinger in #3283
Fix #3274 by @ted-wq-x in #3285
Fix issue 2469, 2986 & 3185 by @andyfengHKU in #3284
Optimize jni calling overhead by @ted-wq-x in #3288
Implement Implicit Casting of Nested Types and Type Combination with MaxLogicalType by @mxwli in #3234
Move tests into system temporary directory by @benjaminwinger in #3290
String utf8 test by @hououou in #3287
Add Missing get_as_arrow and get_as_df Types by @mxwli in #3296
Hash index multi copy by @benjaminwinger in #3189
Parquet compression by @acquamarin in #3286
Add database close methods for Node.js and Python APIs by @mewim in #3289
Implement use database by @acquamarin in #3300
Fix brotli build issue by @mewim in #3303
Pass nullMask to setValuesFromUncompressed by @benjaminwinger in #3247
Remove parameter map for Node.js API by @mewim in #3304
CALL to be readonly=true by @OTooleMichael in #3302
Add Arrow scanning for fixed size list by @manh9203 in #3259
Refactor ATTACH/DETACH/IMPORT/EXPORT operators (add output msg) by @hououou in #3299
Add check for duplicate map keys by @acquamarin in #3307
Add SingleQueryHasNextQueryResult test case by @mewim in #3311
Optimize concurrent query performance by @ted-wq-x in #3309
Fix cleaning up the test directory on windows CI by @benjaminwinger in #3297
Change MaxLogicalType to Work Better by @mxwli in #3316
Fix list unique and distinct by @acquamarin in #3310
Node table read state by @ray6080 in #3313
Enable RelGroup and disable RDFGraph in EXPORT/IMPORT DATABASE by @hououou in #3319
Implement array to string function by @acquamarin in #3320
Rel table column scan state by @ray6080 in #3317
Output multiple query results for Python and Node.js APIs by @mewim in #3322
Multi Copy for Node Tables by @benjaminwinger in #3298
Optimize csr header in place update by @ray6080 in #3314
Fix duckdb catalog name by @acquamarin in #3324
Move var length field to function by @andyfengHKU in #3328
Optimize Hash Index slot splitting by @benjaminwinger in #3325
Fix multi query with filter result error by @ted-wq-x in #3323
Fix scan multi label from local storage; fix list column chunk lookup by @ray6080 in #3332
Fix format on master by @ray6080 in #3334
Fix updating the diskArray nextPipPageIdx when multiple new PIPs are added by @benjaminwinger in #3329
Apply post binding casting by @andyfengHKU in #3330
Add clang-tidy to extension source code by @acquamarin in #3321
Remove unused code by @andyfengHKU in #3335
Fix multi-statement error handling in node query result by @mewim in #3340
Improve duckdb scanner extension by @acquamarin in #3338
Delete third_party/brotli/research directory by @mewim in #3342
Implement Pyarrow Union Scanning by @mxwli in #3315
Added highlighting for ATTACH keyword by @MSebanc in #3343
Optimize filter push down by @andyfengHKU in #3336
Fix distinct hash table resizing by @acquamarin in #3348
Remove logical id based implicit cast by @andyfengHKU in #3347
Fix distinct hash table performance bug by @acquamarin in #3350
Add extension version and system arch to local extension path by @mewim in #3354
Fix MaxLogicalType error and use it for python parameters by @mxwli in #3346
Fix output message text in db extensions by @prrao87 in #3357
Fix macOS issue for new Python version by @mewim in #3362
Fix export/import tests by @acquamarin in #3366
Fix #3089 by @manh9203 in #3344
Fix extension bugs by @acquamarin in #3364
Fix create dir ending with slash by @acquamarin in #3369
Fix csr sliding: wrong gap size of left side by @ray6080 in #3361
Change DuckDB to static linking by @mewim in #3370
Split InMemHashIndex in-place by @benjaminwinger in #3345
Add coalesce function by @manh9203 in #3235
Add Implicit Casting from List to Array by @mxwli in #3375
Fix issue 2265 by @andyfengHKU in #3376
Add extension utils function by @acquamarin in #3371
Resolve default any type by @andyfengHKU in #3374
Rename duckdb/postgres extension by @acquamarin in #3380
Don't reload hash index after copy by @benjaminwinger in #3377
Fix issue 3262 by @andyfengHKU in #3384
Add read state to string column by @ray6080 in #3381
Clean up list function implementations by @acquamarin in #3385
Fix #2704 by @manh9203 in #3379
Add primary key information to show_connection by @manh9203 in #3372
Fix concurrency issue on PageState; add chunk state for prepareCommit by @ray6080 in #3388
Remove size requirements for the in memory hash index by @benjaminwinger in #3373
Refactor list_range and list_sort functions by @acquamarin in #3393
Remove redundant computation of isNewNodeGroup by @ray6080 in #3396
Fix issue 3248 by @andyfengHKU in #3394
Simplify prepare commit of null columns and nested columns with chunk states by @ray6080 in #3398
Better error message for extensions by @acquamarin in #3397
Refactor table read state by @andyfengHKU in #3392
Fix warnings from GCC 13 by @benjaminwinger in #3387
Support unwind array by @acquamarin in #3402
Improve efficiency of merging bulk insertions into the hash index by @benjaminwinger in #3403
Fix export database on nodetable with serial property by @acquamarin in #3408
V0.4.0 example bug fixes by @andyfengHKU in #3419
Extend show_tables/table_info to attached databases by @acquamarin in #3420
Add doc example as test by @andyfengHKU in #3421
Fix minor issues by @acquamarin in #3423
Fix capacity of columnChunk when commitColumnChunkOutOfPlace by @ray6080 in #3424
Unify resetAuxBuffer in table read by @ray6080 in #3409
Add doc example to test framework by @acquamarin in #3426
Remove unnecessary calls to WAL::flushAllPages and clear the dirty flag when flushing pages by @benjaminwinger in #3427
Fix scan rel under very sparse graphs by @ray6080 in #3412
Avoid modifying vector in HashIndex::mergeSlot when iterating by @benjaminwinger in #3429
Fix asan issue during multi copy by @ray6080 in #3431
Add Node.js close bindings for query results and connections by @mewim in #3436
Bind manual database close methods to Python APIs by @mewim in #3435
Bump version to 0.4.0 by @mewim in #3433

New Contributors

@alexander-beedie made their first contribution in #2985
@neeraj9 made their first contribution in #3132
@zaddach made their first contribution in #3228
@ted-wq-x made their first contribution in #3233
@phf-1 made their first contribution in #3277

Full Changelog: v0.3.2...v0.4.0
