v0.4.0
Changes
Breaking Changes
- Renaming of list-like types: FIXED-LIST to ARRAY, VAR-LIST to LIST
New Features
- Import/Export database
- Copy from subquery
- Bulk insertion into non-empty database
- External database extensions
  - DuckDB
  - Postgres
- Scan from pandas pyarrow backend (#3058)
Usability Improvements
- CLI improvements (#2869, #2876, #2930, #2953, #3253)
- Functions
- Export query result to Polars (#2985, contributed by @alexander-beedie)
- Python API linting improvements (#3023, contributed by @alexander-beedie)
- Progress bar (#3051)
- Support read after update in the same statement (#3126)
Performance Improvements
- Python import cache (#2905)
- Internal ID compression (#3116)
- Avoid busy loop when the maximum number of threads has been reached (#3233, contributed by @ted-wq-x)
What's Changed
- move apis from connection to clientcontext by @hououou in #2951
- Allow quotes on struct keys by @acquamarin in #2967
- Fix functions for casting string to var-list by @manh9203 in #2970
- Implemented start_node and end_node functions by @MSebanc in #2978
- Fix issue-2942 by @andyfengHKU in #2977
- Import database by @hououou in #2964
- Minor CLI Truncation Fix by @MSebanc in #2980
- Fixed start and end node tests by @MSebanc in #2981
- Refactor: unify many_one and many_many storage by @ray6080 in #2912
- Refactor: unify CopyNode and CopyRel operator by @ray6080 in #2955
- Support Polars DataFrame export from QueryResult by @alexander-beedie in #2985
- Clean up transaction pointer in physical operator by @ray6080 in #2990
- More efficient ColumnChunk string dictionary caching by @benjaminwinger in #2994
- Fix setting of column chunk capacity by @ray6080 in #2996
- Rework CSV_TO_PARQUET testing feature by @manh9203 in #2993
- Avoid moving DictionaryChunks by @benjaminwinger in #2999
- Fix broken links in README to website due to sub-domain changes by @ray6080 in #3000
- Re-write partitioner to use ColumnChunks instead of ValueVectors by @benjaminwinger in #2979
- Abstract client config by @andyfengHKU in #3010
- Support use of QueryResult as a context manager, and add a get_schema method by @alexander-beedie in #3009
- Pass client context to binder by @andyfengHKU in #3015
- Refactor cast functions by @andyfengHKU in #3016
- Clean up unique_ptr of LogicalType in NodeGroup and BatchInsert by @ray6080 in #3018
- Combine append(ValueVector) with appendOne by @ray6080 in #3017
- Import cache fix and revert revert by @mxwli in #3025
- Fix issue-2984 by @andyfengHKU in #3026
- Add multiplatform test report bot by @mewim in #3027
- Python API typing, lint, config/makefile by @alexander-beedie in #3023
- Fix unicode conversion for pandas dataframe by @mewim in #3029
- Update LICENSE by @semihsalihoglu-uw in #3031
- Rewrite the Hash Index overflow file to support multiple copies by @benjaminwinger in #3012
- Add copy from subquery by @andyfengHKU in #3020
- Fix issue-3004 by @andyfengHKU in #3036
- Optimise Python unit test runtime (~7x speedup) by @alexander-beedie in #3032
- Add more parameter types for Node.js API by @mewim in #3037
- Insert into the hash index builder one chunk at a time by @benjaminwinger in #2997
- Allow CI workflow to be manually dispatched by @mewim in #3043
- Bump extensions version to 0.2.0 by @mewim in #3041
- First-pass lint/format for Python shell tests by @alexander-beedie in #3034
- Bump master branch version to 0.3.2.1 by @mewim in #3044
- Fixed failing shell tests by @MSebanc in #3045
- Add shell tests to CI by @mewim in #3039
- Fix rel csr sliding out-of-place commit and null strings by @ray6080 in #3055
- Refactor: separate insertions and updates in rel table local storage by @ray6080 in #2982
- Fix issue 3042 by @ray6080 in #3046
- Update Debian version in build workflows by @mewim in #3056
- Implement duckdb scanner extension by @acquamarin in #3052
- Copy table function instead of passing raw pointer by @andyfengHKU in #3067
- Add scalar_func_rewrite_t by @andyfengHKU in #3069
- Remove the constraint on HashIndexBuilder's template parameter by @benjaminwinger in #3030
- Remove unnecessary components for pip package by @mewim in #3074
- Fix Hash index split slot ID when reserving a number of slots which are a power of two by @benjaminwinger in #3066
- Implement catalog cache in postgres scanner by @acquamarin in #3071
- Rework FIXED_LIST by @manh9203 in #3057
- Implemented Progress Bar by @MSebanc in #3051
- Replace ValueVector with ColumnChunk in LocalStorage by @ray6080 in #3028
- Exclude extension files from the rust crate by @benjaminwinger in #3076
- Add include for cstdint by @mewim in #3085
- Fix rel insert and add sanityCheck for column chunk by @ray6080 in #3081
- Fix node insert by @ray6080 in #3082
- Refactor the registration of arithmetic functions by @manh9203 in #3079
- Allowed for progress bar to be configurable by CALL by @MSebanc in #3080
- Implement array functions by @acquamarin in #3087
- Remove underscore from the badges in README by @mewim in #3094
- Fix python prepared statement null value by @acquamarin in #3098
- Refactor string functions by @manh9203 in #3091
- Arrow chunk_size as keyword argument by @prrao87 in #3084
- Update rustdoc to show how to enable parallel compilation by @prrao87 in #3099
- Improve copy-to-parquet perf by @acquamarin in #3105
- Refactor list functions by @manh9203 in #3100
- Refactor cast functions by @manh9203 in #3107
- QueryResult get_as_pl should always return a single chunk by @alexander-beedie in #3110
- Add standard Python module __version__ attr by @alexander-beedie in #3111
- Fix DuckDB build for macOS ARM and 32-bit by @mewim in #3115
- Pandas pyarrow backend by @mxwli in #3058
- Add pull request template by @andyfengHKU in #3118
- Added customizable delay before displaying progress bar by @MSebanc in #3092
- Hash index cleanup by @benjaminwinger in #3088
- Fix launch database using homedir by @acquamarin in #3108
- Replace DUMMY_TRANSACTION by @hououou in #3106
- fix IMPORT_DATABASE path by @hououou in #3063
- Enable compression for INTERNAL_ID by @ray6080 in #3116
- Close issue 1646 by @ray6080 in #3122
- Refactor Partitioner to use ChunkedNodeGroupCollection by @ray6080 in #3123
- Replace with client context by @hououou in #3121
- Improve the performance of VAR_LIST storage layout by @hououou in #3093
- Fix issue #3127 by @acquamarin in #3130
- Fix issue-3129 by @andyfengHKU in #3131
- Refactor scalar function registration by @manh9203 in #3119
- Support multiple COPY statements on rel tables by @ray6080 in #2989
- initialize readfds via FD_ZERO before use by @neeraj9 in #3132
- Table states by @ray6080 in #3072
- Support read after update by @andyfengHKU in #3126
- Factor out benchmark workflow and enable manual trigger for it by @mewim in #3144
- Implement postgres-scanner by @acquamarin in #3139
- Python List and Map Parameter Support by @mxwli in #3090
- Cache DiskArray write header in-memory by @benjaminwinger in #3109
- Fix postgres scanner on windows by @acquamarin in #3148
- Refactor path functions and RDF functions by @manh9203 in #3134
- Refactor aggregate functions by @manh9203 in #3136
- Pandas Pyarrow Backend Bugfix and Tests by @mxwli in #3152
- List Auxiliary Buffer NullMask Fix by @mxwli in #3156
- Add support to compute hash on list of struct by @acquamarin in #3157
- Prepare Statement Improvement by @hououou in #3140
- Resolve ANY Resolution Occurring at End of Python List Parameter by @mxwli in #3160
- Fix export test by @hououou in #3164
- Implement initcap/concat functions by @acquamarin in #3161
- Support extend from unwind node by @andyfengHKU in #3153
- Add Pyarrow Map Scanning by @mxwli in #3158
- Fix export database regression by @andyfengHKU in #3171
- Fix hash aggregate edge case by @andyfengHKU in #3172
- Added progress for in_query_call operators by @MSebanc in #3120
- Fixed shell incorrect command seg fault by @MSebanc in #3173
- Cache FileInfo when replaying WAL by @benjaminwinger in #3137
- Support join hash table on aggregate types by @acquamarin in #3174
- Fix scan after delete bug by @andyfengHKU in #3176
- Refactor sel vector interface by @andyfengHKU in #3177
- Fix issue 3151: disable null on internalID columns by @ray6080 in #3165
- Rework DDL operators by @ray6080 in #3178
- Refactor table functions by @manh9203 in #3155
- Rename VAR_LIST to LIST by @manh9203 in #3170
- Remove unused keywords in test runner by @hououou in #3193
- Reorder extension tests for CI pipeline by @mewim in #2987
- Added progress for aggregate scan and order by scan by @MSebanc in #3192
- Fix is null executor bug by @andyfengHKU in #3197
- Fix order by radix sort bug by @acquamarin in #3201
- Updated shell result truncation by @MSebanc in #3206
- Fix broken links in README.md by @prrao87 in #3203
- skip empty history file line by @neeraj9 in #3184
- Merge duplicate key fix by @acquamarin in #3207
- Implemented progress for in memory RDF scan by @MSebanc in #3208
- Rework multiple query result by @hououou in #3191
- Fix constant compression in-place check for bools by @benjaminwinger in #3211
- Replace Slack link with Discord in contributing guideline by @mewim in #3217
- Fix pyarrow segfaulting on fedora 39 by @mxwli in #3213
- Bump clang-format to v18 and enable auto format by @mewim in #3222
- Check for format changes on master branch by @mewim in #3223
- CMAKE_CXX_FLAGS handling fails when variable is empty by @zaddach in #3228
- Remove extension test from clang-build-test job by @mewim in #3231
- Add list look up test by @hououou in #3210
- Fix optional match merge by @andyfengHKU in #3216
- List offset Column Refactor by @hououou in #3219
- change LogicalType.toString for nested types by @mxwli in #3209
- Add utility hash functions by @manh9203 in #3212
- Add DATE TO DATE and TIMESTAMP TO DATE casting functions. by @mxwli in #3220
- Separate shadow pages from wal records and rework wal to use serializer by @ray6080 in #3204
- Add distinct aggregate over node and relationships by @andyfengHKU in #3236
- Optimize task scheduler by @ted-wq-x in #3233
- remove DataTypeInfo and use LogicalType and column names by @russell-liu in #2539
- Add Physical Type ARRAY by @manh9203 in #3175
- Update CONTRIBUTING.md by @semihsalihoglu-uw in #3241
- Add configuration to optimize recursive computation by @andyfengHKU in #3242
- Add format for tools by @mewim in #3244
- Support allocations of larger-than-256KB memory buffers by @ray6080 in #3243
- Fix canCommitInPlace for string dict offsets by @ray6080 in #3249
- Fix clang tidy and clangd diagnostics check workflow by @mewim in #3254
- Binder copy read rework by @andyfengHKU in #3251
- Rewrite transaction copy/ddl tests as end to end tests by @ray6080 in #3255
- Windows shell improvements by @MSebanc in #3253
- Add back skipped tests on update/copy by @ray6080 in #3256
- Remove file system from catalog and statistics by @andyfengHKU in #3258
- Fix #3154 by @mewim in #3263
- Fix problematic to_arrow tests by @mxwli in #3257
- Fix Pyarrow Backend Scanning by @mxwli in #3265
- Remove 256K page size limit for ftable by @andyfengHKU in #3266
- Fix NullMask setNullRange by @ray6080 in #3267
- Update duckdb scanning grammar by @andyfengHKU in #3271
- Fix extension ci by @acquamarin in #3272
- Support scan duckdb array column by @acquamarin in #3269
- fix: #3276 by @phf-1 in #3277
- Add sync interface to file system and sync wal file when flushed by @ray6080 in #3261
- Handle exceptions when flushing WAL by @benjaminwinger in #3283
- Fix #3274 by @ted-wq-x in #3285
- Fix issue 2469, 2986 & 3185 by @andyfengHKU in #3284
- Optimize jni calling overhead by @ted-wq-x in #3288
- Implement Implicit Casting of Nested Types and Type Combination with MaxLogicalType by @mxwli in #3234
- Move tests into system temporary directory by @benjaminwinger in #3290
- String utf8 test by @hououou in #3287
- Add Missing get_as_arrow and get_as_df Types by @mxwli in #3296
- Hash index multi copy by @benjaminwinger in #3189
- Parquet compression by @acquamarin in #3286
- Add database close methods for Node.js and Python APIs by @mewim in #3289
- Implement use database by @acquamarin in #3300
- Fix brotli build issue by @mewim in #3303
- Pass nullMask to setValuesFromUncompressed by @benjaminwinger in #3247
- Remove parameter map for Node.js API by @mewim in #3304
- CALL to be readonly=true by @OTooleMichael in #3302
- Add Arrow scanning for fixed size list by @manh9203 in #3259
- Refactor ATTACH/DETACH/IMPORT/EXPORT operators (add output msg) by @hououou in #3299
- Add check for duplicate map keys by @acquamarin in #3307
- Add SingleQueryHasNextQueryResult test case by @mewim in #3311
- Optimize concurrent query performance by @ted-wq-x in #3309
- Fix cleaning up the test directory on windows CI by @benjaminwinger in #3297
- Change MaxLogicalType to Work Better by @mxwli in #3316
- Fix list unique and distinct by @acquamarin in #3310
- Node table read state by @ray6080 in #3313
- Enable RelGroup and disable RDFGraph in EXPORT/IMPORT DATABASE by @hououou in #3319
- Implement array to string function by @acquamarin in #3320
- Rel table column scan state by @ray6080 in #3317
- Output multiple query results for Python and Node.js APIs by @mewim in #3322
- Multi Copy for Node Tables by @benjaminwinger in #3298
- Optimize csr header in place update by @ray6080 in #3314
- Fix duckdb catalog name by @acquamarin in #3324
- Move var length field to function by @andyfengHKU in #3328
- Optimize Hash Index slot splitting by @benjaminwinger in #3325
- Fix multi query with filter result error by @ted-wq-x in #3323
- Fix scan multi label from local storage; fix list column chunk lookup by @ray6080 in #3332
- Fix format on master by @ray6080 in #3334
- Fix updating the diskArray nextPipPageIdx when multiple new PIPs are added by @benjaminwinger in #3329
- Apply post binding casting by @andyfengHKU in #3330
- Add clang-tidy to extension source code by @acquamarin in #3321
- Remove unused code by @andyfengHKU in #3335
- Fix multi-statement error handling in node query result by @mewim in #3340
- Improve duckdb scanner extension by @acquamarin in #3338
- Delete third_party/brotli/research directory by @mewim in #3342
- Implement Pyarrow Union Scanning by @mxwli in #3315
- Added highlighting for ATTACH keyword by @MSebanc in #3343
- Optimize filter push down by @andyfengHKU in #3336
- Fix distinct hash table resizing by @acquamarin in #3348
- Remove logical id based implicit cast by @andyfengHKU in #3347
- Fix distinct hash table performance bug by @acquamarin in #3350
- Add extension version and system arch to local extension path by @mewim in #3354
- Fix MaxLogicalType error and use it for python parameters by @mxwli in #3346
- Fix output message text in db extensions by @prrao87 in #3357
- Fix macOS issue for new Python version by @mewim in #3362
- Fix export/import tests by @acquamarin in #3366
- Fix #3089 by @manh9203 in #3344
- Fix extension bugs by @acquamarin in #3364
- Fix create dir ending with slash by @acquamarin in #3369
- Fix csr sliding: wrong gap size of left side by @ray6080 in #3361
- Change DuckDB to static linking by @mewim in #3370
- Split InMemHashIndex in-place by @benjaminwinger in #3345
- Add coalesce function by @manh9203 in #3235
- Add Implicit Casting from List to Array by @mxwli in #3375
- Fix issue 2265 by @andyfengHKU in #3376
- Add extension utils function by @acquamarin in #3371
- Resolve default any type by @andyfengHKU in #3374
- Rename duckdb/postgres extension by @acquamarin in #3380
- Don't reload hash index after copy by @benjaminwinger in #3377
- Fix issue 3262 by @andyfengHKU in #3384
- Add read state to string column by @ray6080 in #3381
- Clean up list function implementations by @acquamarin in #3385
- Fix #2704 by @manh9203 in #3379
- Add primary key information to show_connection by @manh9203 in #3372
- Fix concurrency issue on PageState; add chunk state for prepareCommit by @ray6080 in #3388
- Remove size requirements for the in memory hash index by @benjaminwinger in #3373
- Refactor list_range and list_sort functions by @acquamarin in #3393
- Remove redundant computation of isNewNodeGroup by @ray6080 in #3396
- Fix issue 3248 by @andyfengHKU in #3394
- Simplify prepare commit of null columns and nested columns with chunk states by @ray6080 in #3398
- Better error message for extensions by @acquamarin in #3397
- Refactor table read state by @andyfengHKU in #3392
- Fix warnings from GCC 13 by @benjaminwinger in #3387
- Support unwind array by @acquamarin in #3402
- Improve efficiency of merging bulk insertions into the hash index by @benjaminwinger in #3403
- Fix export database on nodetable with serial property by @acquamarin in #3408
- V0.4.0 example bug fixes by @andyfengHKU in #3419
- Extend show_tables/table_info to attached databases by @acquamarin in #3420
- Add doc example as test by @andyfengHKU in #3421
- Fix minor issues by @acquamarin in #3423
- Fix capacity of columnChunk when commitColumnChunkOutOfPlace by @ray6080 in #3424
- Unify resetAuxBuffer in table read by @ray6080 in #3409
- Add doc example to test framework by @acquamarin in #3426
- Remove unnecessary calls to WAL::flushAllPages and clear the dirty flag when flushing pages by @benjaminwinger in #3427
- Fix scan rel under very sparse graphs by @ray6080 in #3412
- Avoid modifying vector in HashIndex::mergeSlot when iterating by @benjaminwinger in #3429
- Fix asan issue during multi copy by @ray6080 in #3431
- Add Node.js close bindings for query results and connections by @mewim in #3436
- Bind manual database close methods to Python APIs by @mewim in #3435
- Bump version to 0.4.0 by @mewim in #3433
New Contributors
- @alexander-beedie made their first contribution in #2985
- @neeraj9 made their first contribution in #3132
- @zaddach made their first contribution in #3228
- @ted-wq-x made their first contribution in #3233
- @phf-1 made their first contribution in #3277
Full Changelog: v0.3.2...v0.4.0