v0.5.0
Version 0.5.0 introduces several major changes:
Performance improvements
- MVCC-based transaction manager.
- Remote file system cache in httpfs extension.
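For readers new to the term, the general idea behind multi-version concurrency control (MVCC) can be sketched in a few lines of Python. This is a toy model of snapshot visibility only, not Kùzu's transaction manager; the class `VersionedStore` and its methods are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    versions: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)
    last_commit_ts: int = 0

    def begin(self) -> int:
        """Start a reader; its snapshot is the latest commit timestamp."""
        return self.last_commit_ts

    def commit_write(self, key: str, value: str) -> int:
        """Append a new version of `key` under a fresh, larger timestamp."""
        self.last_commit_ts += 1
        self.versions.setdefault(key, []).append((self.last_commit_ts, value))
        return self.last_commit_ts

    def read(self, key: str, snapshot_ts: int) -> Optional[str]:
        """Return the newest version committed at or before `snapshot_ts`."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.commit_write("name", "Alice")   # committed at timestamp 1
reader_snapshot = store.begin()       # reader pinned to timestamp 1
store.commit_write("name", "Bob")     # committed at timestamp 2

old_view = store.read("name", reader_snapshot)  # reader still sees "Alice"
new_view = store.read("name", store.begin())    # a new reader sees "Bob"
```

The point of the sketch is that a writer adds a new version instead of overwriting the old one, so a reader that started earlier keeps a consistent view without blocking the writer.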
New features
- Attach remote Kùzu databases.
- Python UDFs.
- List lambda functions.
- Scan and copy from DataFrames.
- New DDL statements: CREATE TABLE IF NOT EXISTS and DROP TABLE IF EXISTS.
- Progress bar in CLI and Explorer.
- Join order hints: specify the join order directly in Cypher.
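As a rough guide to what the list lambda functions compute, here is a plain-Python model of the three operations named in the changelog (list transform, list filter, and list_reduce). It only mirrors the semantics under the assumption that they behave like map, filter, and a left fold; it does not use the Kùzu API, and the Cypher syntax is not shown here.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def list_transform(values: List[T], fn: Callable[[T], U]) -> List[U]:
    """Apply `fn` to every element and return the results in order."""
    return [fn(v) for v in values]


def list_filter(values: List[T], predicate: Callable[[T], bool]) -> List[T]:
    """Keep only the elements for which `predicate` returns True."""
    return [v for v in values if predicate(v)]


def list_reduce(values: List[T], fn: Callable[[T, T], T]) -> T:
    """Fold a non-empty list from left to right into a single value."""
    return reduce(fn, values)


nums = [1, 2, 3, 4]
doubled = list_transform(nums, lambda x: x * 2)     # [2, 4, 6, 8]
evens = list_filter(nums, lambda x: x % 2 == 0)     # [2, 4]
total = list_reduce(nums, lambda acc, x: acc + x)   # 10
```

How Kùzu handles edge cases such as empty or null lists is not covered by this sketch; the release post and documentation are the place to check.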
New extensions and API improvements
- SQLite scanner.
- Support copying from and to JSON files.
- Decimal data type.
- Numerous improvements to the C API.
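To illustrate why a fixed-scale decimal type is useful next to floating-point types, the sketch below uses Python's standard `decimal` module. The helper `to_decimal` and the half-up rounding mode are assumptions chosen for the example and say nothing about how Kùzu stores or rounds its decimal values.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_decimal(value: str, scale: int) -> Decimal:
    """Parse a numeric string and round it to `scale` fractional digits."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives a quantum of 0.01
    # Rounding mode is an assumption made for this illustration only.
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)


# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Fixed-scale decimals keep the arithmetic exact at the chosen scale.
decimal_sum = to_decimal("0.1", 2) + to_decimal("0.2", 2)
decimal_is_exact = decimal_sum == Decimal("0.30")
```

This is the usual motivation for decimal columns: money-like values compare and sum exactly, where doubles accumulate representation error.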
Please see our release post for more details!
What's Changed
- Allow fuzzy matching on test result by @yiyun-sj in #3432
- Support asserting RETURN result column names in testing framework by @yiyun-sj in #3417
- Remove unique_ptr of value in literal expression by @andyfengHKU in #3440
- Replace const pointer with const reference in type functions by @manh9203 in #3430
- Infer test group name directly from test file path by @yiyun-sj in #3418
- Use a lockfree data structure to store page states by @benjaminwinger in #3425
- Move length function as a rewrite function by @andyfengHKU in #3442
- Remove shared_ptr of value in parameter expression by @andyfengHKU in #3443
- Optimize InMemoryHashIndex lookups by @benjaminwinger in #3378
- Add Python UDF for Primitive Types by @mxwli in #3390
- Upgrade runner to Ubuntu 24.04 by @mewim in #3445
- Support multiple query statements in e2e test framework by @yiyun-sj in #3437
- Issue 2385 by @andyfengHKU in #3444
- Rework the public interface of SelectionVector by @ray6080 in #3447
- Fix python empty dict parameter bug by @andyfengHKU in #3452
- Update project version to 0.4.1 by @mewim in #3455
- Allow numeric value comparison with precision by @yiyun-sj in #3453
- Fix CSV file answers tuple count bug by @yiyun-sj in #3454
- Add mvcc support for catalog by @ray6080 in #3301
- Fix calculation of hash slots on 32bit env by @acquamarin in #3460
- Implement Polars Scanning by @mxwli in #3451
- Pass transaction pointer to function by @hououou in #3239
- Support initialize test case on existing binary db directory by @yiyun-sj in #3428
- Reclaim empty overflow slots in memory hash index by @benjaminwinger in #3438
- Fix pandas UTF-8 scan by @acquamarin in #3468
- Remove logger from Database by @ray6080 in #3270
- Add Nested Types List and Map for Python UDF support by @mxwli in #3450
- Fix #2888: error when commit/rollback on invalid transaction; fix error msg on nested transaction by @ray6080 in #3469
- Fix some minor hash index issues by @benjaminwinger in #3471
- Backtraces by @benjaminwinger in #3456
- Fixed issue with commit error not showing in shell by @MSebanc in #3472
- Add generic utility functions by @manh9203 in #3282
- Attach remote kuzu database by @acquamarin in #3467
- Refactor ftable schema by @andyfengHKU in #3479
- Merge the HashIndex bulkstorage with the local storage for inserts by @benjaminwinger in #3482
- Fix string hash and add more hash tests by @benjaminwinger in #3473
- Migrate benchmark to new server by @mewim in #3487
- Added missing algorithm includes needed for gcc 14 by @benjaminwinger in #3490
- Port changes from 0.4.2 to master by @mewim in #3493
- Enable -Werror for GCC Build & Test Job by @mxwli in #3494
- Implement attach options by @acquamarin in #3485
- Added current_timestamp and current_date functions by @MSebanc in #3497
- Fix serial csv reader by @acquamarin in #3505
- Make serializer tool able to be run standalone by @benjaminwinger in #3501
- Fix issue-3488 by @andyfengHKU in #3506
- Always build rust integration with release runtime library on Windows by @zaddach in #3226
- Support CREATE SEQUENCE functionality by @yiyun-sj in #3474
- Read primary key for delete by @andyfengHKU in #3512
- Disk array builder cleanup by @benjaminwinger in #3498
- Add issue and PR templates by @prrao87 in #3515
- Remote file system cache by @acquamarin in #3516
- Add storage version info to the single file header by @benjaminwinger in #3519
- Propagate chunk state to commit out of place in column by @ray6080 in #3522
- Implement file cache for s3 filesystem by @acquamarin in #3526
- Graph function framework by @andyfengHKU in #3486
- Support contains function by @acquamarin in #3531
- rename InQueryCall to TableFunctionCall by @andyfengHKU in #3533
- File cache optimization by @acquamarin in #3530
- Add issue template for performance optimization category by @ray6080 in #3535
- Preliminary Decimal Datatype by @mxwli in #3521
- Remove property stats by @ray6080 in #3534
- Rework scan node by @ray6080 in #3524
- Automatically merge pull request upon extension build by @mewim in #3539
- Implement string_split and split_part functions by @acquamarin in #3537
- Scan primary key column before updating by @andyfengHKU in #3542
- Report LSQB results to benchmark server by @mewim in #3545
- C Api Enhancements by @MSebanc in #3457
- Add default value to CREATE by @yiyun-sj in #3523
- Fix shell printing issue by @MSebanc in #3547
- Python UDF and C++ UDF improvements by @mxwli in #3483
- Fix create rel table group parser exception by @acquamarin in #3549
- Fix disabled test for 3524 by @andyfengHKU in #3546
- Refactor FinBench CI pipeline and report results to server by @mewim in #3551
- Refactor InteractiveV1 CI pipeline and report results to server by @mewim in #3552
- Fix join order for 3524 by @andyfengHKU in #3553
- Fix issue 3166 by @andyfengHKU in #3404
- Fix filesearchpath in localFileSystem glob by @acquamarin in #3550
- ColumnChunk statistics for zone mapping by @benjaminwinger in #2611
- Refactor BI pipeline and add report to server by @mewim in #3555
- Turn on primary key scan by @andyfengHKU in #3556
- Support populating DEFAULT values in COPY FROM statements by @yiyun-sj in #3554
- Apply zone map to scan by @andyfengHKU in #3561
- fix sequence batch insert test by @yiyun-sj in #3563
- Auto report internal benchmark results to PR runs by @mewim in #3568
- Fix return type by @sapalli2989 in #3567
- Separate larger benchmark machines for LDBC benchmarks by @mewim in #3571
- Fix rel multiplicity parsing by @acquamarin in #3574
- Reworked progress bar to keep display handling separate by @MSebanc in #3566
- Disk array packed headers by @benjaminwinger in #3557
- Fix stats updates by @benjaminwinger in #3582
- Fix issue-3570 by @andyfengHKU in #3584
- Refactor hash function execution framework by @acquamarin in #3583
- Apply zone map to rel scan by @andyfengHKU in #3573
- Track variable sized memory manager allocations through the buffer manager by @benjaminwinger in #3564
- Added query progress callbacks for nodejs api by @MSebanc in #3591
- Implement user defined types by @acquamarin in #3586
- Pandas UUID by @mxwli in #3590
- Support udt on copy/load from by @acquamarin in #3592
- Add create subgraph statement by @andyfengHKU in #3581
- New grammar for casting functions by @mxwli in #3596
- Forward declare memory manager by @andyfengHKU in #3599
- Decimal datatype by @mxwli in #3580
- Cleanup hash index initialization by @benjaminwinger in #3577
- Materialize SERIAL by @yiyun-sj in #3565
- Clean up ColumnChunk by @ray6080 in #3585
- Fix keyword 'as' as table name by @acquamarin in #3611
- Temp Fix for SERIAL RDF failing test case by @yiyun-sj in #3615
- Create if not exists by @acquamarin in #3610
- Add an additional field for bug template to specify OS by @prrao87 in #3614
- Keyword improvement by @mxwli in #3603
- Fix windows extension CI to fail if the command fails by @benjaminwinger in #3600
- Check if python module exists to fix #3613 by @mxwli in #3622
- Support DEFAULT for REL tables by @yiyun-sj in #3625
- Apply node semi mask to gds by @andyfengHKU in #3621
- Column scan with chunk state by @ray6080 in #3628
- String column refactor by @royi-luo in #3617
- Generate binary datasets in CI instead of storing them in the repo by @benjaminwinger in #3540
- Add PostgreSQL Non Reserved Keywords by @mxwli in #3626
- add decimal datatype to Python and Java API by @mxwli in #3618
- Take null values into account for copy statistics by @benjaminwinger in #3589
- Copy func framework by @acquamarin in #3629
- Remove CMAKE_BUILD_TYPE and add CI job for MSVC generators by @benjaminwinger in #3633
- Change default escape character from \ to " by @acquamarin in #3639
- Fix copy to state bug by @acquamarin in #3638
- Avoid detach delete when node is already deleted in the same transaction by @ray6080 in #3637
- Add SEQUENCE to import/export framework by @yiyun-sj in #3641
- Fix create if not exist return message by @andyfengHKU in #3643
- List Column refactor by @royi-luo in #3631
- Multi label graph interface by @andyfengHKU in #3635
- Disable support for non-constant default values on add column by @yiyun-sj in #3645
- Fix extension build workflow for x86 by @mewim in #3654
- Make canUpdateInPlace operate on multiple values at once by @benjaminwinger in #3642
- Sequence MVCC Support by @yiyun-sj in #3648
- Fix generator case in irregular CI pipelines by @benjaminwinger in #3655
- Algorithm return node properties by @andyfengHKU in #3649
- Refactor Comment On to Alter Framework by @yiyun-sj in #3656
- Enable semi mask by @andyfengHKU in #3651
- Use a clock-based Buffer Manager eviction strategy by @benjaminwinger in #3620
- Fix calib tree region by @benjaminwinger in #3663
- Add op print info by @andyfengHKU in #3662
- Fix 3652 by @andyfengHKU in #3668
- Fix issue-3653 by @andyfengHKU in #3669
- Fix Python Tests by @mxwli in #3672
- Change uses of unique_ptr to LogicalType to plain LogicalType by @mxwli in #3647
- Add compression for int128 by @royi-luo in #3658
- Support other catalog entry wal activity by @yiyun-sj in #3661
- Upgrade Node.js dependencies by @mewim in #3677
- Updated progress bar for asynchronous queries by @MSebanc in #3665
- Fix zero column node COPY by @yiyun-sj in #3680
- Set buffer pool size during python test initialization to speed up tests by @mxwli in #3684
- Fix optimistic read on MARKED page by @ray6080 in #3676
- Increase Node.js test timeout to 20 sec by @mewim in #3685
- Remove redundant CSRListEntries data by @benjaminwinger in #3670
- Rust value tests by @benjaminwinger in #2708
- Use 8-byte atomics in the eviction queue by @benjaminwinger in #3687
- Sqlite extension by @acquamarin in #3693
- Fix recursive relationship with filter parser issue by @acquamarin in #3694
- Upgrade dependencies for CI pipelines by @mewim in #3696
- Allow integer packing and unpacking of partial chunks by @royi-luo in #3681
- Skip removing page candidates in the Database destructor by @benjaminwinger in #3697
- Clean up mm-256KB file created on local file system by @ray6080 in #3702
- Clean up include of client_context.h under catalog_entry.h by @ray6080 in #3701
- Rust multithreaded tests by @benjaminwinger in #3690
- Refactor undo record as struct by @ray6080 in #3707
- Rename column_chunk file to column_chunk_data by @ray6080 in #3708
- Do not delete generated grammar by @mxwli in #3710
- Removes unnecessary calls to Reset by @yiyun-sj in #3711
- Add PyArrow.lib.Table Scanning by @mxwli in #3723
- Added nodejs progress tests by @MSebanc in #3719
- Fix writing shared node groups to partially filled groups on disk in node batch insert by @benjaminwinger in #3724
- Updated shell to only create one database object by @MSebanc in #3725
- Add join hint by @andyfengHKU in #3709
- Updated printing for a few physical operators by @MSebanc in #3722
- Local hash index by @ray6080 in #3705
- Disable unnecessary null chunk data by @ray6080 in #3726
- Reduce initial capacity of ChunkedCSRHeader by @benjaminwinger in #3704
- GDS Parallelism Infrastructure by @semihsalihoglu-uw in #3713
- Remove unused functions by @acquamarin in #3731
- Implement negative array index in list extract by @acquamarin in #3733
- Deprecate CentOS 7 builder and restructure the build pipeline by @mewim in #3734
- Fix list slice index issue by @acquamarin in #3735
- Add list transform with lambda by @andyfengHKU in #3736
- Fix CI after lambda expression pr by @acquamarin in #3737
- Remove unnecessary fwd/bwd relTableIDs from node entry by @andyfengHKU in #3740
- Fix data race in eviction queue by @benjaminwinger in #3742
- Implement list filter by @acquamarin in #3741
- Implement list_reduce lambda function by @acquamarin in #3748
- More operator printing by @MSebanc in #3746
- Fix CrossProduct after WCOJ bug by @andyfengHKU in #3755
- Fix distinct aggregation on recursive rel by @acquamarin in #3750
- Shell history and help flag improvements by @MSebanc in #3757
- Fix catalogExtension cast by @acquamarin in #3760
- Fix building on gcc 14 by @benjaminwinger in #3753
- Basic Json Support [NOT FINAL FEATURES] by @mxwli in #3739
- Fix issue 3751 by @andyfengHKU in #3765
- Fix Linux CLI upload by @mewim in #3769
- Skip scanning null lists by @benjaminwinger in #3766
- Fix issue-3730 by @andyfengHKU in #3780
- Fix rollback of index overflow file by @ray6080 in #3779
- Completed UseDatabase and TableFunctionCall Operators by @hamzakammar in #3781
- Remove hasAtMostOneNbr by @andyfengHKU in #3788
- Updated printing for more physical operators by @MSebanc in #3794
- Fix creation of new overflow slots when reserving space in the hash index by @benjaminwinger in #3791
- Skip daily build (for upcoming demo) by @mewim in #3797
- Making up for the lost worker thread in GDSTasks by @semihsalihoglu-uw in #3792
- Fix issue-3386 by @andyfengHKU in #3790
- Clean up unused functions in base_graph_test by @acquamarin in #3801
- Support more udf types by @acquamarin in #3802
- Re-enable dev builds by @prrao87 in #3806
- Fix issue-3785 by @andyfengHKU in #3799
- Fix issue-3691 by @andyfengHKU in #3807
- Fix issue-3686 by @andyfengHKU in #3808
- Fix explain export database statement by @acquamarin in #3811
- Drop if exists by @acquamarin in #3800
- Fix issue 3097 by @andyfengHKU in #3810
- Fix deadlock issue in BMFileHandle by @ray6080 in #3820
- Adding basic single shortest path that finds paths by @semihsalihoglu-uw in #3819
- Fix issue-3616 by @andyfengHKU in #3821
- Operator printing by @hamzakammar in #3805
- Add arm64 macOS CI workflow by @mewim in #3825
- Fix python struct and map type interpretation by @acquamarin in #3824
- Fix lambda dependency analyze by @andyfengHKU in #3829
- Change max line width to 16k by @acquamarin in #3833
- Set number of jobs in rust build using Cargo's parallel level by @benjaminwinger in #3826
- Copy to parquet options by @acquamarin in #3834
- Add error message when kuzu detects numpy version at or above 2.0.0 by @mxwli in #3828
- Fix Dependabot security alert by @mewim in #3837
- Add DECIMAL conversion for Node.js API by @mewim in #3835
- Pass transaction in hash index by @ray6080 in #3822
- Fix 3002 by @mxwli in #3838
- NetworkX multi-edge support by @mewim in #3836
- Fix demo-db test by @acquamarin in #3843
- Issue 3744 by @andyfengHKU in #3841
- Implement copy from external table by @acquamarin in #3844
- Update pybind11 version by @mxwli in #3839
- Issue 3507 by @andyfengHKU in #3851
- Json Loading Options, COPY TO, COPY FROM by @mxwli in #3789
- Operator printing by @hamzakammar in #3848
- removing memory_order arguments in bm by @semihsalihoglu-uw in #3857
- Fix clang-format ci by @mewim in #3858
- Add sqlite to official extension by @acquamarin in #3860
- Expose rel ids to Node.js api by @mewim in #3867
- Added support for shortest keyword by @hamzakammar in #3866
- Optimize small rel table lookups by @benjaminwinger in #3813
- Merge source scan to table function call by @andyfengHKU in #3874
- Add decimal support to rust API by @benjaminwinger in #3827
- Turn on primary key scan while using PreparedStatement by @ted-wq-x in #3863
- implement skip row option by @acquamarin in #3877
- Add pyarrow COPY FROM testing by @mxwli in #3873
- Show functions call by @yiyun-sj in #3882
- Solve copy casting by @andyfengHKU in #3887
- Add object copy casting by @andyfengHKU in #3891
- Add support for passing literals to array functions by @mxwli in #3898
- MVCC for storage by @ray6080 in #3718
- CLI case-insensitive and snake_case shell flags by @MSebanc in #3888
- Fix storage driver by @andyfengHKU in #3903
- Unskip tests by @ray6080 in #3908
- Updated shell help docs url to a clickable link by @MSebanc in #3914
- Fix node batch insert: incorrect slicing by @ray6080 in #3920
- Rework update by @andyfengHKU in #3918
- Fix copy rollback leading to incorrect scan result by @ray6080 in #3922
- Fix duplicate primary key update by @andyfengHKU in #3924
- Enable attach s3 tests by @acquamarin in #3923
- UDF Map and Struct Input/Output by @mxwli in #3854
- Fix string function bugs by @mxwli in #3902
- Rework ChunkedNodeGroupCollection by @ray6080 in #3925
- Implemented progress for scan_node_table by @MSebanc in #3901
- Updated printing for more physical operators by @MSebanc in #3926
- Fix Sequence WAL and rollback behaviour by @yiyun-sj in #3919
- Fix Detach Delete Logic by @yiyun-sj in #3933
- Correctly reset active transaction after checkpoint by @royi-luo in #3934
- Fix bugs in node group checkpoint by @royi-luo in #3917
- Fix manual tx in table lookup by @andyfengHKU in #3929
- Fix RDF and Rel Group Entries in WAL Recovery by @yiyun-sj in #3912
- Fix duckdb extension parallel issue by @acquamarin in #3937
- Fix UDT and CI by @acquamarin in #3938
- Merge distinct fix by @acquamarin in #3905
- Reorganize test files by @ray6080 in #3928
- Shell printing update by @MSebanc in #3936
- Fix issue-3943 by @andyfengHKU in #3945
- Allowed physical operator boxes to connect by @hamzakammar in #3941
- Fix more checkpoint-related issues by @royi-luo in #3944
- Fix local node delete leading to inconsistent node offsets by @ray6080 in #3946
- Add mvcc ddl tests by @yiyun-sj in #3935
- Fix checkpoint version of node tables by @ray6080 in #3942
- Fix user defined type casting by @acquamarin in #3948
- Fix node create-delete in single statement by @andyfengHKU in #3955
- Fix the case of multiple same key exists in the same hash index slot by @ray6080 in #3956
- Fix update by @ray6080 in #3954
- Add codecov token by @mewim in #3958
- Fix nested column numValues sync by @andyfengHKU in #3959
- Add parameters to ci test by @mewim in #3957
- Fix drop table leading to incorrect reload tables by @ray6080 in #3962
- Add ldbc-1 dataset and download script by @mewim in #3960
- Rework rel table checkpoint by @ray6080 in #3963
- Fix MSVC compilation by @ray6080 in #3964
- Fix wal read of larger than BUFFER_SIZE by @ray6080 in #3965
- Fix serial-update-trx by @andyfengHKU in #3966
- Allow map duplicate key by @acquamarin in #3967
- add sequence tests for docs by @yiyun-sj in #3971
- Map duplicate key copy by @acquamarin in #3970
- Added CopyFrom Physical Operator by @hamzakammar in #3973
- Avoid logging null mask in wal by @ray6080 in #3977
- Add auto checkpoint test to CI workflow by @ray6080 in #3978
- Unskip csv_to_parquet conversion tests on ldbc and lsqb by @ray6080 in #3972
- Re-enable primary key lookup under write transactions and turn on nightly build by @ray6080 in #3980
- Fix build-and-deploy workflow by @mewim in #3986
- More Tests Cases for MVCC by @yiyun-sj in #3953
- Add checkpoint statement by @andyfengHKU in #3983
- Add missing types to copy to parquet by @acquamarin in #3982
- Disable COPY FROM in manual transactions by @ray6080 in #3981
- Remove lock from checking visibility during index lookup by @ray6080 in #3990
- Fix JSON issues by @mxwli in #3974
- Reduce pandas scan sample size by @andyfengHKU in #3991
- Fix rollback replay by @ray6080 in #3999
- Improve extension file scan by @acquamarin in #3993
- Add JSON tests that reflect documentation examples by @mxwli in #3995
- Fix typo in JSON by @mxwli in #4003
- Fix parameter expression alias by @andyfengHKU in #4002
- Release cpp unique_ptr and manually create opaque pointer for C API tests by @mewim in #4005
- Fix extension issues by @acquamarin in #4011
- Fix attach database error message by @acquamarin in #4014
- Force npm to build from source when flag is set by @mewim in #4015
- Fix overflow of sel_t by @ray6080 in #4018
- Fix import export relative path by @andyfengHKU in #4016
- Fix bugs caught by tests by @mxwli in #4008
- Update join hint error message by @andyfengHKU in #4026
- Bump version to 0.5.0 by @mewim in #4027
New Contributors
- @sapalli2989 made their first contribution in #3567
Full Changelog: v0.4.0...v0.5.0