Releases: KxSystems/arrowkdb
Release candidate for 1.4.1
Note: the 1.4.1-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Arrow only supports a single string array containing up to 2GB of data. If a kdb+ string/symbol list contains more than this amount of data, it has to be populated into an Arrow chunked array. Chunked arrays were already supported by arrowkdb
when writing Arrow IPC files or streams, but not when writing Parquet files.
Therefore, in order to support the use of chunked arrays when writing Parquet files, the ARROW_CHUNK_ROWS
option has been added to:
- pq.writeParquet
- pq.writeParquetFromTable
Note: This only applies to how kdb+ lists are chunked internally by the Parquet file writer. This is different to the row-groups configuration (set using PARQUET_CHUNK_SIZE), which controls how the Parquet file is structured when written.
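As a sketch, the new option is passed in the options dictionary of the existing write functions (the table contents, file name and chunk size here are illustrative):

```q
// load arrowkdb (path depends on your installation)
\l q/arrowkdb.q

// sample kdb+ table
table:([] col1:1000000?100j; col2:1000000?1f)

// chunk the kdb+ lists into Arrow chunked arrays of
// 100,000 rows each inside the Parquet file writer
options:(enlist `ARROW_CHUNK_ROWS)!enlist 100000
.arrowkdb.pq.writeParquetFromTable["file.parquet";table;options]
```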
Release candidate for 1.4.0
Note: the 1.4.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- New COMPRESSION option to specify the codec to use when writing Parquet files, IPC files or IPC streams.
- Bug fix for handling float32 and float64 nulls when mapping to/from 0n and 0nf.
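A minimal sketch of passing a codec through the options dictionary (the `snappy` codec name is an assumption; check the arrowkdb documentation for the supported codec list):

```q
// write a kdb+ table to Parquet with compression enabled
// (codec symbol is illustrative; consult the docs for valid values)
options:(enlist `COMPRESSION)!enlist `snappy
.arrowkdb.pq.writeParquetFromTable["file.parquet";([] a:1 2 3j; b:1.1 2.2 3.3);options]
```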
Release candidate for 1.3.0
Note: the 1.3.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- New APIs for reading and writing Apache ORC files (Linux and macOS only). This includes NULL support via the NULL_MAPPING and WITH_NULL_BITMAP options.
- When building from source, arrowkdb detects your libarrow version and selects C++14 (libarrow < 10.0) or C++17 (libarrow >= 10.0) as appropriate.
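Assuming the ORC functions follow the same naming convention as the Parquet ones (writeOrcFromTable/readOrcToTable — verify against the function reference), a round trip with null mapping might look like:

```q
// table containing kdb+ nulls
table:([] x:1 2 0Nj; y:1.1 0n 3.3)

// map kdb+ nulls to Arrow nulls on write
options:(enlist `NULL_MAPPING)!enlist (`int64`float64)!(0Nj;0n)
.arrowkdb.orc.writeOrcFromTable["file.orc";table;options]

// read it back (:: means no options)
.arrowkdb.orc.readOrcToTable["file.orc";::]
```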
Release candidate for 1.2.0
Note: the 1.2.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- Support for converting kdb+ nulls to Arrow nulls when reading and writing via a new NULL_MAPPING option when:
- Reading and writing Parquet files
- Reading and writing Arrow IPC files
- Reading and writing Arrow IPC streams
- Support for reading the Arrow null bitmap as a separate structure via a new WITH_NULL_BITMAP option when:
- Reading Parquet files
- Reading Arrow IPC files
- Reading Arrow IPC streams
- Arrow IPC files and streams can be written with chunking via a new ARROW_CHUNK_ROWS option
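A sketch of the two null-handling options used together when reading a Parquet file (the mapping values and file name are illustrative):

```q
// NULL_MAPPING: which kdb+ value to substitute for each Arrow datatype's null
// WITH_NULL_BITMAP: also return the null bitmaps as a separate structure
opts:(`NULL_MAPPING`WITH_NULL_BITMAP)!((`int64`float64)!(0Nj;0n);1)

// with WITH_NULL_BITMAP set the result is a two-item list:
// (the data; the null bitmaps)
res:.arrowkdb.pq.readParquetToTable["file.parquet";opts]
```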
Release candidate for 1.1.0
Note: the 1.1.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- Support multithreaded use of arrowkdb with peach
- Add support for reading Parquet files with row groups (chunking)
- Upgrade build to use libarrow and libparquet 9.0.0
- Support latest v2 Parquet file formats
New functions:
- pq.readParquetNumRowGroups
- pq.readParquetRowGroups
- pq.readParquetRowGroupsToTable
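For instance, a large Parquet file can be inspected and then read one batch of row groups at a time (a sketch; `::` selects all columns and no options):

```q
// number of row groups in the file
n:.arrowkdb.pq.readParquetNumRowGroups["file.parquet"]

// read the first two row groups, all columns, as a kdb+ table
t:.arrowkdb.pq.readParquetRowGroupsToTable["file.parquet";0 1i;::;::]
```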
Release candidate for 1.0.0
Note: the 1.0.0-rc.1 arrowkdb package was built against Apache Arrow version 5.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Arrowkdb enhancements:
- Make the API more future proof and extensible by adding an options parameter to the read and write functions where it was not already present:
- pq.readParquetColumn
- ipc.writeArrow
- ipc.writeArrowFromTable
- ipc.serializeArrow
- ipc.serializeArrowFromTable
- ipc.parseArrowData
- ipc.parseArrowToTable
- ar.prettyPrintArray
- ar.prettyPrintArrayFromList
- tb.prettyPrintTable
- tb.prettyPrintTableFromTable
- Support mapping the Arrow decimal128 datatype to and from a kdb+ 9h list via the new option DECIMAL128_AS_DOUBLE.
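As a sketch, a decimal128 field can then be written from a kdb+ float (9h) list when the option is set (the precision/scale and data values are illustrative):

```q
// construct a schema containing a single decimal128(38,2) field
dec:.arrowkdb.dt.decimal128[38i;2i]
f:.arrowkdb.fd.field[`price;dec]
schema:.arrowkdb.sc.schema[enlist f]

// with DECIMAL128_AS_DOUBLE set, the array data for the
// decimal128 field is represented as a kdb+ 9h (float) list
opts:(enlist `DECIMAL128_AS_DOUBLE)!enlist 1
.arrowkdb.pq.writeParquet["file.parquet";schema;enlist 1.23 4.56 7.89;opts]
```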
Initial alpha release for version 1.0.0
At the core of Apache Arrow is its in-memory columnar format, a standardized, language-agnostic specification for representing structured, table-like datasets in memory. This data format has a rich datatype system (including nested datatypes) designed to support the needs of analytic database systems, dataframe libraries, and more.
The arrowkdb integration enables kdb+ users to read and write Arrow tables created from kdb+ data using:
- Parquet file format
- Arrow IPC record batch file format
- Arrow IPC record batch stream format
Currently Arrow supports over 35 datatypes including concrete, parameterized and nested datatypes. Each Arrow datatype is mapped to a kdb+ type and arrowkdb can seamlessly convert between both representations.
Separate APIs are provided where the Arrow table is either created from a kdb+ table using an inferred schema or from an Arrow schema and the table’s list of array data.
- Inferred schemas. If you are less familiar with Arrow or do not wish to use the more complex or nested Arrow datatypes, arrowkdb can infer the schema from a kdb+ table where each column in the table is mapped to a field in the schema.
- Constructed schemas. Although inferred schemas are easy to use, they support only a subset of the Arrow datatypes and are considerably less flexible. Where more complex schemas are required, these should be manually constructed using the datatype/field/schema constructor functions which arrowkdb exposes, similar to the C++ Arrow library and PyArrow.
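The two approaches can be sketched side by side (the table contents are illustrative; the constructed schema here deliberately mirrors what inference would produce for the same table):

```q
table:([] id:1 2 3j; price:1.1 2.2 3.3)

// inferred: derive the Arrow schema directly from the kdb+ table
inferred:.arrowkdb.sc.inferSchema[table]

// constructed: build an equivalent schema explicitly
// from datatype and field constructors
fields:(.arrowkdb.fd.field[`id;.arrowkdb.dt.int64[]];
        .arrowkdb.fd.field[`price;.arrowkdb.dt.float64[]])
constructed:.arrowkdb.sc.schema[fields]
```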