Releases: KxSystems/arrowkdb
Release candidate for 1.4.1
Note: the 1.4.1-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Arrow only supports a single string array containing up to 2GB of data. If a kdb+ string/symbol list contains more than this amount of data, it has to be populated into an Arrow chunked array. Chunked arrays were already supported by arrowkdb
when writing Arrow IPC files or streams, but not when writing Parquet files.
Therefore, in order to support the use of chunked arrays when writing Parquet files, the ARROW_CHUNK_ROWS
option has been added to:
- pq.writeParquet
- pq.writeParquetFromTable
Note: This only applies to how kdb+ lists are chunked internally by the Parquet file writer. This is different to the row-groups configuration (set using PARQUET_CHUNK_SIZE), which controls how the Parquet file is structured when written.
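As a sketch, the new option is passed in the options dictionary of the existing write functions (the table contents, file name and chunk size here are illustrative):

```q
// load arrowkdb (path depends on your installation)
\l q/arrowkdb.q

// sample kdb+ table
table:([] col1:1000000?100j; col2:1000000?1f)

// chunk the kdb+ lists into Arrow chunked arrays of
// 100,000 rows each inside the Parquet file writer
options:(enlist `ARROW_CHUNK_ROWS)!enlist 100000
.arrowkdb.pq.writeParquetFromTable["file.parquet";table;options]
```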
Release candidate for 1.4.0
Note: the 1.4.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- New COMPRESSION option to specify the codec to use when writing Parquet files, IPC files or IPC streams.
- Bug fix for handling float32 and float64 nulls when mapping to/from 0n and 0nf.
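A minimal sketch of passing a codec through the options dictionary (the `snappy` codec name is an assumption; check the arrowkdb documentation for the supported codec list):

```q
// write a kdb+ table to Parquet with compression enabled
// (codec symbol is illustrative; consult the docs for valid values)
options:(enlist `COMPRESSION)!enlist `snappy
.arrowkdb.pq.writeParquetFromTable["file.parquet";([] a:1 2 3j; b:1.1 2.2 3.3);options]
```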
Release candidate for 1.3.0
Note: the 1.3.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- New APIs for reading and writing Apache ORC files (Linux and macOS only). This includes NULL support via the NULL_MAPPING and WITH_NULL_BITMAP options.
- When building from source, arrowkdb detects your libarrow version and selects C++14 (libarrow < 10.0) or C++17 (libarrow >= 10.0) as appropriate.
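Assuming the ORC functions follow the same naming convention as the Parquet ones (writeOrcFromTable/readOrcToTable — verify against the function reference), a round trip with null mapping might look like:

```q
// table containing kdb+ nulls
table:([] x:1 2 0Nj; y:1.1 0n 3.3)

// map kdb+ nulls to Arrow nulls on write
options:(enlist `NULL_MAPPING)!enlist (`int64`float64)!(0Nj;0n)
.arrowkdb.orc.writeOrcFromTable["file.orc";table;options]

// read it back (:: means no options)
.arrowkdb.orc.readOrcToTable["file.orc";::]
```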
Release candidate for 1.2.0
Note: the 1.2.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- Support for converting kdb+ nulls to Arrow nulls when reading and writing via a new NULL_MAPPING option when:
- Reading and writing Parquet files
- Reading and writing Arrow IPC files
- Reading and writing Arrow IPC streams
- Support for reading the Arrow null bitmap as a separate structure via a new WITH_NULL_BITMAP option when:
- Reading Parquet files
- Reading Arrow IPC files
- Reading Arrow IPC streams
- Arrow IPC files and streams can be written with chunking via a new ARROW_CHUNK_ROWS option
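A sketch of the two null-handling options used together when reading a Parquet file (the mapping values and file name are illustrative):

```q
// NULL_MAPPING: which kdb+ value to substitute for each Arrow datatype's null
// WITH_NULL_BITMAP: also return the null bitmaps as a separate structure
opts:(`NULL_MAPPING`WITH_NULL_BITMAP)!((`int64`float64)!(0Nj;0n);1)

// with WITH_NULL_BITMAP set the result is a two-item list:
// (the data; the null bitmaps)
res:.arrowkdb.pq.readParquetToTable["file.parquet";opts]
```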
Release candidate for 1.1.0
Note: the 1.1.0-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Enhancements include:
- Support multithreaded use of arrowkdb with peach
- Add support for reading Parquet files with row groups (chunking)
- Upgrade build to use libarrow and libparquet 9.0.0
- Support latest v2 Parquet file formats
New functions:
- pq.readParquetNumRowGroups
- pq.readParquetRowGroups
- pq.readParquetRowGroupsToTable
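For instance, a large Parquet file can be inspected and then read one batch of row groups at a time (a sketch; `::` selects all columns and no options):

```q
// number of row groups in the file
n:.arrowkdb.pq.readParquetNumRowGroups["file.parquet"]

// read the first two row groups, all columns, as a kdb+ table
t:.arrowkdb.pq.readParquetRowGroupsToTable["file.parquet";0 1i;::;::]
```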
Release candidate for 1.0.0
Note: the 1.0.0-rc.1 arrowkdb package was built against Apache Arrow version 5.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Arrowkdb enhancements:
- Make the API more future proof and extensible by adding an options parameter to the read and write functions where it was not already present:
- pq.readParquetColumn
- ipc.writeArrow
- ipc.writeArrowFromTable
- ipc.serializeArrow
- ipc.serializeArrowFromTable
- ipc.parseArrowData
- ipc.parseArrowToTable
- ar.prettyPrintArray
- ar.prettyPrintArrayFromList
- tb.prettyPrintTable
- tb.prettyPrintTableFromTable
- Support mapping the Arrow decimal128 datatype to and from a kdb+ 9h list via the new option DECIMAL128_AS_DOUBLE.
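As a sketch, a decimal128 field can then be written from a kdb+ float (9h) list when the option is set (the precision/scale and data values are illustrative):

```q
// construct a schema containing a single decimal128(38,2) field
dec:.arrowkdb.dt.decimal128[38i;2i]
f:.arrowkdb.fd.field[`price;dec]
schema:.arrowkdb.sc.schema[enlist f]

// with DECIMAL128_AS_DOUBLE set, the array data for the
// decimal128 field is represented as a kdb+ 9h (float) list
opts:(enlist `DECIMAL128_AS_DOUBLE)!enlist 1
.arrowkdb.pq.writeParquet["file.parquet";schema;enlist 1.23 4.56 7.89;opts]
```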
Initial alpha release for version 1.0.0
At the core of Apache Arrow is its in-memory columnar format, a standardized, language-agnostic specification for representing structured, table-like datasets in memory. This data format has a rich datatype system (including nested datatypes) designed to support the needs of analytic database systems, dataframe libraries, and more.
The arrowkdb integration enables kdb+ users to read and write Arrow tables created from kdb+ data using:
- Parquet file format
- Arrow IPC record batch file format
- Arrow IPC record batch stream format
Currently Arrow supports over 35 datatypes including concrete, parameterized and nested datatypes. Each Arrow datatype is mapped to a kdb+ type and arrowkdb can seamlessly convert between both representations.
Separate APIs are provided where the Arrow table is either created from a kdb+ table using an inferred schema or from an Arrow schema and the table’s list of array data.
- Inferred schemas. If you are less familiar with Arrow or do not wish to use the more complex or nested Arrow datatypes, arrowkdb can infer the schema from a kdb+ table where each column in the table is mapped to a field in the schema.
- Constructed schemas. Although inferred schemas are easy to use, they support only a subset of the Arrow datatypes and are considerably less flexible. Where more complex schemas are required, these should be manually constructed using the datatype/field/schema constructor functions which arrowkdb exposes, similar to the C++ Arrow library and PyArrow.
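The two approaches can be sketched side by side (the table contents are illustrative; the constructed schema here deliberately mirrors what inference would produce for the same table):

```q
table:([] id:1 2 3j; price:1.1 2.2 3.3)

// inferred: derive the Arrow schema directly from the kdb+ table
inferred:.arrowkdb.sc.inferSchema[table]

// constructed: build an equivalent schema explicitly
// from datatype and field constructors
fields:(.arrowkdb.fd.field[`id;.arrowkdb.dt.int64[]];
        .arrowkdb.fd.field[`price;.arrowkdb.dt.float64[]])
constructed:.arrowkdb.sc.schema[fields]
```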