Note: the 1.4.1-rc.1 arrowkdb package was built against Apache Arrow version 9.0.0. If you have a different version of the libarrow runtime installed, it may be necessary to build arrowkdb from source in order to support that version (instructions to build arrowkdb from source are in the README.md).
Arrow limits a single string array to 2GB of data. If a kdb+ string/symbol list contains more data than this, it must be populated into an Arrow chunked array. Chunked arrays were already supported by arrowkdb when writing Arrow IPC files or streams, but not when writing Parquet files.
Therefore, in order to support the use of chunked arrays when writing Parquet files, the ARROW_CHUNK_ROWS option has been added to:
- pq.writeParquet
- pq.writeParquetFromTable
Note: this option only controls how kdb+ lists are chunked internally before being passed to the Parquet file writer. It is different from the row-group configuration (set using PARQUET_CHUNK_SIZE), which controls how the Parquet file itself is structured when written.
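
For example, the following is a minimal sketch of writing a table containing a large string column to Parquet with ARROW_CHUNK_ROWS set. The file name, table size and chunk value are illustrative only:

```q
// Assumes arrowkdb is loaded under its default .arrowkdb namespace.
// Build a table with a large string column (illustrative data).
table:([] str:1000000#enlist "some string data")

// Break the kdb+ lists into Arrow chunked-array chunks of at most 100000 rows
// before they reach the Parquet file writer. PARQUET_CHUNK_SIZE, not set here,
// would separately control the row-group layout of the file itself.
options:(enlist `ARROW_CHUNK_ROWS)!enlist 100000

.arrowkdb.pq.writeParquetFromTable["chunked_example.parquet";table;options]
```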