Skip to content

Commit

Permalink
Set max number of chunks per layer (#332)
Browse files Browse the repository at this point in the history
Set the maximum number of separate chunks/row groups to allow splitting
an input layer into.

Deck.gl can pick from a maximum of 256 layers, and a user could have
many layers, so we don't want to use too many layers per data file.

This is for #282
though isn't a complete solution, because many separate user-facing
layers could still add up to more than 256 pickable deck layers.
  • Loading branch information
kylebarron authored Jan 26, 2024
1 parent 2202230 commit b708317
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions lonboard/_serialization.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,11 @@
# Target chunk size for Arrow (uncompressed) per Parquet chunk
DEFAULT_ARROW_CHUNK_BYTES_SIZE = 5 * 1024 * 1024 # 5MB

# Maximum number of separate chunks/row groups to allow splitting an input layer into
# Deck.gl can pick from a maximum of 256 layers, and a user could have many layers, so
# we don't want to use too many layers per data file.
DEFAULT_MAX_NUM_CHUNKS = 32


def serialize_table_to_parquet(table: pa.Table, *, max_chunksize: int) -> List[bytes]:
buffers: List[bytes] = []
Expand Down Expand Up @@ -75,7 +80,12 @@ def serialize_table(data, obj):


def infer_rows_per_chunk(table: pa.Table) -> int:
# At least one chunk
num_chunks = max(round(table.nbytes / DEFAULT_ARROW_CHUNK_BYTES_SIZE), 1)

# Clamp to the maximum number of chunks
num_chunks = min(num_chunks, DEFAULT_MAX_NUM_CHUNKS)

rows_per_chunk = math.ceil((table.num_rows / num_chunks))
return rows_per_chunk

Expand Down

0 comments on commit b708317

Please sign in to comment.