Describe the bug

When trying to write an rrd or blueprint to disk, it is possible to end up in a state where we encounter an error such as the one shown in #3010.
This can happen with anything that uses offset arrays, i.e. either ListArrays or BinaryArrays, though BinaryArrays are the most likely culprit since the problem can be hit with only 1 GB of data.

The root cause is that when we save from the viewer we concatenate all of the data-cell slices for a given component. The slice overflow check (https://github.com/jorgecarleitao/arrow2/blob/main/src/offset.rs#L288) considers the worst-case concatenation of the slices, even if the sliced regions are significantly smaller and would not result in an actual overflow.

Additionally, because Arrow uses i32 for offsets, we overflow at roughly 2B elements, so any two slices whose source arrays each hold on the order of 1B elements will trip the check. We probably want to consider using LargeBinaryArrays (i64 offsets) to future-proof against this. Alternatively, we could guard against it by (1) only compacting down to 2B-element chunks and (2) never re-compacting from large slices.
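For context, here is a small pyarrow sketch of the two offset widths involved; the lengths at the bottom are made-up illustrations of the pessimistic check described above, not values from a real recording:

```python
import pyarrow as pa

# The "plain" binary/list layouts use 32-bit offsets, capping the total
# child length (bytes for binary, items for list) at 2**31 - 1 per array.
I32_MAX = 2**31 - 1
print(pa.binary(), pa.list_(pa.uint8()))             # int32 offsets
print(pa.large_binary(), pa.large_list(pa.uint8()))  # int64 offsets
print(f"i32 offset cap: {I32_MAX:,}")                # 2,147,483,647

# The overflow check described above is pessimistic: it accounts for the
# source arrays the slices came from, not just the sliced regions that
# would actually be concatenated. Illustrative lengths:
source_lens = [1_200_000_000, 1_200_000_000]  # two ~1.2B-element sources
slice_lens = [10, 10]                         # the tiny slices we keep
print(sum(slice_lens) <= I32_MAX)   # True  -> the real concat would fit
print(sum(source_lens) <= I32_MAX)  # False -> the worst-case check trips
```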
jleibs changed the title from "DataStore cannot be re-saved once component columns exceed 1B elements" to "Saving DataStore leads to crash when component columns exceed 1B elements" on Aug 24, 2023.
We still have this issue, although now that we're on arrow1, it looks a bit different. Here's an example:
File "/home/pablo/0Dev/personal/assembly-hands/.pixi/envs/default/lib/python3.11/site-packages/rerun_sdk/rerun/datatypes/blob_ext.py", line 75, in native_to_pa_array_override
return pa.ListArray.from_arrays(offsets, inner, type=data_type)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/array.pxi", line 2557, in pyarrow.lib.ListArray.from_arrays
File "pyarrow/array.pxi", line 402, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 372, in pyarrow.lib.array
File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value 2795187770 too large to fit in C integer type
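The call dies while from_arrays converts the Python offsets list to Arrow's 32-bit list offsets. A minimal, isolated reproduction of just that step (reusing only the offset value from the error message above; this is not rerun's actual code path) might look like:

```python
import pyarrow as pa

# The offending last offset, copied from the error message above.
last_offset = 2_795_187_770

# list<...> / binary offsets are int32, so converting this value overflows,
# which is the same failure ListArray.from_arrays surfaces in the traceback.
try:
    pa.array([0, last_offset], type=pa.int32())
except (OverflowError, pa.ArrowInvalid) as err:
    print("int32 offsets reject this value:", err)

# large_list / large_binary use int64 offsets and have plenty of headroom.
print(pa.array([0, last_offset], type=pa.int64()))
```

Moving the underlying Blob list layout to 64-bit offsets (large_binary / large_list), or keeping any single array's payload under the 2**31 - 1 cap, are the two mitigation directions mentioned in the description above.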