
Saving DataStore leads to crash when component columns exceed 1B elements #3097

Open

jleibs opened this issue Aug 24, 2023 · 1 comment
Labels
🪳 bug (Something isn't working) · ⛃ re_datastore (affects the datastore itself)

Comments

@jleibs (Member) commented Aug 24, 2023

Describe the bug
When trying to write an .rrd or blueprint file to disk, it is possible to end up in a state where we encounter an error such as:

Err value: Overflow', arrow2-0.17.1/src/array/growable/binary.rs:73

See: #3010

This can happen with anything that uses offset arrays, i.e. either ListArray or BinaryArray, though BinaryArray is the most likely culprit since the overflow can be triggered with as little as 1 GB of data.
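As a hedged illustration of the limit itself (a standalone pyarrow sketch, not the viewer's actual save path; note it allocates a few GiB of memory): two binary chunks of 1 GiB each are individually fine, but concatenating them pushes the final 32-bit offset past i32::MAX.

```python
import pyarrow as pa

# One element of exactly 1 GiB; the chunk's final offset is 2**30.
chunk = pa.array([b"x" * (1 << 30)], type=pa.binary())

try:
    # The combined final offset would be 2**31, one past i32::MAX, so this
    # fails (pyarrow reports an offset overflow during concatenation).
    pa.concat_arrays([chunk, chunk])
except pa.ArrowInvalid as err:
    print(err)

# With 64-bit offsets (LargeBinary) the same concatenation succeeds.
large = chunk.cast(pa.large_binary())
_ = pa.concat_arrays([large, large])
```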

The root cause is that when we save from the viewer we concatenate all of the data-cell slices for a given component. The slice overflow check (https://github.com/jorgecarleitao/arrow2/blob/main/src/offset.rs#L288) considers the worst-case concatenation of the slices, i.e. the full backing buffers of the source arrays, even when the sliced regions are significantly smaller and would not result in an actual overflow.
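A rough Python sketch of that pessimism (mirroring the shape of the described check, not arrow2's actual code): the check sums each source array's last offset, so slices backed by large source arrays trip it regardless of how small the slices themselves are.

```python
I32_MAX = (1 << 31) - 1

def concat_would_overflow(source_last_offsets: list[int]) -> bool:
    """Worst-case check: sums the final offsets of the *full* backing
    buffers, ignoring how little of each source the slice actually covers."""
    return sum(source_last_offsets) > I32_MAX

# Two tiny slices, each taken from a source array holding ~1.2 GB of data:
# flagged as overflow even though the slices themselves are small.
assert concat_would_overflow([1_200_000_000, 1_200_000_000])
```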

Additionally, because these Arrow arrays use i32 offsets, we overflow at ~2.1B (i32::MAX), so any two slices from source arrays with 1B elements will overflow. We probably want to consider using LargeBinaryArray (i64 offsets) to future-proof against this. Alternatively, we could guard against it by (1) only compacting down to chunks of at most ~2B elements (see the sketch below) and (2) never re-compacting from large slices.
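A minimal sketch of guard (1), assuming the data arrives as a list of 32-bit pyarrow binary chunks (the `compact_chunks` helper is hypothetical, not the datastore's actual compaction code):

```python
import pyarrow as pa

I32_MAX = (1 << 31) - 1  # largest final offset a 32-bit BinaryArray can hold

def compact_chunks(chunks: list[pa.Array]) -> list[pa.Array]:
    """Merge adjacent binary chunks, but start a new output chunk whenever
    the running byte total would push the final offset past i32::MAX."""
    out: list[pa.Array] = []
    run: list[pa.Array] = []
    run_bytes = 0
    for chunk in chunks:
        chunk_bytes = chunk.nbytes  # upper bound on the offset growth
        if run and run_bytes + chunk_bytes > I32_MAX:
            out.append(pa.concat_arrays(run))
            run, run_bytes = [], 0
        run.append(chunk)
        run_bytes += chunk_bytes
    if run:
        out.append(pa.concat_arrays(run))
    return out
```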

jleibs added the 🪳 bug and ⛃ re_datastore labels on Aug 24, 2023
jleibs changed the title from "DataStore cannot be re-saved once component columns exceed 1B elements" to "Saving DataStore leads to crash when component columns exceed 1B elements" on Aug 24, 2023
@Wumpf (Member) commented Feb 17, 2025

We still have this issue, although now that we're on arrow1 the failure looks a bit different. Here's an example:

File "/home/pablo/0Dev/personal/assembly-hands/.pixi/envs/default/lib/python3.11/site-packages/rerun_sdk/rerun/datatypes/blob_ext.py", line 75, in native_to_pa_array_override
    return pa.ListArray.from_arrays(offsets, inner, type=data_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 2557, in pyarrow.lib.ListArray.from_arrays
  File "pyarrow/array.pxi", line 402, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 372, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value 2795187770 too large to fit in C integer type
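For context, pa.ListArray.from_arrays coerces the offsets sequence to int32 before anything else, so a single offset past 2**31 - 1 fails up front. A hedged minimal repro (the offset value is taken from the traceback above; the child array is a tiny stand-in, not real blob data):

```python
import pyarrow as pa

inner = pa.array([0] * 10, type=pa.uint8())  # tiny stand-in for blob bytes

try:
    # The offsets are coerced to int32 first, reproducing
    # "Value 2795187770 too large to fit in C integer type".
    pa.ListArray.from_arrays([0, 2_795_187_770], inner)
except (pa.ArrowInvalid, OverflowError) as err:
    print(err)

# LargeListArray carries i64 offsets, so values past i32::MAX are
# representable (shown with small, valid offsets to keep this runnable).
offsets64 = pa.array([0, 4, 10], type=pa.int64())
_ = pa.LargeListArray.from_arrays(offsets64, inner)
```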
