
Saving DataStore leads to crash when component columns exceed 1B elements #3097

Open

jleibs opened this issue Aug 24, 2023 · 1 comment
Labels
🪳 bug (Something isn't working) · ⛃ re_datastore (affects the datastore itself)

Comments

@jleibs (Member) commented Aug 24, 2023

Describe the bug
When trying to write an .rrd or blueprint file to disk, it is possible to end up in a state where we encounter an error such as:

Err value: Overflow', arrow2-0.17.1/src/array/growable/binary.rs:73

See: #3010

This can happen with anything that uses offset arrays, i.e. either ListArray or BinaryArray, though BinaryArray is the most likely culprit since the overflow can be triggered with as little as 1 GB of data.
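As a hedged illustration of the limit itself (a standalone pyarrow sketch, not the viewer's actual save path; note it allocates a few GiB of memory): two binary chunks of 1 GiB each are individually fine, but concatenating them pushes the final 32-bit offset past i32::MAX.

```python
import pyarrow as pa

# One element of exactly 1 GiB; the chunk's final offset is 2**30.
chunk = pa.array([b"x" * (1 << 30)], type=pa.binary())

try:
    # The combined final offset would be 2**31, one past i32::MAX, so this
    # fails (pyarrow reports an offset overflow during concatenation).
    pa.concat_arrays([chunk, chunk])
except pa.ArrowInvalid as err:
    print(err)

# With 64-bit offsets (LargeBinary) the same concatenation succeeds.
large = chunk.cast(pa.large_binary())
_ = pa.concat_arrays([large, large])
```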

The root cause is that when we save from the viewer we concatenate all of the data-cell slices for a given component. The slice overflow check (https://github.com/jorgecarleitao/arrow2/blob/main/src/offset.rs#L288) considers the worst-case concatenation of the slices, i.e. the full backing buffers of the source arrays, even when the sliced regions are significantly smaller and would not result in an actual overflow.
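A rough Python sketch of that pessimism (mirroring the shape of the described check, not arrow2's actual code): the check sums each source array's last offset, so slices backed by large source arrays trip it regardless of how small the slices themselves are.

```python
I32_MAX = (1 << 31) - 1

def concat_would_overflow(source_last_offsets: list[int]) -> bool:
    """Worst-case check: sums the final offsets of the *full* backing
    buffers, ignoring how little of each source the slice actually covers."""
    return sum(source_last_offsets) > I32_MAX

# Two tiny slices, each taken from a source array holding ~1.2 GB of data:
# flagged as overflow even though the slices themselves are small.
assert concat_would_overflow([1_200_000_000, 1_200_000_000])
```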

Additionally, because these Arrow arrays use i32 offsets, we overflow at ~2.1B (i32::MAX), so any two slices from source arrays with 1B elements will overflow. We probably want to consider using LargeBinaryArray (i64 offsets) to future-proof against this. Alternatively, we could guard against it by (1) only compacting down to chunks of at most ~2B elements (see the sketch below) and (2) never re-compacting from large slices.
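A minimal sketch of guard (1), assuming the data arrives as a list of 32-bit pyarrow binary chunks (the `compact_chunks` helper is hypothetical, not the datastore's actual compaction code):

```python
import pyarrow as pa

I32_MAX = (1 << 31) - 1  # largest final offset a 32-bit BinaryArray can hold

def compact_chunks(chunks: list[pa.Array]) -> list[pa.Array]:
    """Merge adjacent binary chunks, but start a new output chunk whenever
    the running byte total would push the final offset past i32::MAX."""
    out: list[pa.Array] = []
    run: list[pa.Array] = []
    run_bytes = 0
    for chunk in chunks:
        chunk_bytes = chunk.nbytes  # upper bound on the offset growth
        if run and run_bytes + chunk_bytes > I32_MAX:
            out.append(pa.concat_arrays(run))
            run, run_bytes = [], 0
        run.append(chunk)
        run_bytes += chunk_bytes
    if run:
        out.append(pa.concat_arrays(run))
    return out
```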

jleibs added the 🪳 bug and ⛃ re_datastore labels on Aug 24, 2023
jleibs changed the title from "DataStore cannot be re-saved once component columns exceed 1B elements" to "Saving DataStore leads to crash when component columns exceed 1B elements" on Aug 24, 2023
@Wumpf (Member) commented Feb 17, 2025

We still have this issue, although now that we're on arrow1 the failure looks a bit different. Here's an example:

File "/home/pablo/0Dev/personal/assembly-hands/.pixi/envs/default/lib/python3.11/site-packages/rerun_sdk/rerun/datatypes/blob_ext.py", line 75, in native_to_pa_array_override
    return pa.ListArray.from_arrays(offsets, inner, type=data_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 2557, in pyarrow.lib.ListArray.from_arrays
  File "pyarrow/array.pxi", line 402, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 372, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Value 2795187770 too large to fit in C integer type
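For context, pa.ListArray.from_arrays coerces the offsets sequence to int32 before anything else, so a single offset past 2**31 - 1 fails up front. A hedged minimal repro (the offset value is taken from the traceback above; the child array is a tiny stand-in, not real blob data):

```python
import pyarrow as pa

inner = pa.array([0] * 10, type=pa.uint8())  # tiny stand-in for blob bytes

try:
    # The offsets are coerced to int32 first, reproducing
    # "Value 2795187770 too large to fit in C integer type".
    pa.ListArray.from_arrays([0, 2_795_187_770], inner)
except (pa.ArrowInvalid, OverflowError) as err:
    print(err)

# LargeListArray carries i64 offsets, so values past i32::MAX are
# representable (shown with small, valid offsets to keep this runnable).
offsets64 = pa.array([0, 4, 10], type=pa.int64())
_ = pa.LargeListArray.from_arrays(offsets64, inner)
```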
