A rust crate providing a vector like struct that stores data as runs of identical values.
If your data consists of long stretches of identical values it can be beneficial to only store
the number of times each value occurs. This can result in significant space savings, but there
is a cost. Accessing an arbitrary index requires a binary search over the stored runs resulting
in a (log n)
complexity versus O(1)
for a normal vector. Other complexities are in the table
where n
is equal to the number of runs, not the length of a comparable Vec
.
push | index | set with breaking a run | set without breaking a run | insert with breaking a run | insert without breaking a run | |
---|---|---|---|---|---|---|
RleVec |
O(1) | O(log n) | O((log n) + 2n) | O(log n) | O((log n) + 2n) | O((log n) + n) |
Vec |
O(1) | O(1) | O(1)* | O(n) |
The RleVec
struct handles like a normal vector and supports a subset from the Vec
methods.
Put this in your Cargo.toml
:
[dependencies]
rle_vec = "0.4"
and this to your crate root:
extern crate rle_vec;
use rle_vec::RleVec;
let mut rle = RleVec::new();
rle.push(10);
rle.push(10);
rle.push(11);
assert_eq!(rle[1], 10);
assert_eq!(rle[2], 11);
rle.insert(1, 10);
assert_eq!(rle.runs_len(), 2);
rle.set(0, 10);
assert_eq!(rle.runs_len(), 3);
RleVec
can be constructed from Iterators
and be iterated over just like a Vec
.
use rle_vec::RleVec;
let v = vec![0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 4, 4, 4];
let mut rle: RleVec<_> = v.into_iter().collect();
assert_eq!(rle.len(), 15);
assert_eq!(rle.runs_len(), 7);
assert_eq!(rle.iter().nth(10), Some(&4));
You can get the value at an index.
use rle_vec::RleVec;
let v = vec![0, 0, 0, 1, 1, 1, 1, 2, 2, 3];
let mut rle: RleVec<_> = v.into_iter().collect();
rle.set(1, 2);
rle.insert(4, 4);
let v = rle.iter().cloned().collect::<Vec<_>>();
assert_eq!(v, vec![0, 2, 0, 1, 4, 1, 1, 1, 2, 2, 3]);
RleVec::set
and RleVec::insert
require T: Clone
.
Not all methods implemented on Vec
are implemented for RleVec
. All methods returning a slice
cannot work for RleVec
.
Serde support for serialization is available as a cargo
feature. You can specify the feature in the Cargo.toml
dependencies
section.
[dependencies]
rle_vec = { version = "0.4.0", features = ["serialize"] }
- Allocate gigantic vectors with a starting value and (randomly) update
positions under the assumption the data is going to remain sparse. The
update step will be slower than using a
Vec
. - Obviously the savings are bigger when the size of
T
inRleVec<T>
is large. - If you want to reduce an in memory structure before for example serializing.
Cargo bench
can be used to compare the real life difference of
get/set/insert/remove operations on a Vec
and RleVec
. Warning, some test
involves reallocations.
Create appears to be fastest from slice, unless the data is very sparse.
rle_create_1000_runs_of_10_values_from_iter ... bench: 19,561 ns/iter (+/- 342)
vec_create_1000_runs_of_10_values_from_iter ... bench: 22,221 ns/iter (+/- 582)
rle_create_1000_runs_of_10_values_from_slice ... bench: 6,076 ns/iter (+/- 324)
vec_create_1000_runs_of_10_values_from_slice ... bench: 894 ns/iter (+/- 45)
rle_create_10_000_equal_values_from_iter ... bench: 15 ns/iter (+/- 1)
vec_create_10_000_equal_values_from_iter ... bench: 7,683 ns/iter (+/- 547)
rle_create_10_000_equal_values_from_slice ... bench: 4,490 ns/iter (+/- 57)
vec_create_10_000_equal_values_from_slice ... bench: 898 ns/iter (+/- 48)
rle_create_10_000_unique_values_from_iter ... bench: 25,510 ns/iter (+/- 789)
vec_create_10_000_unique_values_from_slice ... bench: 891 ns/iter (+/- 47)
rle_create_10_000_unique_values_from_slice ... bench: 25,936 ns/iter (+/- 278)
vec_create_10_000_unique_values_from_iter ... bench: 921 ns/iter (+/- 25)
Random access takes the binary search penalty, but iterating performs reasonable.
rle_iterate_1000_runs_of_10_values ... bench: 19,773 ns/iter (+/- 481)
vec_iterate_1000_runs_of_10_values ... bench: 4,981 ns/iter (+/- 2,171)
rle_iterate_10_000_equal_values ... bench: 20,878 ns/iter (+/- 538)
vec_iterate_10_000_equal_values ... bench: 5,149 ns/iter (+/- 124)
rle_iterate_10_000_unique_values ... bench: 20,130 ns/iter (+/- 340)
vec_iterate_10_000_unique_values ... bench: 7,784 ns/iter (+/- 127)
rle_random_access_1000_runs_of_10_values ... bench: 34,999 ns/iter (+/- 632)
vec_random_access_1000_runs_of_10_values ... bench: 499 ns/iter (+/- 11)
Inserting data is competitive and can be faster if no run breaking is required.
rle_insert_runmids_breaking_1000_runs_of_10_values ... bench: 308,797 ns/iter (+/- 23,860)
rle_insert_runmids_non_breaking_1000_runs_of_10_values ... bench: 171,507 ns/iter (+/- 2,669)
vec_insert_runmids_1000_runs_of_10_values ... bench: 191,124 ns/iter (+/- 5,439)
Mutable indexing can have very different outcomes. Minimum cost is the binary
search, but depending on the inserted value Runs
can be merged or split.
test rle_set_runmids_breaking_1000_runs_of_10_values ... bench: 177,418 ns/iter (+/- 2,718)
test rle_set_runmids_non_breaking_1000_runs_of_10_values ... bench: 34,844 ns/iter (+/- 528)
test rle_set_runs_merging_1000_runs ... bench: 97,703 ns/iter (+/- 1,521)
test vec_set_runmids_1000_runs_of_10_values ... bench: 908 ns/iter (+/- 11)
test vec_set_runs_merging_1000_runs ... bench: 785 ns/iter (+/- 27)
Removing values can have very different outcomes. Minimum cost is the binary
search, but depending on the removed value Runs
can be merged or split. But
remove is also a lot more expensive for a Vec
.
test rle_remove_runmids_non_breaking_1000_runs_of_10_values ... bench: 184,023 ns/iter (+/- 5,367)
test rle_remove_runs_merging_1000_runs ... bench: 182,233 ns/iter (+/- 5,122)
test vec_remove_runmids_1000_runs_of_10_values ... bench: 270,981 ns/iter (+/- 6,258)
test vec_remove_runs_merging_1000_runs ... bench: 66,948 ns/iter (+/- 1,460)