Root project is vortex-array (#67)
robert3005 authored Mar 5, 2024
1 parent db705e1 commit d784211
Showing 113 changed files with 86 additions and 99 deletions.
58 changes: 29 additions & 29 deletions Cargo.lock

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -5,7 +5,7 @@ members = [
     "codecz-sys",
     "fastlanez-sys",
     "pyvortex",
-    "vortex",
+    "vortex-array",
     "vortex-alloc",
     "vortex-alp",
     "vortex-dict",
26 changes: 20 additions & 6 deletions README.md
@@ -1,14 +1,23 @@
 # Vortex

+[![Build Status](https://github.com/fulcrum-so/vortex/actions/workflows/rust.yml/badge.svg)](https://github.com/fulcrum-so/vortex/actions)
+[![Crates.io](https://img.shields.io/crates/v/vortex-array.svg)](https://crates.io/crates/vortex-array)
+[![Documentation](https://docs.rs/vortex-rs/badge.svg)](https://docs.rs/vortex-array)
+[![Rust](https://img.shields.io/badge/rust-1.76.0%2B-blue.svg?maxAge=3600)](https://github.com/fulcrum-so/vortex)
+
 An in-memory format for 1-dimensional array data.

-Vortex is a maximally [Apache Arrow](https://arrow.apache.org/) compatible data format that aims to separate logical and physical representation of data, and allow pluggable physical layout.
+Vortex is a maximally [Apache Arrow](https://arrow.apache.org/) compatible data format that aims to separate logical and
+physical representation of data, and allow pluggable physical layout.

-Array operations are separately defined in terms of their semantics, dealing only with logical types and physical layout that defines exact ways in which values are transformed.
+Array operations are separately defined in terms of their semantics, dealing only with logical types and physical layout
+that defines exact ways in which values are transformed.

 # Logical Types

-Vortex type system only conveys semantic meaning of the array data without prescribing physical layout. When operating over arrays you can focus on semantics of the operation. Separately you can provide low level implementation dependent on particular physical operation.
+Vortex type system only conveys semantic meaning of the array data without prescribing physical layout. When operating
+over arrays you can focus on semantics of the operation. Separately you can provide low level implementation dependent
+on particular physical operation.

 ```
 Null: all null array
@@ -27,10 +36,15 @@ Struct: Named tuple of types

 # Physical Encodings

-Vortex calls array implementations encodings, they encode the physical layout of the data. Encodings are recurisvely nested, i.e. encodings contain other encodings. For every array you have their value data type and the its encoding that defines how operations will be performed. By default necessary encodings to zero copy convert to and from Apache Arrow are included in the package.
+Vortex calls array implementations encodings, they encode the physical layout of the data. Encodings are recurisvely
+nested, i.e. encodings contain other encodings. For every array you have their value data type and the its encoding that
+defines how operations will be performed. By default necessary encodings to zero copy convert to and from Apache Arrow
+are included in the package.

-When performing operations they're disptached on the encodings to provide specialized implementation.
+When performing operations they're dispatched on the encodings to provide specialized implementation.

 ## Compression

-The advantage of separating physical layout from the semantic of the data is compression. Vortex can compress data without requiring changes to the logical operations. To support efficient data access we focus on lightweight compression algorithms only falling back to general purpose compressors for binary data.
+The advantage of separating physical layout from the semantic of the data is compression. Vortex can compress data
+without requiring changes to the logical operations. To support efficient data access we focus on lightweight
+compression algorithms only falling back to general purpose compressors for binary data.
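The README text in this diff describes encodings that nest inside each other and operations that are dispatched on the encoding. A minimal sketch of that idea follows. The `IntArray` trait and the `Plain`, `Constant` and `Chunked` types are hypothetical and heavily simplified for illustration; they are not the vortex-array API.

```rust
/// Logical view of an integer array (hypothetical trait, for illustration only).
/// Callers use these operations without knowing the physical layout.
trait IntArray {
    fn len(&self) -> usize;
    fn sum(&self) -> i64;
}

/// Plain encoding: every value is stored explicitly.
struct Plain(Vec<i64>);

/// Constant encoding: a single value repeated `len` times.
struct Constant {
    value: i64,
    len: usize,
}

/// Chunked encoding: holds other encodings, so encodings nest recursively.
struct Chunked(Vec<Box<dyn IntArray>>);

impl IntArray for Plain {
    fn len(&self) -> usize {
        self.0.len()
    }
    fn sum(&self) -> i64 {
        self.0.iter().sum()
    }
}

impl IntArray for Constant {
    fn len(&self) -> usize {
        self.len
    }
    // Specialized implementation: the repeated value is never materialized.
    fn sum(&self) -> i64 {
        self.value * self.len as i64
    }
}

impl IntArray for Chunked {
    fn len(&self) -> usize {
        self.0.iter().map(|chunk| chunk.len()).sum()
    }
    // Delegates to each child, which dispatches to that child's own encoding.
    fn sum(&self) -> i64 {
        self.0.iter().map(|chunk| chunk.sum()).sum()
    }
}
```

A caller that sums a `Chunked` array built from a `Plain` chunk and a `Constant` chunk never sees which encoding did the work, and the constant chunk is summed without being expanded. This is the sense in which compression does not require changes to the logical operation.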
2 changes: 1 addition & 1 deletion bench-vortex/Cargo.toml
@@ -16,7 +16,7 @@ workspace = true

 [dependencies]
 arrow-array = "50.0.0"
-vortex = { path = "../vortex" }
+vortex-array = { path = "../vortex-array" }
 vortex-alp = { path = "../vortex-alp" }
 vortex-dict = { path = "../vortex-dict" }
 vortex-fastlanes = { path = "../vortex-fastlanes" }
8 changes: 2 additions & 6 deletions bench-vortex/src/lib.rs
@@ -1,4 +1,3 @@
-use itertools::Itertools;
 use vortex::array::bool::BoolEncoding;
 use vortex::array::chunked::ChunkedEncoding;
 use vortex::array::constant::ConstantEncoding;
@@ -18,7 +17,7 @@ use vortex_roaring::{RoaringBoolEncoding, RoaringIntEncoding};
 use vortex_zigzag::ZigZagEncoding;

 pub fn enumerate_arrays() -> Vec<&'static dyn Encoding> {
-    let encodings: Vec<&dyn Encoding> = vec![
+    vec![
         // TODO(ngates): fix https://github.com/fulcrum-so/vortex/issues/35
         // Builtins
         &BoolEncoding,
@@ -41,9 +40,7 @@ pub fn enumerate_arrays() -> Vec<&'static dyn Encoding> {
         &RoaringBoolEncoding,
         &RoaringIntEncoding,
         &ZigZagEncoding,
-    ];
-    println!("{}", encodings.iter().map(|e| e.id()).format(", "));
-    encodings
+    ]
 }

 #[cfg(test)]
@@ -96,7 +93,6 @@ mod test {

     #[test]
     fn compression_ratio() {
-        enumerate_arrays();
         setup_logger();

         let file = File::open(download_taxi_data()).unwrap();
2 changes: 1 addition & 1 deletion codecz-sys/Cargo.toml
@@ -16,7 +16,7 @@ workspace = true

 [dependencies]
 safe-transmute = "0.11.2"
-vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
+vortex-alloc = { path = "../vortex-alloc" }

 [build-dependencies]
 bindgen = "0.69.1"
4 changes: 2 additions & 2 deletions codecz/Cargo.toml
@@ -19,11 +19,11 @@ enum-display = "0.1.3"
 paste = "1.0.14"
 safe-transmute = "0.11.2"
 thiserror = "1.0.56"
-codecz-sys = { version = "0.1.0", path = "../codecz-sys" }
+codecz-sys = { path = "../codecz-sys" }
 half = "2.3.1"
 arrow-buffer = "50.0.0"
 itertools = "0.12.1"
-vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
+vortex-alloc = { path = "../vortex-alloc" }

 [dependencies.num-traits]
 version = "0.2"
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -17,7 +17,7 @@ packages = ["dummy"] # Required for workspace project
 [tool.rye]
 managed = true
 dev-dependencies = [
-    "pytest==7.4.0",
+    "pytest>=7.4.0",
     "pytest-benchmark>=4.0.0",
     "ruff>=0.1.11",
     "pip>=23.3.2",
2 changes: 1 addition & 1 deletion pyvortex/Cargo.toml
@@ -20,7 +20,7 @@ crate-type = ["rlib", "cdylib"]

 [dependencies]
 arrow = { version = "50.0.0", features = ["ffi"] }
-vortex = { path = "../vortex" }
+vortex-array = { path = "../vortex-array" }
 vortex-alp = { path = "../vortex-alp" }
 vortex-dict = { path = "../vortex-dict" }
 vortex-fastlanes = { path = "../vortex-fastlanes" }
1 change: 0 additions & 1 deletion pyvortex/test/test_array.py
@@ -1,6 +1,5 @@
 import pyarrow as pa
 import pytest
-
 import vortex

1 change: 0 additions & 1 deletion pyvortex/test/test_compress.py
@@ -1,6 +1,5 @@
 import numpy as np
 import pyarrow as pa
-
 import vortex

3 changes: 1 addition & 2 deletions pyvortex/test/test_serde.py
@@ -1,7 +1,6 @@
 import pyarrow as pa
-from pyarrow import fs
-
 import vortex
+from pyarrow import fs

 local = fs.LocalFileSystem()

4 changes: 1 addition & 3 deletions requirements-dev.lock
@@ -5,6 +5,7 @@
 # pre: false
 # features: []
 # all-features: false
+# with-sources: false

 -e file:pyvortex
 -e file:.
@@ -36,7 +37,6 @@ pathspec==0.12.1
 platformdirs==4.2.0
 pluggy==1.4.0
 py-cpuinfo==9.0.0
-py-spy==0.3.14
 pyarrow==15.0.0
 pygments==2.17.2
 pymdown-extensions==10.7
@@ -50,8 +50,6 @@ regex==2023.12.25
 requests==2.31.0
 ruff==0.2.2
 six==1.16.0
-snakeviz==2.2.0
-tornado==6.4
 urllib3==2.2.1
 verspec==0.1.0
 watchdog==4.0.0
1 change: 1 addition & 0 deletions requirements.lock
@@ -5,6 +5,7 @@
 # pre: false
 # features: []
 # all-features: false
+# with-sources: false

 -e file:pyvortex
 -e file:.
4 changes: 2 additions & 2 deletions vortex-alp/Cargo.toml
@@ -13,10 +13,10 @@ rust-version = { workspace = true }

 [dependencies]
 arrow = { version = "50.0.0" }
-vortex = { "path" = "../vortex" }
+vortex-array = { path = "../vortex-array" }
 linkme = "0.3.22"
 itertools = "0.12.1"
-codecz = { version = "0.1.0", path = "../codecz" }
+codecz = { path = "../codecz" }
 log = { version = "0.4.20", features = [] }

 [lints]
4 changes: 2 additions & 2 deletions vortex/Cargo.toml → vortex-array/Cargo.toml
@@ -1,5 +1,5 @@
 [package]
-name = "vortex"
+name = "vortex-array"
 version = { workspace = true }
 description = "Vortex in memory columnar data format"
 homepage = { workspace = true }
@@ -37,5 +37,5 @@ polars-ops = { version = "0.37.0", features = ["search_sorted"] }
 rand = { version = "0.8.5", features = [] }
 rayon = "1.8.1"
 roaring = "0.10.3"
-vortex-alloc = { version = "0.1.0", path = "../vortex-alloc" }
+vortex-alloc = { path = "../vortex-alloc" }
 thiserror = "1.0.57"
File renamed without changes.
@@ -54,8 +54,8 @@ impl<'a, T: NativePType> StatsCompute for NullableValues<'a, T> {

         if first_non_null.is_none() {
             return Ok(StatsSet::from(HashMap::from([
-                (Stat::Min, NullableScalar::None(T::PTYPE.into()).boxed()),
-                (Stat::Max, NullableScalar::None(T::PTYPE.into()).boxed()),
+                (Stat::Min, NullableScalar::none(T::PTYPE.into()).boxed()),
+                (Stat::Max, NullableScalar::none(T::PTYPE.into()).boxed()),
                 (Stat::IsConstant, true.into()),
                 (Stat::IsSorted, true.into()),
                 (Stat::IsStrictSorted, true.into()),
@@ -205,7 +205,7 @@ mod test {
             bit_width_freq,
             vec![
                 0u64, 1, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-                0, 0, 0, 0, 0, 0
+                0, 0, 0, 0, 0, 0,
             ]
         );
         assert_eq!(run_count, 5);
@@ -228,4 +228,13 @@ mod test {
         assert_eq!(min, Some(1));
         assert_eq!(max, Some(2));
     }
+
+    #[test]
+    fn all_null() {
+        let arr = PrimitiveArray::from_iter(vec![Option::<i32>::None, None, None]);
+        let min: Option<i32> = arr.stats().get_or_compute_as(&Stat::Min);
+        let max: Option<i32> = arr.stats().get_or_compute_as(&Stat::Max);
+        assert_eq!(min, None);
+        assert_eq!(max, None);
+    }
 }
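The `all_null` test added above checks that `Min` and `Max` come back as null when every value in the array is null. A self-contained sketch of that behaviour is shown here, assuming a hypothetical `min_max` helper over a slice of `Option` values; it stands in for, and is not, the crate's `StatsCompute` implementation.

```rust
/// Returns (min, max) over the non-null entries of `values`.
/// Hypothetical helper: yields (None, None) when the slice is empty
/// or every entry is null, mirroring the all-null case in the test above.
fn min_max<T: PartialOrd + Copy>(values: &[Option<T>]) -> (Option<T>, Option<T>) {
    // Skip nulls and copy out the remaining values.
    let mut non_null = values.iter().flatten().copied();

    // With no first non-null value there is nothing to compare.
    let first = match non_null.next() {
        Some(v) => v,
        None => return (None, None),
    };

    let (mut min, mut max) = (first, first);
    for v in non_null {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    (Some(min), Some(max))
}
```

The early return plays the same role as the `first_non_null.is_none()` branch in the diff: the statistics are reported as null instead of being derived from a placeholder value.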
File renamed without changes.
2 changes: 1 addition & 1 deletion vortex-dict/Cargo.toml
@@ -13,7 +13,7 @@ rust-version = { workspace = true }

 [dependencies]
 ahash = "0.8.7"
-vortex = { "path" = "../vortex" }
+vortex-array = { path = "../vortex-array" }
 half = { version = "2.3.1", features = ["std", "num-traits"] }
 hashbrown = "0.14.3"
 linkme = "0.3.22"