uniq-ch

A Rust library for counting distinct elements in a stream, using the ClickHouse uniq data structure.

The library implements BJKST, a probabilistic algorithm based on adaptive sampling that produces fast, accurate, and deterministic estimates. Two BJKSTs can be merged, making the data structure well suited for map-reduce settings.
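
To make adaptive sampling and merging more concrete, here is a minimal sketch that uses only the standard library. It is not the uniq-ch implementation: the type ToySketch, the constant CAPACITY, the function example, and the use of DefaultHasher are all assumptions made for illustration. The idea is to keep only hashes whose low bits are zero, raise the number of required zero bits whenever the sample grows too large, and scale the sample size back up to get an estimate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical sample-size limit; not taken from uniq-ch.
const CAPACITY: usize = 4096;

/// Toy adaptive-sampling counter (illustrative only).
struct ToySketch {
    /// A hash is kept only if its lowest `level` bits are all zero.
    level: u32,
    /// Every hash seen so far that passes the current level.
    sample: HashSet<u64>,
}

/// Hash with fixed keys so repeated runs give the same result.
/// (The algorithm behind DefaultHasher may change between Rust releases.)
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

impl ToySketch {
    fn new() -> Self {
        ToySketch { level: 0, sample: HashSet::new() }
    }

    /// Raise the level until the sample fits within CAPACITY.
    /// Each step discards roughly half of the sampled hashes.
    fn shrink(&mut self) {
        while self.sample.len() > CAPACITY {
            self.level += 1;
            let level = self.level;
            self.sample.retain(|h| h.trailing_zeros() >= level);
        }
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        let h = hash_of(value);
        if h.trailing_zeros() >= self.level {
            self.sample.insert(h);
            self.shrink();
        }
    }

    /// Merge another sketch: align both to the higher level, then union.
    fn merge(&mut self, other: &ToySketch) {
        self.level = self.level.max(other.level);
        let level = self.level;
        self.sample.retain(|h| h.trailing_zeros() >= level);
        self.sample
            .extend(other.sample.iter().copied().filter(|h| h.trailing_zeros() >= level));
        self.shrink();
    }

    /// Each sampled hash stands for about 2^level distinct elements.
    fn estimate(&self) -> u64 {
        (self.sample.len() as u64) << self.level
    }
}

/// Build two sketches over overlapping ranges, as separate workers might,
/// then merge them and return the combined estimate.
fn example() -> u64 {
    let mut a = ToySketch::new();
    let mut b = ToySketch::new();
    for x in 0..75_000u32 {
        a.insert(&x);
    }
    for x in 25_000..100_000u32 {
        b.insert(&x);
    }
    a.merge(&b);
    a.estimate()
}
```

Under these assumptions, example() should return a value within a few percent of 100,000, the true number of distinct elements. Because membership in the sample depends only on an element's hash, duplicates never change the result, and merging two sketches gives a sample consistent with feeding both streams into one.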

Documentation

Examples

use uniq_ch::Bjkst;

let mut bjkst = Bjkst::new();

// Add some elements, with duplicates.
bjkst.extend(0..75_000);
bjkst.extend(25_000..100_000);

// Count the distinct elements.
assert!((99_000..101_000).contains(&bjkst.len()));