User-defined aggregate functions (UDAFs) and non-aggregate (scalar) user-defined functions (UDFs) for the BigQuery SQL engine.
DataSketches are probabilistic data structures that can process massive amounts of data and return approximate results with known error bounds while using a small memory footprint. Because of this, DataSketches are particularly useful for "big data" use cases such as streaming analytics and data warehousing.
Please visit the main Apache DataSketches website for more information about the DataSketches library.
If you are interested in contributing to this project, please see our Community page for how to contact us.
- Requires Emscripten (emcc compiler)
git clone https://github.com/emscripten-core/emsdk.git \
  && cd emsdk \
  && ./emsdk install latest \
  && ./emsdk activate latest \
  && source ./emsdk_env.sh \
  && cd ..
- Requires a link to datasketches-cpp in this repository
# Run the following if you've already cloned this repo
git submodule update --init --recursive
# Otherwise clone this repo with the --recursive flag
git clone --recursive https://github.com/apache/datasketches-bigquery.git
- Requires make utility
- Requires Google Cloud CLI
curl https://sdk.cloud.google.com | bash
- Requires npm and @dataform/cli package
npm install -g @dataform/cli
- Requires setting the following environment variables to your own values:
export JS_BUCKET=     # GCS bucket to hold compiled artifacts (must include gs://)
export BQ_PROJECT=    # location of stored SQL functions (routines)
export BQ_DATASET=    # location of stored SQL functions (routines)
export BQ_LOCATION=   # location of BQ_DATASET
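Before building, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of this repository: `require_tools` and `check_env` are hypothetical helper names, and the checks only cover what the requirements above state (the tools must be on the PATH, the four variables must be set, and JS_BUCKET must include the gs:// prefix).

```shell
# Hypothetical helpers for illustration; they are not shipped with this repo.

# require_tools TOOL...: report any listed command that is not on the PATH.
require_tools() {
  tools_status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      tools_status=1
    fi
  done
  return "$tools_status"
}

# check_env: verify the four variables are non-empty and JS_BUCKET has gs://.
check_env() {
  env_status=0
  for name in JS_BUCKET BQ_PROJECT BQ_DATASET BQ_LOCATION; do
    # Indirect lookup of the variable named by $name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      env_status=1
    fi
  done
  case "${JS_BUCKET:-}" in
    "") ;;        # already reported as missing above
    gs://?*) ;;   # has the required prefix followed by a bucket name
    *)
      echo "JS_BUCKET must start with gs://" >&2
      env_status=1
      ;;
  esac
  return "$env_status"
}
```

One possible use is `require_tools git emcc make gcloud npm dataform && check_env && make`, so that the build stops early with a readable message when something is missing. The tool list here is an assumption drawn from the requirements above.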
On Google Cloud Build
Run the following steps in this repo's root directory to install everything via Cloud Build:
gcloud builds submit \
--project=$BQ_PROJECT \
--substitutions=_BQ_LOCATION=$BQ_LOCATION,_BQ_DATASET=$BQ_DATASET,_JS_BUCKET=$JS_BUCKET \
.
On your local machine
Run the following steps in this repo's root directory to install everything:
gcloud auth application-default login # for authentication
make # performs compilation
make install # upload to $JS_BUCKET & create functions in $BQ_PROJECT.$BQ_DATASET
make test # runs predefined tests in BQ
To install a specific sketch, change into an individual sketch directory and run the following:
gcloud auth application-default login # for authentication
make # performs compilation
make install # upload to $JS_BUCKET
make create # create functions in $BQ_PROJECT.$BQ_DATASET