Crunchy Bridge for Analytics allows users to query Parquet files on S3 directly.
- Set up the cluster:
- Go to https://crunchybridge.com/
- Click "Create Cluster" -> "Create Analytics Cluster"
- Choose the region (eu-central-1) where s3://clickhouse-public-datasets/hits_compatible/hits.parquet is located
- Click "Analytics-256"
- Click "Create Cluster"
- Step Two: Set Up Analytics Credentials: Click "Skip for now"
- Wait until the state of the machine becomes "Running"
- Set up a VM on AWS in the same region as the cluster (eu-central-1). This keeps the latency between the client and the server low. The VM needs psql, so install the PostgreSQL client, e.g. sudo yum install -y postgresql16, or the equivalent for your Linux distribution.
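Before moving on, it can help to confirm that the client is actually on the PATH. The following is a minimal sketch; require_cmd is a hypothetical helper name introduced here for illustration and is not part of the benchmark scripts.

```shell
# Hypothetical helper (not part of run.sh): succeed only if the given
# command can be found on PATH, and print where it was found.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found; install it with your distro's package manager" >&2
        return 1
    fi
}
```

On the VM, running require_cmd psql should report a path once the PostgreSQL client package is installed.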
- Get the connection strings:
3.1) Get the application connection string:
- Click the "Connection" tab from the left menu
- Pick role: application, Format: psql
- Click "Copy"
Set APPCONNCMD, which is used in the steps below, to the command you copied:
export APPCONNCMD='psql postgres://application:XXXX@XXXXX.postgresbridge.com:5432/postgres'
3.2) Get the postgres connection string:
- Click the "Connection" tab from the left menu
- Pick role: postgres, Format: psql
- Click "Copy"
Set SUPERUSERCONNCMD, which is used in the steps below, to the command you copied:
export SUPERUSERCONNCMD='psql postgres://postgres:XXXX@XXXX.postgresbridge.com:5432/postgres'
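A quick sanity check on the two variables can catch a missed export or a bad paste before the benchmark starts. This is only a sketch: check_conn_cmd is a hypothetical helper, and the check assumes the copied command has the "psql postgres://..." shape shown above.

```shell
# Hypothetical helper (not part of run.sh): check that a connection command
# is non-empty and starts with "psql postgres://", as in the examples above.
check_conn_cmd() {
    name="$1"    # variable name, used only in messages
    value="$2"   # the command string to inspect
    case "$value" in
        "psql postgres://"*)
            echo "$name looks OK" ;;
        "")
            echo "$name is not set" >&2
            return 1 ;;
        *)
            echo "$name does not look like a psql command" >&2
            return 1 ;;
    esac
}
```

After exporting both variables, call it as check_conn_cmd APPCONNCMD "$APPCONNCMD" and check_conn_cmd SUPERUSERCONNCMD "$SUPERUSERCONNCMD". To test the connection itself, piping a trivial statement into the command, e.g. echo 'SELECT 1;' | $APPCONNCMD, should return a single row.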
- Run the script:
./run.sh
For the cold run, the queries access S3 directly. For the warm runs, we first download the file from S3 to a local cache drive and then run the queries. This logic is implemented in the run.sh script.
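The ordering of cold and warm runs described above can be sketched roughly as follows. This is not the content of run.sh, which may differ in every detail: QUERY_CMD, CACHE_CMD, and the number of warm runs are hypothetical placeholders introduced here, not Crunchy Bridge commands or settings.

```shell
# Sketch only: run.sh holds the real logic and may differ from this.
# QUERY_CMD and CACHE_CMD are hypothetical placeholders:
#   QUERY_CMD  runs the benchmark queries
#   CACHE_CMD  downloads the Parquet file from S3 to the local cache drive
run_benchmark() {
    warm_runs="$1"          # assumed parameter: how many warm runs to do

    echo "cold run"
    $QUERY_CMD              # data is read directly from S3

    echo "caching file"
    $CACHE_CMD              # one-time download before the warm runs

    i=1
    while [ "$i" -le "$warm_runs" ]; do
        echo "warm run $i"
        $QUERY_CMD          # data is now read from the local cache
        i=$((i + 1))
    done
}
```

With QUERY_CMD="echo query" and CACHE_CMD="echo cache" as stand-ins, run_benchmark 2 prints the cold run first, then the caching step, then two warm runs, which makes the sequence easy to follow without touching the cluster.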