hivtools/blob-test

blob-test

This repo contains a simple example of querying a DuckDB file from an Azure file share versus Azure blob storage.

I'm interested in how performance differs between these storage options, so this repo includes a speed test.

You can run the blob storage test locally, but the file share test is harder to run that way. What I really want is to run this in a deployed container, so that we can see the performance difference in a deployed production app.
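The speed test itself amounts to running the same read several times and logging how long each run takes. The repo's actual harness is not reproduced here; the following is only a minimal sketch of that idea. The class name `SpeedTest`, the method `timeRuns`, and the stand-in workload are all hypothetical, and a real test would open the DuckDB file and run a query inside the task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: names are illustrative and not taken from the repo.
class SpeedTest {

    /** Runs the task `runs` times and returns each run's wall-clock time in milliseconds. */
    static List<Long> timeRuns(int runs, Runnable task) {
        List<Long> timings = new ArrayList<>();
        for (int i = 1; i <= runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            timings.add(elapsedMs);
            System.out.printf("Run #%d: %d ms%n", i, elapsedMs);
        }
        return timings;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): replace with the storage read being measured.
        timeRuns(10, () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        });
    }
}
```

Keeping the per-run timings, rather than only an average, makes it possible to see a slow first call separately from the later ones.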

Running

Run locally with Gradle

AZURE_STORAGE_CONNECTION_STRING="<connection string>" ./gradlew run --args="$path_file_store_duckdb $blob_container_name $blob_name"

Build Docker image

docker build -t ghcr.io/hivtools/blob-test:latest .

Run Docker image

docker run --rm -it \
  -e AZURE_STORAGE_CONNECTION_STRING="<connection string>" \
  ghcr.io/hivtools/blob-test:latest \
  java -jar run.jar $path_file_store_duckdb $blob_container_name $blob_name

Running on Azure

It is probably easiest to run this on Azure as a container instance, created via the CLI:

az container create \
    --resource-group "nmHint-RG" \
    --azure-file-volume-account-name "<storage-account-name>" \
    --azure-file-volume-account-key "<storage-account-key>" \
    --azure-file-volume-share-name "results-share" \
    --azure-file-volume-mount-path "/data" \
    --name "blob-speed-test" \
    --os-type "Linux" \
    --cpu 2 \
    --memory 4 \
    --image ghcr.io/hivtools/blob-test:latest \
    --environment-variables AZURE_STORAGE_CONNECTION_STRING="<connection string>" \
    --command-line "java -jar run.jar /data/plot_data217ca92ffe.duckdb duckdb plot_data217ca92ffe.duckdb"

Results

This output is from running on Azure as a container instance:

2025-01-02 17:27:38 [main] INFO  - File share vs blob store time comparison
2025-01-02 17:27:38 [main] INFO  - Timing file share
2025-01-02 17:27:39 [main] INFO  - Run #1: 592 ms
2025-01-02 17:27:39 [main] INFO  - Run #2: 16 ms
2025-01-02 17:27:39 [main] INFO  - Run #3: 18 ms
2025-01-02 17:27:39 [main] INFO  - Run #4: 23 ms
2025-01-02 17:27:39 [main] INFO  - Run #5: 17 ms
2025-01-02 17:27:39 [main] INFO  - Run #6: 22 ms
2025-01-02 17:27:39 [main] INFO  - Run #7: 18 ms
2025-01-02 17:27:39 [main] INFO  - Run #8: 21 ms
2025-01-02 17:27:39 [main] INFO  - Run #9: 12 ms
2025-01-02 17:27:39 [main] INFO  - Run #10: 13 ms
2025-01-02 17:27:39 [main] INFO  - Summary: mean=75.2ms, stddev=172.30020313394874ms, n=10 {meanTimeInMillis=75.2, stdDevInMillis=172.30020313394874}
2025-01-02 17:27:39 [main] INFO  - Timing blob store
2025-01-02 17:27:42 [main] INFO  - Run #1: 3153 ms
2025-01-02 17:27:43 [main] INFO  - Run #2: 436 ms
2025-01-02 17:27:43 [main] INFO  - Run #3: 481 ms
2025-01-02 17:27:43 [main] INFO  - Run #4: 376 ms
2025-01-02 17:27:44 [main] INFO  - Run #5: 414 ms
2025-01-02 17:27:44 [main] INFO  - Run #6: 378 ms
2025-01-02 17:27:45 [main] INFO  - Run #7: 381 ms
2025-01-02 17:27:45 [main] INFO  - Run #8: 245 ms
2025-01-02 17:27:45 [main] INFO  - Run #9: 244 ms
2025-01-02 17:27:45 [main] INFO  - Run #10: 259 ms
2025-01-02 17:27:45 [main] INFO  - Summary: mean=636.7ms, stddev=842.4046592938575ms, n=10 {meanTimeInMillis=636.7, stdDevInMillis=842.4046592938575}
2025-01-02 17:27:45 [main] INFO  - Timing local read
2025-01-02 17:27:46 [main] INFO  - Run #1: 22 ms
2025-01-02 17:27:46 [main] INFO  - Run #2: 22 ms
2025-01-02 17:27:46 [main] INFO  - Run #3: 21 ms
2025-01-02 17:27:46 [main] INFO  - Run #4: 20 ms
2025-01-02 17:27:46 [main] INFO  - Run #5: 20 ms
2025-01-02 17:27:46 [main] INFO  - Run #6: 19 ms
2025-01-02 17:27:46 [main] INFO  - Run #7: 21 ms
2025-01-02 17:27:46 [main] INFO  - Run #8: 16 ms
2025-01-02 17:27:46 [main] INFO  - Run #9: 21 ms
2025-01-02 17:27:46 [main] INFO  - Run #10: 21 ms
2025-01-02 17:27:46 [main] INFO  - Summary: mean=20.3ms, stddev=1.676305461424021ms, n=10 {meanTimeInMillis=20.3, stdDevInMillis=1.676305461424021}
2025-01-02 17:27:46 [main] INFO  - Timing tests complete

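So, apart from the first call, reading from the file share takes about as long as reading from a local file: 12 to 23 ms per call on the file share against 16 to 22 ms locally. The first file share call is the exception at 592 ms. This does rule out blob storage: at 244 to 481 ms per call after a 3153 ms first call, it looks fairly conclusive that it won't be faster.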
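Because the slow first call dominates the logged summaries, it can help to recompute the statistics with and without it. The sketch below does that for the file share timings copied from the log above. The class and method names are hypothetical; the only assumption about the original summary is that its standard deviation divides by n (the population formula), which is what reproduces the logged 172.30 ms.

```java
import java.util.Arrays;

// Hypothetical helper: recomputes the log's summary statistics from the run times.
class TimingSummary {

    static double mean(long[] ms) {
        return Arrays.stream(ms).average().orElse(Double.NaN);
    }

    /** Population standard deviation (divides by n, not n - 1). */
    static double populationStdDev(long[] ms) {
        double m = mean(ms);
        double sumSquares = 0;
        for (long v : ms) {
            sumSquares += (v - m) * (v - m);
        }
        return Math.sqrt(sumSquares / ms.length);
    }

    /** Drops the first run, which includes any one-off start-up cost. */
    static long[] withoutFirst(long[] ms) {
        return Arrays.copyOfRange(ms, 1, ms.length);
    }

    public static void main(String[] args) {
        // File share run times in ms, copied from the log above.
        long[] fileShare = {592, 16, 18, 23, 17, 22, 18, 21, 12, 13};
        long[] warm = withoutFirst(fileShare);
        System.out.printf("all runs:  mean=%.1f ms, stddev=%.2f ms%n",
                mean(fileShare), populationStdDev(fileShare));
        System.out.printf("runs 2-10: mean=%.1f ms, stddev=%.2f ms%n",
                mean(warm), populationStdDev(warm));
    }
}
```

With the first run excluded, the file share mean is 160 / 9, or about 17.8 ms, which is in line with the 20.3 ms local read mean.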

But that suggests that in a file store setting the file share is missing the cache, since there we don't get the performance that we see here on subsequent calls. That is a little disappointing.
