This repo contains a simple example of querying a DuckDB file from an Azure file share versus Azure blob storage.
I'm interested in how performance differs between these storage options, so the repo includes a speed test.
You can run the blob storage test locally, but the file share is harder to test that way. What I really want is to run this in a deployed container, so that we can see the performance difference in a deployed production app.
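As a rough illustration of what such a speed test involves, here is a minimal sketch of a timing harness. The class and method names (`TimingSketch`, `time`, `mean`, `stdDev`) are hypothetical and not taken from this repo, and the workload is a placeholder rather than a real DuckDB query. The standard deviation uses the population form (dividing by n), which is an assumption, although it does reproduce the summary figures in the logged output further down.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness: names are illustrative, not this repo's actual code.
class TimingSketch {

    /** Runs the task `runs` times and returns each wall-clock duration in milliseconds. */
    static List<Long> time(Runnable task, int runs) {
        List<Long> millis = new ArrayList<>();
        for (int i = 1; i <= runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            millis.add(elapsed);
            System.out.printf("Run #%d: %d ms%n", i, elapsed);
        }
        return millis;
    }

    /** Arithmetic mean of the recorded durations. */
    static double mean(List<Long> millis) {
        double sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return sum / millis.size();
    }

    /** Population standard deviation (divides by n, not n - 1). */
    static double stdDev(List<Long> millis) {
        double mean = mean(millis);
        double sumSquares = 0;
        for (long m : millis) {
            sumSquares += (m - mean) * (m - mean);
        }
        return Math.sqrt(sumSquares / millis.size());
    }

    public static void main(String[] args) {
        // Placeholder workload; a real test would open the DuckDB file and run a query here.
        Runnable task = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Long> millis = time(task, 10);
        System.out.printf("Summary: mean=%.1fms, stddev=%.1fms, n=%d%n",
                mean(millis), stdDev(millis), millis.size());
    }
}
```

Keeping every per-run duration, rather than only the aggregate, matters here because the first run against remote storage can be far slower than the rest and would otherwise be hidden inside the mean.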
Run locally with Gradle:
AZURE_STORAGE_CONNECTION_STRING="<connection string>" ./gradlew run --args="$path_file_store_duckdb $blob_container_name $blob_name"
Build the Docker image:
docker build -t ghcr.io/hivtools/blob-test:latest .
Run the Docker image:
docker run --rm -it \
-e AZURE_STORAGE_CONNECTION_STRING="<connection string>" \
ghcr.io/hivtools/blob-test:latest \
java -jar run.jar $path_file_store_duckdb $blob_container_name $blob_name
It is probably easiest to run this on Azure as a container instance, created via the CLI:
az container create \
--resource-group "nmHint-RG" \
--azure-file-volume-account-name "<storage-account-name>" \
--azure-file-volume-account-key "<storage-account-key>" \
--azure-file-volume-share-name "results-share" \
--azure-file-volume-mount-path "/data" \
--name "blob-speed-test" \
--os-type "Linux" \
--cpu 2 \
--memory 4 \
--image ghcr.io/hivtools/blob-test:latest \
--environment-variables AZURE_STORAGE_CONNECTION_STRING="<connection string>" \
--command-line "java -jar run.jar /data/plot_data217ca92ffe.duckdb duckdb plot_data217ca92ffe.duckdb"
This is the output from running on Azure as a container instance:
2025-01-02 17:27:38 [main] INFO - File share vs blob store time comparison
2025-01-02 17:27:38 [main] INFO - Timing file share
2025-01-02 17:27:39 [main] INFO - Run #1: 592 ms
2025-01-02 17:27:39 [main] INFO - Run #2: 16 ms
2025-01-02 17:27:39 [main] INFO - Run #3: 18 ms
2025-01-02 17:27:39 [main] INFO - Run #4: 23 ms
2025-01-02 17:27:39 [main] INFO - Run #5: 17 ms
2025-01-02 17:27:39 [main] INFO - Run #6: 22 ms
2025-01-02 17:27:39 [main] INFO - Run #7: 18 ms
2025-01-02 17:27:39 [main] INFO - Run #8: 21 ms
2025-01-02 17:27:39 [main] INFO - Run #9: 12 ms
2025-01-02 17:27:39 [main] INFO - Run #10: 13 ms
2025-01-02 17:27:39 [main] INFO - Summary: mean=75.2ms, stddev=172.30020313394874ms, n=10 {meanTimeInMillis=75.2, stdDevInMillis=172.30020313394874}
2025-01-02 17:27:39 [main] INFO - Timing blob store
2025-01-02 17:27:42 [main] INFO - Run #1: 3153 ms
2025-01-02 17:27:43 [main] INFO - Run #2: 436 ms
2025-01-02 17:27:43 [main] INFO - Run #3: 481 ms
2025-01-02 17:27:43 [main] INFO - Run #4: 376 ms
2025-01-02 17:27:44 [main] INFO - Run #5: 414 ms
2025-01-02 17:27:44 [main] INFO - Run #6: 378 ms
2025-01-02 17:27:45 [main] INFO - Run #7: 381 ms
2025-01-02 17:27:45 [main] INFO - Run #8: 245 ms
2025-01-02 17:27:45 [main] INFO - Run #9: 244 ms
2025-01-02 17:27:45 [main] INFO - Run #10: 259 ms
2025-01-02 17:27:45 [main] INFO - Summary: mean=636.7ms, stddev=842.4046592938575ms, n=10 {meanTimeInMillis=636.7, stdDevInMillis=842.4046592938575}
2025-01-02 17:27:45 [main] INFO - Timing local read
2025-01-02 17:27:46 [main] INFO - Run #1: 22 ms
2025-01-02 17:27:46 [main] INFO - Run #2: 22 ms
2025-01-02 17:27:46 [main] INFO - Run #3: 21 ms
2025-01-02 17:27:46 [main] INFO - Run #4: 20 ms
2025-01-02 17:27:46 [main] INFO - Run #5: 20 ms
2025-01-02 17:27:46 [main] INFO - Run #6: 19 ms
2025-01-02 17:27:46 [main] INFO - Run #7: 21 ms
2025-01-02 17:27:46 [main] INFO - Run #8: 16 ms
2025-01-02 17:27:46 [main] INFO - Run #9: 21 ms
2025-01-02 17:27:46 [main] INFO - Run #10: 21 ms
2025-01-02 17:27:46 [main] INFO - Summary: mean=20.3ms, stddev=1.676305461424021ms, n=10 {meanTimeInMillis=20.3, stdDevInMillis=1.676305461424021}
2025-01-02 17:27:46 [main] INFO - Timing tests complete
So, apart from the first call, reading from the file share takes about as long as reading from a local file. This does rule out the blob store: it looks fairly conclusive that it won't be faster.
But that suggests that in a file store setting the file share is missing the cache, since there we don't get the performance we are seeing here on subsequent calls. That is a little disappointing.
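To put rough numbers on the comparison, the sketch below drops the first (cold) run from each series in the log above and averages the remaining nine. The class and method names (`WarmRunSketch`, `warmMean`) are hypothetical, and treating only run #1 as the cold run is an assumption based on how the timings look.

```java
import java.util.Arrays;

// Hypothetical helper: compares warm-run means using the timings logged above.
class WarmRunSketch {

    /** Mean of all runs except the first one, which is assumed to be the cold run. */
    static double warmMean(long[] runs) {
        return Arrays.stream(runs).skip(1).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Per-run timings in milliseconds, copied from the log output.
        long[] fileShare = {592, 16, 18, 23, 17, 22, 18, 21, 12, 13};
        long[] blobStore = {3153, 436, 481, 376, 414, 378, 381, 245, 244, 259};
        long[] localRead = {22, 22, 21, 20, 20, 19, 21, 16, 21, 21};

        System.out.printf("File share warm mean: %.1f ms%n", warmMean(fileShare));
        System.out.printf("Blob store warm mean: %.1f ms%n", warmMean(blobStore));
        System.out.printf("Local read warm mean: %.1f ms%n", warmMean(localRead));
    }
}
```

With the first run excluded, the file share averages roughly 17.8 ms and the local read roughly 20.1 ms, while the blob store still averages roughly 357.1 ms, so the blob store stays well over ten times slower even once it is warm.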