tl;dr
I found that running the Search Test from the super command doc with CSUP in vector runtime consumed 75+ GB of RAM. That was enough to hang the EC2 instance with 32 GB of RAM that I'd used successfully in the past to run the equivalent query with other RDBMS, as well as with sequential super with BSUP input.
Details
Repro is with super commit 4084e01. The test data is available at s3://brim-sampledata/super-cmd-perf/gha.csup, which was generated similarly to the BSUP as shown here, i.e.,
$ super -f csup -o gha.csup gharchive_gz/*.json.gz
The query is run like this:
$ super -version
Version: v1.18.0-206-g4084e011
$ SUPER_VAM=1 super -c "SELECT count()
FROM 'gha.csup'
WHERE grep('in case you have any feedback 😊', payload.pull_request.body)"
In the past I've run these queries successfully on an AWS EC2 m6idn.2xlarge instance, which has 32 GB of RAM (but no swap). That has always been enough memory to run the equivalent query, as shown in the doc, on DuckDB, ClickHouse, DataFusion, and sequential super with BSUP input. Previously we'd not been able to run the query at all with super in vector runtime, but with the merge of #5523 it was time to give it a go. On the first try with the EC2 instance it consumed all the memory and hung the system.
To give it a closer look, I re-ran it on my MacBook, which only has 16 GB of RAM but does have swap. Keeping an eye on the process in Activity Monitor as it ran for a couple of hours, I watched it climb to 75+ GB of RAM consumed before it finally finished. Hoping to get a bit more detail for this issue, I attempted to get a memory profile on a re-run:
$ SUPER_VAM=1 super -memprofile=mem.pprof -c "SELECT count()
FROM 'gha.csup'
WHERE grep('in case you have any feedback 😊', payload.pull_request.body)"
However, at the very end, after it showed the correct result, a Killed: 9 appeared and my mem.pprof was zero length, so I'm not sure what to make of that.
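Since the memory profile came back empty, a cruder measurement may still help narrow things down. The following is a minimal sketch, not something used in the runs above: `peak_rss` is a hypothetical helper that runs a command in the background and polls `ps` once per second for its resident set size. RSS only counts pages resident in physical memory, so on a machine with swap (like the MacBook) it will understate the total footprint that Activity Monitor reports; it is more meaningful on a swapless host like the EC2 instance.

```shell
# Hypothetical helper: run a command and report its peak resident set size.
# It samples once per second, so short-lived spikes may be missed.
peak_rss() {
  "$@" &                      # start the command in the background
  pid=$!
  peak=0
  # Poll for as long as the process is still alive.
  while kill -0 "$pid" 2>/dev/null; do
    # `ps -o rss=` prints the resident set size in KiB with no header.
    rss=$(ps -o rss= -p "$pid" 2>/dev/null | tr -d ' ')
    if [ -n "$rss" ] && [ "$rss" -gt "$peak" ]; then
      peak=$rss
    fi
    sleep 1
  done
  wait "$pid"                 # collect the command's exit status
  status=$?
  echo "peak RSS: ${peak} KiB (exit status ${status})" >&2
  return "$status"
}
```

Assuming the same query as above, it could be invoked as `peak_rss env SUPER_VAM=1 super -c "..."`. Going through `env` keeps the variable assignment portable across shells, and because `env` replaces itself with the command, the polled PID is still that of the `super` process.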
The results in #5552 also show that the same test was able to run successfully with vector runtime within the 32 GB memory footprint if Parquet format was used instead of CSUP.