Currently there is a limit on how many partitions are supported during a multipartition scan. If we increase the limit, performance will degrade. Can we start thinking about how far we can raise it without degrading performance or causing other issues? Also, can we have a plan to add more tasks so we get more cores during a multipartition scan?
For example =>
0-200 (or the new limit) --> default plan
200 to new limit of 400 --> same plan, but somehow create more tasks (still far fewer than 5000)
> 400 --> fall back to the default full table scan behavior
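The tiers above amount to a small piece of planner logic: pick a scan strategy from the number of partitions the query touches. A minimal sketch, where the limit constants and plan labels are illustrative assumptions, not FiloDB's actual API:

```python
# Hypothetical tier selection for a multi-partition scan.
# DEFAULT_LIMIT, NEW_LIMIT, and the returned labels are illustrative only.
DEFAULT_LIMIT = 200
NEW_LIMIT = 400

def choose_scan_plan(num_partitions: int) -> str:
    if num_partitions <= DEFAULT_LIMIT:
        # Default plan: the whole query runs in one Spark partition.
        return "multi-partition-scan"
    elif num_partitions <= NEW_LIMIT:
        # Same plan, but fan out into more tasks (still far fewer than 5000).
        return "multi-partition-scan-parallel"
    else:
        # Beyond the limit, fall back to the filtered full table scan.
        return "filtered-full-table-scan"
```

The point of the middle tier is that the plan shape stays the same; only the task count changes.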
Basically, multi-partition queries always run on one Spark partition. We want to enable bigger multi-partition queries that can spread across multiple Spark partitions without invoking filtered full table scans. This will require some intelligent logic.
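One way to spread such a query across Spark partitions is to split the list of partition keys into fixed-size groups and hand each group to its own task. A sketch of that grouping step; the group size and function name are assumptions, not FiloDB's implementation:

```python
def group_partitions(partition_keys: list, max_per_task: int = 200) -> list:
    """Split partition keys into contiguous groups; each group would
    back one Spark task, so a 450-key query yields 3 tasks instead of 1."""
    return [partition_keys[i:i + max_per_task]
            for i in range(0, len(partition_keys), max_per_task)]
```

Keeping each group at or below the existing per-task limit means every task still runs the well-understood default plan.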
…queries using new shardKeyColumns DatasetOption (#139)
* Replace chunk_size DatasetOption with shardKeyColumns; new CLI option to set during dataset creation
* feat(coordinator): Compute shardKeyHash from query filters
* Fix a MatchError found during flushing/ingestion
* Add chunk-length histogram for ChunkSink writes
* feat(cli): Add --everyNSeconds option to repeatedly query for data
* Don't read or write options column for C* datasets table - not needed anymore
* Make sure no negative watermarks are written in all cases
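The shardKeyHash commit above computes a hash from the shard key column values found in a query's filters, so a query can be routed to the right shard instead of scanning everything. A rough illustration of that idea; the hash function and signatures are assumptions, not FiloDB's actual code:

```python
import zlib
from typing import Optional

def shard_key_hash(filters: dict, shard_key_columns: list) -> Optional[int]:
    """Combine the shard-key column values present in the query filters
    into a single hash. Returns None when any shard key column is missing,
    in which case the query cannot be routed to one shard."""
    try:
        values = [str(filters[col]) for col in shard_key_columns]
    except KeyError:
        return None
    return zlib.crc32(",".join(values).encode("utf-8"))
```

The important property is determinism: the same filter values always hash to the same shard, and a query missing any shard key column falls through to a broader scan.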