-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(v2): read path improvements #3675
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant work, Aleks! Thank you so much for solving this!
# See https://github.com/grpc/grpc-go/issues/7090 for more information | ||
- grpc.Dial(.*) is deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻 I'll check other places
api/query/v1/query.proto
Outdated
message QueryNode { | ||
QueryNodeType type = 1; | ||
repeated QueryNode children = 2; | ||
repeated metastore.v1.BlockMeta blocks = 3; | ||
} | ||
|
||
enum QueryNodeType { | ||
MERGE = 0; | ||
READ = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The official guide prescribes to name the options in a cumbersome way. I propose to name enums in the way we want (but keeping the UNSPECIFIED
as 0 value) – READ
and MERGE
work perfectly here, so we use them. I would also add UNSPECIFIED
.
Also, you likely know this, but just in case: there's a nice trick for "local" enums, that makes things a little less verbose. I'm planning to do that with ReportType
and QueryType
(or get rid of the type prefixes, at least).
message QueryNode {
Type type = 1;
repeated QueryNode children = 2;
repeated metastore.v1.BlockMeta blocks = 3;
enum Type {
UNKNOWN = 0;
MERGE = 1;
READ = 2;
}
}
// create leaf nodes and spread the blocks in a uniform way | ||
var leafNodes []*queryv1.QueryNode | ||
for i := 0; i < len(blocks); i += maxReads { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an optimisation we can make later:
The breakdown is not entirely uniform, as is also the case in the source version. If we have maxReads
= 20, and 101 blocks, the last node will have just 1 block.
I had this snippet (but I'm sure there's a better way):
func Split[S ~[]E, E any](slice S, batchSize int) []S {
size := len(slice)
if size == 0 || batchSize < 1 {
return nil
}
n := int(math.Ceil(float64(size) / float64(batchSize)))
base := size / n
remainder := size % n
batches := make([]S, 0, n)
var offset int
for i := 0; i < n; i++ {
end := offset + base
if i < remainder {
// Distribute remaining elements among the first few batches.
end++
}
batches = append(batches, slice[offset:end])
offset = end
}
return batches
}
data []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
batch 3
Batch 1: [1 2 3]
Batch 2: [4 5 6]
Batch 3: [7 8]
Batch 4: [9 10]
Probably the same for merge nodes, as far as I understand.
444ec8f
to
5e421f0
Compare
Addresses a few issues encountered in the read path for v2: