
feat(v2): read path improvements #3675

Merged
merged 13 commits into main on Nov 13, 2024
Conversation

aleks-p
Contributor

@aleks-p aleks-p commented Nov 12, 2024

Addresses a few issues encountered in the read path for v2:

  1. Simplifies the query plan representation. The current approach encodes the query plan graph as an int32 array, which makes it difficult to reason about and maintain. We now have a straightforward node graph in place.
  2. Replaces the static concurrency limit in the query backend with the gradient2 implementation from go-concurrency-limits. This adds some elasticity to the query backend under high load and should allow for fine tuning in the future. The limiter parameters are internal for now, but might be exposed in the future.
  3. Updates the query backend client config to reduce aggressive retries. This gives large queries a better chance of succeeding by slowing them down. Further tuning will likely be needed here in the future.
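To make item 1 concrete: the new representation is a plain tree of merge/read nodes rather than a flat int32 encoding. A minimal sketch in Go, where the type and field names are illustrative stand-ins (loosely mirroring the `QueryNode` proto message discussed below), not the actual Pyroscope types:

```go
package main

import "fmt"

// NodeType distinguishes the two kinds of plan nodes: MERGE nodes combine
// the results of their children, READ nodes hold the blocks to scan.
type NodeType int

const (
	Merge NodeType = iota
	Read
)

// QueryNode is an illustrative in-memory mirror of the proto message.
type QueryNode struct {
	Type     NodeType
	Children []*QueryNode
	Blocks   []string // stand-in for metastore.v1.BlockMeta entries
}

// Leaves collects the READ nodes of a plan in depth-first order.
func (n *QueryNode) Leaves() []*QueryNode {
	if n.Type == Read {
		return []*QueryNode{n}
	}
	var out []*QueryNode
	for _, c := range n.Children {
		out = append(out, c.Leaves()...)
	}
	return out
}

// buildExamplePlan assembles a two-reader plan under a single merge node.
func buildExamplePlan() *QueryNode {
	return &QueryNode{Type: Merge, Children: []*QueryNode{
		{Type: Read, Blocks: []string{"block-1", "block-2"}},
		{Type: Read, Blocks: []string{"block-3"}},
	}}
}

func main() {
	plan := buildExamplePlan()
	fmt.Println(len(plan.Leaves())) // prints 2
}
```

Because the graph is an explicit tree, operations like "find all read nodes" or "rewrite a subtree" become simple recursive traversals instead of index arithmetic over an array.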

@aleks-p aleks-p requested a review from a team as a code owner November 12, 2024 21:22
Collaborator

@kolesnikovae kolesnikovae left a comment


Brilliant work, Aleks! Thank you so much for solving this!

Comment on lines +75 to +76
# See https://github.com/grpc/grpc-go/issues/7090 for more information
- grpc.Dial(.*) is deprecated
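For context, these two lines read like an entry in a golangci-lint exclusion list (an assumption — the surrounding file isn't shown in this diff); roughly:

```yaml
issues:
  exclude:
    # grpc.Dial is deprecated upstream; suppress the warning until the
    # code base migrates to grpc.NewClient.
    # See https://github.com/grpc/grpc-go/issues/7090 for more information
    - "grpc.Dial(.*) is deprecated"
```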
Collaborator


👍🏻 I'll check other places

Comment on lines 64 to 72
message QueryNode {
QueryNodeType type = 1;
repeated QueryNode children = 2;
repeated metastore.v1.BlockMeta blocks = 3;
}

enum QueryNodeType {
MERGE = 0;
READ = 1;
Collaborator


Nit: The official style guide prescribes naming enum values in a cumbersome way. I propose naming enums the way we want (but keeping UNSPECIFIED as the 0 value) – READ and MERGE work perfectly here, so we use them. I would also add UNSPECIFIED.

Also, you likely know this, but just in case: there's a nice trick for "local" enums, that makes things a little less verbose. I'm planning to do that with ReportType and QueryType (or get rid of the type prefixes, at least).

message QueryNode {
  Type type = 1;
  repeated QueryNode children = 2;
  repeated metastore.v1.BlockMeta blocks = 3;
  enum Type {
    UNSPECIFIED = 0;
    MERGE = 1;
    READ = 2;
  }
}

Comment on lines +39 to +41
// create leaf nodes and spread the blocks in a uniform way
var leafNodes []*queryv1.QueryNode
for i := 0; i < len(blocks); i += maxReads {
Collaborator

@kolesnikovae kolesnikovae Nov 13, 2024


There's an optimisation we can make later:

The breakdown is not entirely uniform, as was also the case in the source version. With maxReads = 20 and 101 blocks, the last node ends up with just 1 block.

I had this snippet (but I'm sure there's a better way):

import "math"

// Split partitions slice into ceil(len/batchSize) batches whose sizes
// differ by at most one element.
func Split[S ~[]E, E any](slice S, batchSize int) []S {
	size := len(slice)
	if size == 0 || batchSize < 1 {
		return nil
	}
	n := int(math.Ceil(float64(size) / float64(batchSize)))
	base := size / n
	remainder := size % n
	batches := make([]S, 0, n)
	var offset int
	for i := 0; i < n; i++ {
		end := offset + base
		if i < remainder {
			// Distribute remaining elements among the first few batches.
			end++
		}
		batches = append(batches, slice[offset:end])
		offset = end
	}
	return batches
}

Example:

data   []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
batch  3

Batch 1: [1 2 3]
Batch 2: [4 5 6]
Batch 3: [7 8]
Batch 4: [9 10]

Probably the same for merge nodes, as far as I understand.

@aleks-p aleks-p force-pushed the v2/improve-query-load-balance branch from 444ec8f to 5e421f0 on November 13, 2024 13:45
@aleks-p aleks-p merged commit bb45e2e into main Nov 13, 2024
18 checks passed
@aleks-p aleks-p deleted the v2/improve-query-load-balance branch November 13, 2024 13:57