
Fix buffer_drain_concurrency not doing anything #15042

Merged
3 commits merged into vitessio:main on Jan 30, 2024

Conversation

@rbranson (Contributor) commented on Jan 26, 2024:

Description

This is based on #14545 from @arthurschreiber, with some additional changes based on the review feedback from @vmg.

From the previous PR:

As described in #11684, the --buffer_drain_concurrency CLI argument to vtgate does not actually do anything.

This pull request implements the logic required to make this flag actually do something. 😬 When the buffer is drained, we now spawn as many goroutines as specified by --buffer_drain_concurrency to drain the buffer in parallel.

I don't think introducing a new flag as described in #11684 actually makes sense. Instead, I propose we note in the v19 changelog that this flag now does something, and not backport the change to any earlier releases.
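To illustrate the fan-out the description refers to, here is a minimal, self-contained sketch; it is not the actual shard_buffer.go change, and drainAll, the entries slice, and the drain callback are all stand-in names:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// drainAll fans buffered entries out across `concurrency` workers. Each worker
// atomically claims the next index until the slice is exhausted, then exits.
func drainAll(entries []string, concurrency int, drain func(string)) {
	var wg sync.WaitGroup
	var cursor atomic.Int64 // shared cursor over the buffered entries

	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				idx := cursor.Add(1) - 1
				if idx >= int64(len(entries)) {
					return // nothing left to drain
				}
				drain(entries[idx])
			}
		}()
	}
	wg.Wait()
}

func main() {
	queued := []string{"q1", "q2", "q3", "q4", "q5"}
	drainAll(queued, 2, func(q string) { fmt.Println("retrying", q) })
}

With a concurrency of 1 this degenerates to the old sequential drain, which is why no special case is needed for the non-concurrent path.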

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

vitess-bot (Contributor) commented on Jan 26, 2024:

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator.
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels on Jan 26, 2024
@github-actions bot added this to the v19.0.0 milestone on Jan 26, 2024
codecov bot commented on Jan 26, 2024:

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (main@81777e5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15042   +/-   ##
=======================================
  Coverage        ?   47.69%           
=======================================
  Files           ?     1155           
  Lines           ?   240139           
  Branches        ?        0           
=======================================
  Hits            ?   114526           
  Misses          ?   117012           
  Partials        ?     8601           


@rbranson marked this pull request as ready for review on January 26, 2024
@deepthi added the Type: Bug and Component: Query Serving labels and removed the NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels on Jan 26, 2024
go/vt/vtgate/buffer/buffer_test.go (outdated review thread, resolved)
go/vt/vtgate/buffer/shard_buffer.go (outdated review thread, resolved)
@deepthi requested review from ajm188 and removed the review request for systay and frouioui on January 26, 2024
@rbranson (Contributor, Author) commented:

The previous implementation, while clever, had a head-of-line blocking (HoLB) issue: because the work was divided among the workers in equal static chunks, slow requests could result in a long tail wait at the end of the drain process.

This new implementation is much smaller and still avoids the overhead and extra logic of pumping a channel to distribute work. In the non-concurrent case, the overhead of the atomic increment (versus a plain loop increment) is negligible, on the order of 10-20 cycles, so IMO it's not worth special-casing the non-concurrent path.

Additionally, the abstraction and unit test coverage on parallelRangeIndex gives me enough confidence in the way the actual work distribution is handled that I don't think we need another set of tests for various buffer sizes.
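
For readers following the thread, this is roughly what an atomic-cursor index distributor like parallelRangeIndex looks like; it is a reconstruction based on the snippets quoted below, not a copy of the merged code, and details may differ:

import "sync/atomic"

// parallelRangeIndex hands out indexes in [0, max] to competing goroutines.
// Each call atomically claims the next index; ok is false once the range is
// exhausted. Reconstructed here for illustration only.
func parallelRangeIndex(counter *atomic.Int64, max int) (int, bool) {
	next := counter.Add(1) // the first caller sees 1, hence the -1 below
	if next-1 > int64(max) {
		return -1, false
	}
	return int(next) - 1, true
}

Because a worker only claims a new index after finishing its current entry, a slow request delays at most one worker rather than an entire pre-assigned chunk, which is what avoids the head-of-line blocking described above.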

@mattlord (Contributor) left a comment:

LGTM! Nice work on this @rbranson !

So far I only had some minor questions/nits/suggestions that are largely a matter of personal preference — so you can address those as you feel best. 🙂

The only requested change that I feel we really should make is this... We have a number of endtoend tests for the buffering here: https://github.com/vitessio/vitess/tree/main/go/test/endtoend/tabletgateway/buffer

We should set the buffer_drain_concurrency to 4 or 8 in at least some of those tests to add additional test coverage and verify the behavior. Please let me know if you see any issues with that or could use any help.
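
As a rough sketch of what that could look like when bringing up vtgate in those tests (clusterInstance and VtGateExtraArgs are assumptions about the endtoend cluster helpers; the buffer tests may plumb flags differently):

// Hypothetical wiring: start vtgate with a drain concurrency above 1 so the
// parallel drain path is actually exercised in the e2e buffering tests.
clusterInstance.VtGateExtraArgs = append(clusterInstance.VtGateExtraArgs,
	"--buffer_drain_concurrency=4",
)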


func TestBufferingConcurrent(t *testing.T) {
testAllImplementations(t, func(t *testing.T, fail failover) {
testBuffering1WithOptions(t, fail, 2)
Contributor:

IMO it would be better to use something like 8 here.

Contributor Author:

I disagree, because in many cases the buffers in the unit tests aren't large enough to create an actual concurrent drain at a concurrency of 8.

@mattlord (Contributor), Jan 29, 2024:

I'm not sure why that matters, but 2 is also fine (I was errantly thinking the default for the flag was 8 when I started the review). If buffering fewer connections than the configured concurrency value really is a problem, then that's a problem we should address.

Comment on lines +835 to +837
var mu sync.Mutex
var wg sync.WaitGroup
var counter atomic.Int64
Contributor:

Extremely nitty, but we could use a single var block here.

Comment on lines +850 to +852
mu.Lock()
got = append(got, idx)
mu.Unlock()
@mattlord (Contributor), Jan 27, 2024:

Also annoyingly nitty, but IMO it's nicer to use a closure here:

			appendVal := func(val int) {
				mu.Lock()
				defer mu.Unlock()
				got = append(got, val)
			}
			for i := 0; i < tc.concurrency; i++ {
				go func() {
					defer wg.Done()
					for {
						idx, ok := parallelRangeIndex(&counter, tc.max)
						if !ok {
							return
						}
						appendVal(idx)
					}
				}()
			}

Contributor:

+1

var counter atomic.Int64

wg.Add(tc.concurrency)
got := []int{}
Contributor:

Annoyingly nitty, but we can pre-allocate the space one time:

			got := make([]int, 0, len(tc.calls))

Collaborator:

Yeah our style guide recommends either an empty variable (var got []int) or explicitly allocating the slice. Either way is fine.

for {
idx, ok := parallelRangeIndex(&counter, tc.max)
if !ok {
break
Contributor:

Any reason not to return here?

Comment on lines +572 to +574
// if this is a 32-bit platform, max won't exceed the 32-bit integer limit
// so a cast from a too-large 64-bit int to a 32-bit int will never happen
return int(next) - 1, true
Contributor:

Any reason not to use int64 so that the behavior is deterministic / platform independent and we don't need to cast?

Contributor Author:

Slice indexes are already int (so an int64 return value would have to be cast later anyway), and the overflow concern is mitigated internally in the function, which I think is better than forcing the caller to deal with it.

for {
idx, ok := parallelRangeIndex(&rangeCounter, entryCount-1)
if !ok {
break
Contributor:

Any reason not to return here?

Contributor Author:

break is exactly what I want here: I want the loop to end. In this instance they are functionally equivalent, but using return would have to be undone if any code were ever added after the loop.

@mattlord (Contributor), Jan 29, 2024:

Another way to put it: if we want to exit the goroutine and return when parallelRangeIndex returns false, then IMO we should do so explicitly. That can always be changed if it turns out not to be what we want.

This is also an example of why comments are nice. Why would or wouldn't we want to return there? What might we want to do outside of the for loop? My reading of it was that we want to loop until parallelRangeIndex tells us to stop, and that's the only purpose of the goroutine, so it caught my eye that we weren't explicitly ending the goroutine when it did.

@mattlord removed the NeedsDescriptionUpdate label on Jan 27, 2024
@mattlord (Contributor) commented:

Please note the current DCO failure as well: https://github.com/vitessio/vitess/pull/15042/checks?check_run_id=20915653435

@vmg (Collaborator) left a comment:

Looking good @rbranson, thanks for contributing to Vitess ;)



for idx, tc := range suite {
name := fmt.Sprintf("%d_max%d_concurrency%d", idx, tc.max, tc.concurrency)
t.Run(name, func(t *testing.T) {
Contributor:

small issue around variable shadowing here (for tc): https://stackoverflow.com/a/33459174

Contributor Author:

@ajm188 I'm not sure I follow. The tests are working properly; the t.Run call and the goroutines below terminate before the next loop iteration.
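
For reference, the usual pre-Go 1.22 guard against closures capturing range variables is a per-iteration copy; this is a generic sketch rather than the PR's code, and it would be harmless here even though each subtest finishes before the next iteration:

for idx, tc := range suite {
	idx, tc := idx, tc // give each closure its own copy of the loop variables
	name := fmt.Sprintf("%d_max%d_concurrency%d", idx, tc.max, tc.concurrency)
	t.Run(name, func(t *testing.T) {
		// ... test body unchanged ...
	})
}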


@rbranson force-pushed the arthur/fix-buffer-drain-concurrency branch from bdc573c to a7bb30f on January 29, 2024
@rbranson (Contributor, Author) commented:

Had to squash to fix the commit signing issue

@rbranson requested a review from mattlord on January 29, 2024
@mattlord (Contributor) left a comment:

@rbranson Thanks for adding the e2e test change! That was the only blocker for me.

Thanks again for the contribution! 😃

@mattlord merged commit c81a791 into vitessio:main on Jan 30, 2024
102 checks passed
@rbranson deleted the arthur/fix-buffer-drain-concurrency branch on January 30, 2024

Successfully merging this pull request may close these issues.

Bug Report: buffer_drain_concurrency flag does nothing
7 participants