mysql: do not allocate in parseOKPacket #15067
Conversation
Signed-off-by: Vicent Marti <[email protected]>
Review Checklist
Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.
General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
Can you or @maxenglander create an issue? And it would be nice to have a before/after view of allocations from any of the benchmarks added to the PR description.
Hello! 👋 This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked. You can find the performance comparison on the arewefastyet website.
Oops, forgot to tag with the benchmark label. Will post results once they're available.
Seems like the benchmarks on the base of this Pull Request are failing. I will check what's going on in a moment.
Update: I've managed to remove another allocation besides the …
Nice improvement.
Description
While working with some enterprise customers, @maxenglander noticed a very significant number of allocations coming from `readComQueryResponse`. I don't think this is the cause of the increased p99 latencies that the customer was seeing, but it is certainly wasteful and should be fixed.

The allocations happen in `mysql.(*Conn).parseOKPacket`, and there are two sources:

- `&PacketOK{}` instances are created every time the function is called, because the packet is returned as a pointer. This is the 64-byte allocation group seen in the profile, 2.7GB in total.
- The `coder` struct that handles parsing of the packet is allocated on the heap because, when parsing fails (i.e. very rarely), the struct is passed as a `%v` argument to `vterrors.Errorf`. Since `Errorf` is a variadic API, its arguments are passed as interfaces (`any`), which forces the object to be moved to the heap at the point where it is created. This is the 32-byte allocation group seen in the profile, 1.3GB in total (see the sketch below).
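To make the second point concrete, here is a minimal, self-contained sketch of the pattern, not the actual Vitess code: `coderLike` and the error message are hypothetical stand-ins, and `fmt.Errorf` plays the role of `vterrors.Errorf`, which has the same variadic `...any` shape.

```go
package sketch

import "fmt"

// coderLike is a hypothetical stand-in for the packet-parsing struct
// described above; the real type lives in the mysql package.
type coderLike struct {
	data []byte
	pos  int
}

// parseEscapes shows the problematic pattern: the whole struct is passed to a
// variadic ...any formatter, so escape analysis moves it to the heap on every
// call, even though the error branch is almost never taken at run time.
func parseEscapes(in []byte) error {
	c := &coderLike{data: in}
	if len(c.data) == 0 {
		return fmt.Errorf("invalid packet: %v", c) // c is boxed into an interface
	}
	return nil
}

// parseStaysOnStack shows the fix: only the field we actually want to report
// reaches the variadic call, so the struct itself no longer escapes.
func parseStaysOnStack(in []byte) error {
	c := &coderLike{data: in}
	if len(c.data) == 0 {
		return fmt.Errorf("invalid packet: %v", c.data)
	}
	return nil
}
```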
The fixes are as follows:

- Since the `PacketOK` returned from the function is only used as a temporary, we can change the function's signature to take the packet as an argument, as sketched below. Because this is a direct function call without interfaces in the way, that is enough to keep the packet allocated on the stack at all of the function's call sites.
- For the `coder` struct, simply change the function's error returns to receive `data.data` instead of the whole struct. This lets the `&coder{}` initialization at the start of the function stay on the stack. Remember that the Go compiler doesn't necessarily place `&var` constructions on the heap: they can live on the stack if they're found not to escape.

As a result, this PR removes the packet allocation for all queries (not only for queries without results), which should translate into a measurable reduction in small allocations across all the benchmarks we're tracking. Furthermore, the change applies anywhere we use the MySQL clients, not only in the tablets.
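A minimal before/after sketch of that signature change; the `PacketOK` fields and both signatures here are simplified stand-ins, not the exact Vitess code:

```go
package sketch

// PacketOK is a simplified stand-in for the real mysql.PacketOK.
type PacketOK struct {
	affectedRows uint64
	lastInsertID uint64
	statusFlags  uint16
	warnings     uint16
}

// Before: returning a pointer means the value outlives the function's frame,
// so every call heap-allocates a fresh PacketOK.
func parseOKPacketOld(in []byte) (*PacketOK, error) {
	packet := &PacketOK{}
	// ... decode in into packet ...
	_ = in
	return packet, nil
}

// After: the caller supplies the destination. Since this is a direct function
// call with no interfaces in between, escape analysis can keep the PacketOK
// on the caller's stack.
func parseOKPacketNew(packet *PacketOK, in []byte) error {
	*packet = PacketOK{}
	// ... decode in into packet ...
	_ = in
	return nil
}

func callSite(in []byte) (uint64, error) {
	var packet PacketOK // stack-allocated at the call site
	if err := parseOKPacketNew(&packet, in); err != nil {
		return 0, err
	}
	return packet.affectedRows, nil
}
```

Escape decisions like these can be verified with `go build -gcflags='-m'`, which reports which composite literals escape to the heap.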
After the changes, the `parseOKPacket` API is gone from heap profiles, as it is now zero-allocation. `readComQueryResponse` is also gone from the profile, since it was calling `parseOKPacket` indirectly, so that API is now zero-allocation as well. That's another 2.6GB of memory gone.

On the arewefastyet benchmarks, we're once again seeing the long-standing issue (which I'll eventually fix, I swear) where the changes are not directly comparable between PRs: when you reduce memory allocations in Vitess, the resulting GC changes affect throughput. Here you can see very significant memory savings everywhere a MySQL connection is used, which also reduces the CPU usage of Vitess as a whole (total CPU time spent is down), but that results in less QPS, as we saw when introducing the new connection pool in #14034.
Related Issue(s)
Checklist
Deployment Notes