-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug Report: Collation buffer memory leak in unicode_loose_*
vindexes in v18 and below
#14587
Comments
The fix in Shopify#131 looks correct. Sorry you had to spend time debugging an issue we already fixed! Hopefully you can upgrade soon to a version that contains #14395, because besides not leaking memory, it is much much faster! |
For sure. We'll be pushing on upgrades in 2024. Lots of versions to catch up on. Vitess moves fast! |
PR against |
Closing this as the fix has been merged. We've also deployed the fix on our internal patched v15 release and memory use is now stable at a few hundred MB, with no outliers. |
Overview of the Issue
We saw vtgates in one of our vitess clusters using over 4 GB of memory and being OOM killed, while vtgates in other clusters were generally using less than 1 GB of memory for a similar workload. A pprof heap profile (attached below) showed that nearly all of the memory was allocated from golang.org/x/text/collate.appendPrimary and a read through the call graph and code found that it was being retained by pooledCollator.buf. The use of
sync.Pool
made the issue challenging to debug, as the pool allows objects to be garbage collected under some (apparently non-deterministic) circumstances, so we had some outliers using tons of memory then suddenly dropping back to the norm.I was able to reproduce the issue locally on v18.0.0 following the steps outlined below, and I was able to fix the issue by calling
collator.buf.Reset()
before returning the collator to the pool. e.g. Shopify#131This issue does not exist in
main
as the vindex collation logic was replaced in #14395 and the new shared buffer is reset here.I think this is worth fixing and backporting, but I'm not sure which branch to start with a PR against.
release-18.0
?pprof heap profile call graph:
Reproduction Steps
Using local examples from v18
go/vt/vtgate/vindexes/unicode.go
to not use a pool (just a single sharedpooledCollator
protected by a mutex)This ensures that reproduction isn't impacted by the nondeterministic nature of sync.Pool, i.e. https://pkg.go.dev/sync#Pool
SKIP_VTADMIN=true ./101_initial_cluster.sh
run_queries.go
Binary Version
vtgate version Version: 18.0.0 (Git revision 9a6f5262f7707ff80ce85c111d2ff686d85d29cc branch 'HEAD') built on Wed Nov 22 19:45:23 UTC 2023 by spin@localhost using go1.21.4 linux/amd64
Operating System and Environment details
Log Fragments
No response
The text was updated successfully, but these errors were encountered: