sql: role membership cache refresh logic can cause a thundering herd of queries to system.role_members
#135931
Labels
branch-release-24.1
Used to mark GA and release blockers, technical advisories, and bugs for 24.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
P-1
Issues/test failures with a fix SLA of 1 month
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Describe the problem
When the role membership cache is invalidated, each node will need to rebuild the cache. Currently, each user whose membership is being looked up will cause a query on the
system.role_members
table. This can be seen by looking at the requestKey of thesingleflight
operation here:cockroach/pkg/sql/authorization.go
Line 641 in 89dae00
This is wasteful. Each of those queries is already scanning all the data from
system.role_members
, so we should change this so that the cache only gets populated once per node. This will change the number of internal queries needed to repopulate the cache to beO(node count)
rather thanO((node count) * (active user count))
.To Reproduce
The cache is invalidated by
CREATE ROLE
,DROP ROLE
, andGRANT role
statements, so this problem is easy to reproduce.See the escalation https://github.com/cockroachlabs/support/issues/3128 for an example.
Expected behavior
Don't have such a big latency hit.
Additional data / screenshots
See the linked ticket.
Environment:
This was reported on CockroachDB version v24.1.6, but all supported versions are affected.
Jira issue: CRDB-44792
Epic CRDB-43310
The text was updated successfully, but these errors were encountered: