Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hash object names used as cache keys for CachingBucket #9968

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

56quarters
Copy link
Contributor

@56quarters 56quarters commented Nov 20, 2024

What this PR does

When caching rule groups, the base64 encoded name of the rule group is used as part of the key for caching its contents. For long rule group names, this can exceed the max key length of Memcached. To solve this we use the sha256 of the object name in the cache key instead of the object name itself.

Which issue(s) this PR fixes or relates to

Related #9386

Notes for reviewers

We're using a SHA3 hash for this since we don't want collisions and I didn't want to use the approach we take in the query-frontend: storing a "results" object with a key to disambiguate collisions. I haven't run benchmarks but I did take care to make sure that we're only doing a single hash per object storage operation.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@56quarters 56quarters changed the base branch from main to 56quarters/ha-tracker November 21, 2024 20:21
@56quarters 56quarters marked this pull request as ready for review November 21, 2024 21:08
@56quarters 56quarters requested a review from a team as a code owner November 21, 2024 21:08
@56quarters
Copy link
Contributor Author

56quarters commented Nov 21, 2024

Benchmarks! Hashing when building keys is obviously much slower than not doing that but in absolute numbers, still pretty fast compared to the network operations we'll be doing:

$ go test -bench=BenchmarkCachingKey ./pkg/storage/tsdb/bucketcache/...
goos: linux
goarch: amd64
pkg: github.com/grafana/mimir/pkg/storage/tsdb/bucketcache
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
BenchmarkCachingKey/cachingKeyObjectSubrange()_with_bucket_ID:_false-16                 17202048                68.37 ns/op
BenchmarkCachingKey/cachingKeyObjectSubrange()_with_bucket_ID:_true-16                  14326315                76.61 ns/op
BenchmarkCachingKey/cachingKeyIter()_with_bucket_ID:_false-16                           17119052                63.46 ns/op
BenchmarkCachingKey/cachingKeyIter()_with_bucket_ID:_true-16                            17986634                66.55 ns/op
BenchmarkCachingKey/cachingKeyIter()_recursive_with_bucket_ID:_false-16                 14231662                82.10 ns/op
BenchmarkCachingKey/cachingKeyIter()_recursive_with_bucket_ID:_true-16                  12631911                89.82 ns/op
BenchmarkCachingKey/cachingKeyExists()_with_bucket_ID:_false-16                         23539920                50.68 ns/op
BenchmarkCachingKey/cachingKeyExists()_with_bucket_ID:_true-16                          20476664                61.26 ns/op
BenchmarkCachingKey/cachingKeyExists()_hashed_with_bucket_ID:_false-16                   1000000              1185 ns/op
BenchmarkCachingKey/cachingKeyExists()_hashed_with_bucket_ID:_true-16                     975402              1208 ns/op
BenchmarkCachingKey/cachingKeyContent()_hashed_with_bucket_ID:_false-16                   974630              1193 ns/op
BenchmarkCachingKey/cachingKeyContent()_hashed_with_bucket_ID:_true-16                    945249              1211 ns/op
BenchmarkCachingKey/cachingKeyAttributes()_with_bucket_ID:_false-16                     23124244                55.85 ns/op
BenchmarkCachingKey/cachingKeyAttributes()_with_bucket_ID:_true-16                      16003294                67.57 ns/op
BenchmarkCachingKey/cachingKeyObjectSubrange()_hashed_with_bucket_ID:_false-16            803034              1425 ns/op
BenchmarkCachingKey/cachingKeyObjectSubrange()_hashed_with_bucket_ID:_true-16             844786              1300 ns/op
BenchmarkCachingKey/cachingKeyContent()_with_bucket_ID:_false-16                        23122924                51.71 ns/op
BenchmarkCachingKey/cachingKeyContent()_with_bucket_ID:_true-16                         19091487                58.74 ns/op
BenchmarkCachingKey/cachingKeyAttributes()_hashed_with_bucket_ID:_false-16                940980              1258 ns/op
BenchmarkCachingKey/cachingKeyAttributes()_hashed_with_bucket_ID:_true-16                 934902              1219 ns/op
PASS
ok      github.com/grafana/mimir/pkg/storage/tsdb/bucketcache   25.124s

Base automatically changed from 56quarters/ha-tracker to main November 22, 2024 17:32
When caching rule groups, the base64 encoded name of the rule group is
used as part of the key for caching its contents. For long rule group
names, this can exceed the max key length of Memcached. To solve this
we use the sha256 of the object name in the cache key instead of the
object name itself.

Related #9386
Signed-off-by: Nick Pillitteri <[email protected]>
@narqo
Copy link
Contributor

narqo commented Nov 22, 2024

That's not blocking

I'm curious if you considered blake2 for key names hashing. I assume, it's ~2x faster than sha256, and it's what is used in the indexcache package.

@56quarters
Copy link
Contributor Author

I'm curious if you considered blake2 for key names hashing. I assume, it's ~2x faster than sha256, and it's what is used in the indexcache package.

I missed that, thank you! I'll take a look at this and post some new benchmarks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants