-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache,db: de-dup concurrent attempts to read the same block #4157
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lacks tests -- I would like an opinion on the approach and interfaces before adding those.
Reviewable status: 0 of 8 files reviewed, all discussions resolved (waiting on @RaduBerinde)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
sstable/reader.go
line 525 at r1 (raw file):
var err error var errorDuration time.Duration ch, errorDuration, err = crh.WaitForReadPermissionOrHandle(ctx)
Have you considered a GetOrPopulate
API where we pass a func that does the actual read (or we can define a simple interface that *Reader
implements)? Then the new logic would be internal to GetOrPopulate
. All the caller cares is that at the end we either get a BufferHandle or an error. We can keep the simple Get
for the buffer pool case.
40fbe2c
to
23f4b7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)
sstable/reader.go
line 525 at r1 (raw file):
Previously, RaduBerinde wrote…
Have you considered a
GetOrPopulate
API where we pass a func that does the actual read (or we can define a simple interface that*Reader
implements)? Then the new logic would be internal toGetOrPopulate
. All the caller cares is that at the end we either get a BufferHandle or an error. We can keep the simpleGet
for the buffer pool case.
I didn't seriously consider it since there is a lot of shared code in readBlockInternal
but there is also enough branching for the cache.ReadHandle
case. We would want to write one func that does the actual read in both cases. So it would need to have the branching too. It would avoid the need for the following blocks which precede some return statements:
if crh.Valid() {
crh.SetReadError(err)
}
I've moved those into a defer call here, so that by itself is not a big motivator for GetOrPopulate
.
I've now done an implementation of readBlockInternal
(readBlockInternal2), using (an unimplemented) GetOrPopulate
in sumeerbhola@bac895b. PTAL. By itself, it isn't simpler IMO. But it does hide ReadHandle
so it is perhaps compelling enough. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 3 unresolved discussions (waiting on @jbowens and @sumeerbhola)
sstable/reader.go
line 525 at r1 (raw file):
Previously, sumeerbhola wrote…
I didn't seriously consider it since there is a lot of shared code in
readBlockInternal
but there is also enough branching for thecache.ReadHandle
case. We would want to write one func that does the actual read in both cases. So it would need to have the branching too. It would avoid the need for the following blocks which precede some return statements:if crh.Valid() { crh.SetReadError(err) }
I've moved those into a defer call here, so that by itself is not a big motivator for
GetOrPopulate
.I've now done an implementation of
readBlockInternal
(readBlockInternal2), using (an unimplemented)GetOrPopulate
in sumeerbhola@bac895b. PTAL. By itself, it isn't simpler IMO. But it does hideReadHandle
so it is perhaps compelling enough. Thoughts?
Yeah, that looks worse.. I was envisioning that the function it calls would be a separate method, but I haven't considered all the details.
Why do we need a separate WaitForReadPermissionOrHandle
? Can't this happen inside GetWithReadHandle
- it could return either the data from the cache (regardless if we had to wait for it), or the handle that we will use to populate the cache after we do the read. I think only the first to create the entry should do the read, and anyone who comes in after should wait.
CC @jbowens as well
sstable/reader.go
line 505 at r2 (raw file):
if env.BufferPool == nil { ch, crh = r.cacheOpts.Cache.GetWithReadHandle( r.cacheOpts.CacheID, r.cacheOpts.FileNum, bh.Offset, r.loadBlockSema)
Why do we need to move the semaphore acquire into the cache code? That should just happen right before we issue the read, with the expectation that outside of strange conditions it won't do anything (and if it does, it should just behave like a slower IO). We can treat a ctx cancelation error the same way we would treat an IO error.
sstable/reader.go
line 552 at r2 (raw file):
// INVARIANT: !ch.Valid(). compressed := block.Alloc(int(bh.Length+block.TrailerLen), env.BufferPool)
It would be cleaner to separate out the code starting here into a method that does the actual read and returns the decompressed
block handle. We can SetReadError
more cleanly after that call (this is error prone if someone adds an if err := ...
case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)
internal/cache/read_shard.go
line 92 at r2 (raw file):
NB: we cannot place the loadBlockSema in the readShard since the readShard can be shared across the DBs, just like the block cache.
Should we make the load block semaphore shared across DBs, like the block cache? The primary motivation was preventing excessive memory utilization from many concurrent reads, right? And memory is a shared resource across DBs.
That would have been easier to implement, but there was concern that one store that is broken and has very slow IOs will block out all other stores. |
23f4b7f
to
5316806
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked the logic based on the comments. I'll start adding tests if this seems reasonable.
Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @jbowens and @RaduBerinde)
internal/cache/read_shard.go
line 92 at r2 (raw file):
Previously, jbowens (Jackson Owens) wrote…
NB: we cannot place the loadBlockSema in the readShard since the readShard can be shared across the DBs, just like the block cache.
Should we make the load block semaphore shared across DBs, like the block cache? The primary motivation was preventing excessive memory utilization from many concurrent reads, right? And memory is a shared resource across DBs.
As @RaduBerinde mentioned, this choice was made for performance isolation of stores.
sstable/reader.go
line 525 at r1 (raw file):
Why do we need a separate
WaitForReadPermissionOrHandle
? Can't this happen insideGetWithReadHandle
.
Yes, it can. It involves passing the context and needs additional return values. I've made the change.
sstable/reader.go
line 505 at r2 (raw file):
Previously, RaduBerinde wrote…
Why do we need to move the semaphore acquire into the cache code? That should just happen right before we issue the read, with the expectation that outside of strange conditions it won't do anything (and if it does, it should just behave like a slower IO). We can treat a ctx cancelation error the same way we would treat an IO error.
Initially it felt cleaner to move all the waiting in one place. And it required less explanation about the contract. But given that we need to still do semaphore acquisition for the BufferPool
case, I've moved it back.
The contract comment now says:
// The caller must immediately start doing a read, or can first wait on a
// shared resource that would also block a different reader if it was assigned
// the turn instead (specifically, this refers to Options.LoadBlockSema).
sstable/reader.go
line 552 at r2 (raw file):
Previously, RaduBerinde wrote…
It would be cleaner to separate out the code starting here into a method that does the actual read and returns the
decompressed
block handle. We canSetReadError
more cleanly after that call (this is error prone if someone adds anif err := ...
case).
Good point. Done
Concurrent reads of the same block have been observed to cause very high memory usage, and cause significant CPU usage for allocations/deallocations. We now coordinate across multiple concurrent attempts to read the same block via a readEntry, which makes the readers take turns until one succeeds. The readEntries are embedded in a map that is part of a readShard, where there is a readShard for each cache.Shard. See the long comment in the readShard declaration for motivation. The Options.LoadBlockSema is integrated into the readEntry, to simplify the waiting logic in the caller. Callers interact with this new behavior via Cache.GetWithReadHandle, which is only for callers that intend to do a read and then populate the cache. If this method returns a ReadHandle, the caller has permission to do a read. See the ReadHandle comment for details of the contract. Fixes cockroachdb#4138
5316806
to
bf70930
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach looks good to me. I will make another pass in more detail but you can start adding tests.
Reviewable status: 0 of 8 files reviewed, 5 unresolved discussions (waiting on @jbowens and @sumeerbhola)
internal/cache/clockpro.go
line 206 at r3 (raw file):
// etHot. But etTest is "colder" than etCold, since the only transition // into etTest is etCold => etTest, so since etTest transitions to // etHot, then etCold should also transition.
I think you're right. The paper says that if the page isn't in the list it is added as cold (the e == nil
case) but if it is "the faulted page turns into a hot page and is placed at the head of the list"
Maybe file an issue? We'd want it in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 8 files reviewed, 5 unresolved discussions (waiting on @jbowens and @RaduBerinde)
internal/cache/clockpro.go
line 206 at r3 (raw file):
Previously, RaduBerinde wrote…
I think you're right. The paper says that if the page isn't in the list it is added as cold (the
e == nil
case) but if it is "the faulted page turns into a hot page and is placed at the head of the list"Maybe file an issue? We'd want it in a separate PR
Concurrent reads of the same block have been observed to cause very high memory usage, and are wasteful of disk bandwidth. We now coordinate across multiple concurrent attempts to read the same block via a readEntry, which makes the readers take turns until one succeeds.
The readEntries are embedded in a map that is part of a readShard, where there is a readShard for each cache.Shard. See the long comment in the readShard declaration for motivation. The Options.LoadBlockSema is integrated into the readEntry, to simplify the waiting logic in the caller.
Callers interact with this new behavior via Cache.GetWithReadHandle, which is only for callers that intend to do a read and then populate the cache. If this method returns a ReadHandle, the caller must first wait for permission to do a read. See the ReadHandle comment for details of the contract.
Fixes #4138