-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Free CachePeer from digest locking/unlocking #241
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we have to remove more code if we are going to fix/change digest-peer relationship. I left a few suggestions.
Please also sync peerDigestPDFinish() description -- that function no longer destroys the digest.
src/peer_digest.cc
Outdated
if (pd->peer.set() && !pd->peer.valid()) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is no longer possible for peerDigestCheck() to be called when there is no peer or when the peer became invalid, right? PeerDigest is born with a valid peer. When that peer disappears, the peer destroys the corresponding PeerDigest object, so we will never be here (and would not be able to check anything inside that destroued PeerDigest object if we were!).
if (pd->peer.set() && !pd->peer.valid()) | |
return; | |
Assure(pd->peer.valid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, since now CachePeer owns its PeerDigest (and encompasses its lifetime).
src/peer_digest.cc
Outdated
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil()); | ||
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime << |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the new "every digest has a peer" assertion suggested in another change request is correct:
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil()); | |
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime << | |
debugs(72, 3, "cache_peer " << *pd->peer << " now: " << squid_curtime << |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (061be75).
src/peer_digest.cc
Outdated
if (pd->peer.set() && !pd->peer.valid()) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, since now CachePeer owns its PeerDigest (and encompasses its lifetime).
src/peer_digest.cc
Outdated
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil()); | ||
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime << |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, thank you.
src/peer_digest.cc
Outdated
@@ -349,7 +327,7 @@ peerDigestHandleReply(void *data, StoreIOBuffer receivedData) | |||
return; | |||
} | |||
|
|||
assert(fetch->pd); | |||
assert(fetch->pd.raw()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove because this function has no code dereferencing fetch->pd?
assert(fetch->pd.raw()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This became possible after we had implemented the invariant that each PeerDigest has a valid CachePeer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, thank you.
src/peer_digest.cc
Outdated
|
||
/* schedule next check if peer is still out there */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "peer is still out there" condition is always true here since pd
is not nil in this context.
/* schedule next check if peer is still out there */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
src/peer_digest.cc
Outdated
@@ -733,29 +684,22 @@ peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err) | |||
pd->stats.recv.msgs += fetch->recv.msg; | |||
|
|||
if (err) { | |||
debugs(72, DBG_IMPORTANT, "" << (pcb_valid ? "temporary " : "" ) << "disabling (" << pd->req_result << ") digest from " << host); | |||
debugs(72, DBG_IMPORTANT, "disabling (" << pd->req_result << ") digest from " << pd->host); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere in this method, if possible, please undo pd->
addition for host
. AFAICT, we do not need to add pd->
to preserve the old functionality here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I got confused with this 'pd' diff reducer...
src/peer_digest.cc
Outdated
@@ -676,28 +635,24 @@ peerDigestFetchAbort(DigestFetchState * fetch, char *buf, const char *reason) | |||
{ | |||
assert(reason); | |||
debugs(72, 2, "peerDigestFetchAbort: peer " << fetch->pd->host << ", reason: " << reason); | |||
peerDigestReqFinish(fetch, buf, 1, 1, 1, reason, 1); | |||
peerDigestReqFinish(fetch, buf, reason, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove buf
argument as well. It is unused AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and following the caller chain I had to remove it in other functions as well.
Some of the functions used in peerDigestHandleReply() switch had this assertion while others did not. We can just rely on peerDigestFetchedEnough() precondition guaranteeing pd existence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (4c5f493).
src/peer_digest.cc
Outdated
@@ -676,28 +635,24 @@ peerDigestFetchAbort(DigestFetchState * fetch, char *buf, const char *reason) | |||
{ | |||
assert(reason); | |||
debugs(72, 2, "peerDigestFetchAbort: peer " << fetch->pd->host << ", reason: " << reason); | |||
peerDigestReqFinish(fetch, buf, 1, 1, 1, reason, 1); | |||
peerDigestReqFinish(fetch, buf, reason, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, and following the caller chain I had to remove it in other functions as well.
src/peer_digest.cc
Outdated
|
||
/* schedule next check if peer is still out there */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
src/peer_digest.cc
Outdated
@@ -733,29 +684,22 @@ peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err) | |||
pd->stats.recv.msgs += fetch->recv.msg; | |||
|
|||
if (err) { | |||
debugs(72, DBG_IMPORTANT, "" << (pcb_valid ? "temporary " : "" ) << "disabling (" << pd->req_result << ") digest from " << host); | |||
debugs(72, DBG_IMPORTANT, "disabling (" << pd->req_result << ") digest from " << pd->host); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I got confused with this 'pd' diff reducer...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, thank you. I hope we are almost done here.
src/peer_digest.cc
Outdated
|
||
if (fcb_valid) | ||
peerDigestFetchFinish(fetch, err); | ||
peerDigestFetchFinish(fetch, err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible:
- Please move statistics update from peerDigestFetchFinish() to peerDigestFetchSetStats().
- Please move StoreEntry updates from peerDigestFetchFinish() to DigestFetchState destructor.
After that, we should be able to remove peerDigestFetchFinish() and simply do this:
peerDigestFetchFinish(fetch, err); | |
delete fetch; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/peer_digest.cc
Outdated
static void | ||
peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err) | ||
void | ||
PeerDigest::finish(DigestFetchState * const fetch, const int err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PeerDigest::finish(DigestFetchState * const fetch, const int err) | |
PeerDigest::noteFetchFinished(DigestFetchState * const fetch, const int err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/peer_digest.cc
Outdated
@@ -705,26 +662,16 @@ peerDigestReqFinish(DigestFetchState * fetch, char * /* buf */, | |||
pd->times.retry_delay = 0; | |||
peerDigestSetCheck(pd, peerDigestNewDelay(fetch->entry)); | |||
} | |||
pd->finish(fetch, err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move the whole if
statement code above inside PeerDigest::noteFetchFinished(). We will only check for pd validity in this function and then call noteFetchFinished() to do the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (4db32d7).
src/peer_digest.cc
Outdated
static void | ||
peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err) | ||
void | ||
PeerDigest::finish(DigestFetchState * const fetch, const int err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/peer_digest.cc
Outdated
|
||
if (fcb_valid) | ||
peerDigestFetchFinish(fetch, err); | ||
peerDigestFetchFinish(fetch, err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/peer_digest.cc
Outdated
@@ -705,26 +662,16 @@ peerDigestReqFinish(DigestFetchState * fetch, char * /* buf */, | |||
pd->times.retry_delay = 0; | |||
peerDigestSetCheck(pd, peerDigestNewDelay(fetch->entry)); | |||
} | |||
pd->finish(fetch, err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would an absolute minimal bug 5329 fix look like? My understanding is that it should be very small -- 10-15 changed lines -- but our confidence in its completeness will be fairly low because we would not be sure that some other locking dependencies are not mistreated. If I am right, please post a separate internal minimal PR with just that fix, without any "confidence boosting" and polishing changes.
PR description: This fix does not restore the lost lock but removes the corresponding unlock from the CachePeer destructor
If restoring that lost lock is the correct minimal fix, then it is OK to restore that lock (instead of removing the corresponding unlock from the CachePeer destructor).
To avoid misunderstanding, we will still propose all the other changes in this PR, even if we propose a minimal bug-fixing PR first.
|
||
assert(pd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we can restore this assertion because peerDigestFetchedEnough() above guarantees that the PeerDigest object is still valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's restore it. However I thought that we decided minimize these 'assert(pd)' throughout the code, relying on peerDigestFetchedEnough() guarantees of existing pd.
src/peer_digest.cc
Outdated
{ | ||
DigestFetchState *fetch = (DigestFetchState *)data; | ||
PeerDigest *pd; | ||
const auto pd = fetch->pd.get(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert(pd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Note that there is also another case in peerDigestFetchReply(), where fetch->pd goes unchecked... Before our changes there was an assertion in peerDigestHandleReply() (peerDigestFetchReply() caller), but we removed that assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That peerDigestFetchReply() use case is fine, I think: We are allowed to dereference fetch->pd
without checking right after a peerDigestFetchedEnough() call because we know that peerDigestFetchedEnough() will protect us. This peerDigestSwapInMask() check is different because we dereference fetch->pd
before calling peerDigestFetchedEnough()1.
Needless to say, the whole thing is a big mess. We can probably simplify by checking for pd
validity at the beginning of each fetching callback, never destroying pd
in any fetching callbacks, and never checking pd
anywhere else in fetching callbacks. However, that simplification spills into splitting peerDigestFetchedEnough() from "a universal check that we do not really know when to call because our state always changes as we make progress through callback code" to a set of change-specific checks. For now, let's not make those changes. We will revisit this when we know how the minimal fix looks like.
Footnotes
-
A peerDigestFetchedEnough() call in our caller does not count, especially when our caller is looping and may reach "fetched enough" state inside that loop. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done (873e62).
Yes, is the smallest possible fix: done in PR242. I tested it with the scenario described at Bug 5329 report. |
The PR code already removed the corresponding unlock - the lock was brought in by the recent 4657405 after merging.
Extracted minimal changes into PR244. The initial bug was fixed at 465740. |
This improvement (introduced at Bug 5329 fix) completes the transition
from cbdata abuse for digest refcounting to CachePeer as a digest owner:
An owner does not lock/unlock the object; an owner creates/deletes it.
Establishing an "every digest has a peer owner" relationship simplifies
code and, hopefully, reduces the probability of similar future bugs.