Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Free CachePeer from digest locking/unlocking #241

Closed
wants to merge 19 commits into from

Conversation

eduard-bagdasaryan
Copy link

@eduard-bagdasaryan eduard-bagdasaryan commented Dec 11, 2023

This improvement (introduced at Bug 5329 fix) completes the transition
from cbdata abuse for digest refcounting to CachePeer as a digest owner:
An owner does not lock/unlock the object; an owner creates/deletes it.
Establishing an "every digest has a peer owner" relationship simplifies
code and, hopefully, reduces the probability of similar future bugs.

Copy link

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we have to remove more code if we are going to fix/change digest-peer relationship. I left a few suggestions.

Please also sync peerDigestPDFinish() description -- that function no longer destroys the digest.

Comment on lines 184 to 185
if (pd->peer.set() && !pd->peer.valid())
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is no longer possible for peerDigestCheck() to be called when there is no peer or when the peer became invalid, right? PeerDigest is born with a valid peer. When that peer disappears, the peer destroys the corresponding PeerDigest object, so we will never be here (and would not be able to check anything inside that destroued PeerDigest object if we were!).

Suggested change
if (pd->peer.set() && !pd->peer.valid())
return;
Assure(pd->peer.valid());

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, since now CachePeer owns its PeerDigest (and encompasses its lifetime).

Comment on lines 187 to 188
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil());
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime <<

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the new "every digest has a peer" assertion suggested in another change request is correct:

Suggested change
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil());
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime <<
debugs(72, 3, "cache_peer " << *pd->peer << " now: " << squid_curtime <<

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rousskov rousskov changed the title Bug 5329: cbdata.cc:276 "c->locks > 0" assertion on peer reconfiguration Bug 5329: cbdata.cc:276 "c->locks > 0" assertion on reconfigure Dec 11, 2023
Copy link
Author

@eduard-bagdasaryan eduard-bagdasaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (061be75).

Comment on lines 184 to 185
if (pd->peer.set() && !pd->peer.valid())
return;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, since now CachePeer owns its PeerDigest (and encompasses its lifetime).

Comment on lines 187 to 188
debugs(72, 3, "cache_peer " << RawPointer(pd->peer).orNil());
debugs(72, 3, "peerDigestCheck: time: " << squid_curtime <<
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, thank you.

@@ -349,7 +327,7 @@ peerDigestHandleReply(void *data, StoreIOBuffer receivedData)
return;
}

assert(fetch->pd);
assert(fetch->pd.raw());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove because this function has no code dereferencing fetch->pd?

Suggested change
assert(fetch->pd.raw());

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, thank you.


/* schedule next check if peer is still out there */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "peer is still out there" condition is always true here since pd is not nil in this context.

Suggested change
/* schedule next check if peer is still out there */

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -733,29 +684,22 @@ peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err)
pd->stats.recv.msgs += fetch->recv.msg;

if (err) {
debugs(72, DBG_IMPORTANT, "" << (pcb_valid ? "temporary " : "" ) << "disabling (" << pd->req_result << ") digest from " << host);
debugs(72, DBG_IMPORTANT, "disabling (" << pd->req_result << ") digest from " << pd->host);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere in this method, if possible, please undo pd-> addition for host. AFAICT, we do not need to add pd-> to preserve the old functionality here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I got confused with this 'pd' diff reducer...

@@ -676,28 +635,24 @@ peerDigestFetchAbort(DigestFetchState * fetch, char *buf, const char *reason)
{
assert(reason);
debugs(72, 2, "peerDigestFetchAbort: peer " << fetch->pd->host << ", reason: " << reason);
peerDigestReqFinish(fetch, buf, 1, 1, 1, reason, 1);
peerDigestReqFinish(fetch, buf, reason, 1);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove buf argument as well. It is unused AFAICT.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and following the caller chain I had to remove it in other functions as well.

Copy link
Author

@eduard-bagdasaryan eduard-bagdasaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (4c5f493).

@@ -676,28 +635,24 @@ peerDigestFetchAbort(DigestFetchState * fetch, char *buf, const char *reason)
{
assert(reason);
debugs(72, 2, "peerDigestFetchAbort: peer " << fetch->pd->host << ", reason: " << reason);
peerDigestReqFinish(fetch, buf, 1, 1, 1, reason, 1);
peerDigestReqFinish(fetch, buf, reason, 1);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, and following the caller chain I had to remove it in other functions as well.


/* schedule next check if peer is still out there */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@@ -733,29 +684,22 @@ peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err)
pd->stats.recv.msgs += fetch->recv.msg;

if (err) {
debugs(72, DBG_IMPORTANT, "" << (pcb_valid ? "temporary " : "" ) << "disabling (" << pd->req_result << ") digest from " << host);
debugs(72, DBG_IMPORTANT, "disabling (" << pd->req_result << ") digest from " << pd->host);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I got confused with this 'pd' diff reducer...

Copy link

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress, thank you. I hope we are almost done here.


if (fcb_valid)
peerDigestFetchFinish(fetch, err);
peerDigestFetchFinish(fetch, err);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible:

  • Please move statistics update from peerDigestFetchFinish() to peerDigestFetchSetStats().
  • Please move StoreEntry updates from peerDigestFetchFinish() to DigestFetchState destructor.

After that, we should be able to remove peerDigestFetchFinish() and simply do this:

Suggested change
peerDigestFetchFinish(fetch, err);
delete fetch;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

static void
peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err)
void
PeerDigest::finish(DigestFetchState * const fetch, const int err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
PeerDigest::finish(DigestFetchState * const fetch, const int err)
PeerDigest::noteFetchFinished(DigestFetchState * const fetch, const int err)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -705,26 +662,16 @@ peerDigestReqFinish(DigestFetchState * fetch, char * /* buf */,
pd->times.retry_delay = 0;
peerDigestSetCheck(pd, peerDigestNewDelay(fetch->entry));
}
pd->finish(fetch, err);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the whole if statement code above inside PeerDigest::noteFetchFinished(). We will only check for pd validity in this function and then call noteFetchFinished() to do the rest.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Author

@eduard-bagdasaryan eduard-bagdasaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (4db32d7).

static void
peerDigestPDFinish(DigestFetchState * fetch, int pcb_valid, int err)
void
PeerDigest::finish(DigestFetchState * const fetch, const int err)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


if (fcb_valid)
peerDigestFetchFinish(fetch, err);
peerDigestFetchFinish(fetch, err);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -705,26 +662,16 @@ peerDigestReqFinish(DigestFetchState * fetch, char * /* buf */,
pd->times.retry_delay = 0;
peerDigestSetCheck(pd, peerDigestNewDelay(fetch->entry));
}
pd->finish(fetch, err);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would an absolute minimal bug 5329 fix look like? My understanding is that it should be very small -- 10-15 changed lines -- but our confidence in its completeness will be fairly low because we would not be sure that some other locking dependencies are not mistreated. If I am right, please post a separate internal minimal PR with just that fix, without any "confidence boosting" and polishing changes.

PR description: This fix does not restore the lost lock but removes the corresponding unlock from the CachePeer destructor

If restoring that lost lock is the correct minimal fix, then it is OK to restore that lock (instead of removing the corresponding unlock from the CachePeer destructor).

To avoid misunderstanding, we will still propose all the other changes in this PR, even if we propose a minimal bug-fixing PR first.


assert(pd);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can restore this assertion because peerDigestFetchedEnough() above guarantees that the PeerDigest object is still valid.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's restore it. However I thought that we decided minimize these 'assert(pd)' throughout the code, relying on peerDigestFetchedEnough() guarantees of existing pd.

{
DigestFetchState *fetch = (DigestFetchState *)data;
PeerDigest *pd;
const auto pd = fetch->pd.get();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(pd);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Note that there is also another case in peerDigestFetchReply(), where fetch->pd goes unchecked... Before our changes there was an assertion in peerDigestHandleReply() (peerDigestFetchReply() caller), but we removed that assertion.

Copy link

@rousskov rousskov Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That peerDigestFetchReply() use case is fine, I think: We are allowed to dereference fetch->pd without checking right after a peerDigestFetchedEnough() call because we know that peerDigestFetchedEnough() will protect us. This peerDigestSwapInMask() check is different because we dereference fetch->pd before calling peerDigestFetchedEnough()1.

Needless to say, the whole thing is a big mess. We can probably simplify by checking for pd validity at the beginning of each fetching callback, never destroying pd in any fetching callbacks, and never checking pd anywhere else in fetching callbacks. However, that simplification spills into splitting peerDigestFetchedEnough() from "a universal check that we do not really know when to call because our state always changes as we make progress through callback code" to a set of change-specific checks. For now, let's not make those changes. We will revisit this when we know how the minimal fix looks like.

Footnotes

  1. A peerDigestFetchedEnough() call in our caller does not count, especially when our caller is looping and may reach "fetched enough" state inside that loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done (873e62).

@eduard-bagdasaryan
Copy link
Author

If restoring that lost lock is the correct minimal fix, then it is OK to restore that lock

Yes, is the smallest possible fix: done in PR242. I tested it with the scenario described at Bug 5329 report.

The PR code already removed the corresponding unlock - the
lock was brought in by the recent 4657405 after merging.
@eduard-bagdasaryan eduard-bagdasaryan changed the title Bug 5329: cbdata.cc:276 "c->locks > 0" assertion on reconfigure Free CachePeer from digest locking/unlocking Jan 11, 2024
@eduard-bagdasaryan
Copy link
Author

Extracted minimal changes into PR244. The initial bug was fixed at 465740.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants