CCIP-1730 Implementing Healthy function in LogPoller #584
This PR looks good overall, but what's the difference between `Ready` and `Healthy` in this context? We already have a `HealthReport` method that calls `lp.Healthy()`, but since that's not implemented on the LP itself, it calls it on the `StateMachine` object from chainlink-common, which is embedded in the LP struct.
The added "benefit" of implementing `Healthy` is that `HealthReport` will return a meaningful report now, since we return whether finality has been violated.
Keep in mind I'm not 100% sure how these functions are consumed outside of this particular context we want to consume them in now.
Oh nice catch, I wasn't aware of the `Healthy` function when going through the interfaces. @reductionista could you please advise which is better for this case?
I believe healthy/ready are inherited from k8s probes. If not ready, k8s won't send traffic yet, and if not healthy, k8s will restart the node. We only expose the ready one to k8s directly (see ccip/core/web/health_controller.go, line 27 in 2f10153).
It seems that `Healthy() error` is not present in LogPoller's interface; it probably has to be added first 🤔
It's in the `services.StateMachine` that's embedded in the `logPoller` struct - we have to override it to get useful behavior IIUC.
Ah, you mean the `LogPoller` interface - that's a good point. We should add it there if we want clients to call it.
Yeah, already added it. I'm just a bit worried about what @connorwstein said about that function being used for k8s tooling. If that's the case, then we don't want to restart the pod whenever LogPoller is marked as not healthy.
Who is the SME on this? Maybe someone from infra or maybe core?
I think there's still an edge case here: say a finality violation is detected and you wipe the db. We'll hit the "first poll ever on new chain" return case here and leave the bool true. I think you want a defer so that on any non-error return from this function we set it to false.
So you suggest marking it as false at the end of the `PollAndSave` tick? It would definitely be easier if errors were bubbled up, but it's still doable.
imagining something like this, can be a follow up
It would be nice if we could use this, but the semantics seem weird: it seems like the buffer is flushed whenever you read it, so it's not really usable for our use case.