-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Auto 8087 modular streams #11489
Auto 8087 modular streams #11489
Conversation
I see that you haven't updated any README files. Would it make sense to do so? |
498d1f9
to
948a5bd
Compare
} else if lookup.IsMercuryV03() { | ||
state, reason, values, retryable, retryInterval, err = s.v03Client.DoRequest(ctx, lookup, pluginRetryKey) | ||
if err := s.CheckCallback(ctx, values, lookup, checkResults, i); err != nil { | ||
s.lggr.Errorf("at block %d upkeep %s requested time %s CheckCallback err: %s", lookup.Block, lookup.UpkeepId, lookup.Time, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this should be errorF, since there can be user issues here but that will be seen as an error on the node. Previously these were debug logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Make sense. I will change this to s.lggr.Debugf
then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, dug further on this. There are also system errors, e.g. pack, unpack error. Are you saying we should just do Debugf here for all errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Errorf
might be appropriate here. I don't where a user issue would generate an error return from CheckCallback()
. I only see packing/unpacking and rpc issues as potential sources of error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts on your log level question...
If the err only comes from user mistake --> use Debug()
If the err only comes from system problems --> use Error()
If the err could be the result of user or system problems --> we're logging in the wrong place and need more granularity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agreed with your analysis @RyanRHall. We are keeping this as Errorf since only errors come here.
I don't where a user issue would generate an error return from CheckCallback(). I only see packing/unpacking and rpc issues as potential sources of error.
This is true after my latest update. here is the most recent commit: 2eee0e9. With the change, user issues and system errors are differentiated.
} | ||
|
||
if !needed { | ||
checkResults[i].IneligibilityReason = uint8(mercury.MercuryUpkeepFailureReasonUpkeepNotNeeded) | ||
s.lggr.Debugf("at block %d upkeep %s requested time %s callback reports upkeep not needed", lookup.Block, lookup.UpkeepId, lookup.Time) | ||
return | ||
return fmt.Errorf("at block %d upkeep %s requested time %s callback reports upkeep not needed", lookup.Block, lookup.UpkeepId, lookup.Time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll double log this as of now.
debug above
and error in the new function you introduced
we shouldn't add error logs for this purpose as that should indicate something wrong in the node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, we are double logging.
How about we remove the s.lggr.Errorf()
(e.g. L192, L206, L216) in this new CheckCallback
function, and let the caller capture the error and print (in doLookup() L #184)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should work, will just need to verify all logs in existing cases and ensure they remain the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good. Comment about logging changes
92a84ef
to
2eee0e9
Compare
lggr, _ := logger.NewLogger() | ||
blockSub := &blockSubscriber{k.client} | ||
streams := streams.NewStreamsLookup(packer, mercuryConfig, blockSub, k.rpcClient, keeperRegistry21, lggr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the output look like with this logger? is it useful output? does it look cluttered? If the output isn't useful, we might want to just pass a noop lggr here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example. The message doesnt seem to be cluttered, except the mercuryBytes is pretty long.
If the output isn't useful, we might want to just pass a noop lggr here.
Want to hear more about this.
2023-12-08T10:44:58.399-0800 [INFO] at block 56234784 upkeep 5591498142036749453487419299781783197030971023186134955311257372668222176389 requested time 1700575301 checkCallback mercuryBytes: 0x0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000003ec00000000000000000000000000000000000000000000000000000000000000500000000000000000000000000000000000000000000000000000000000000004000000000000000000000000000000000000000000000000000000000000003400000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000002a0000649e2796239591d181cc09f64b3fbc29dad46278ff8da175e27fce7e2039c000000000000000000000000000000000000000000000000000000000854a90b000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e000000000000000000000000000000000000000000000000000000000000001e00000000000000000000000000000000000000000000000000000000000000240010100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e000029584363bcf642315133c335b3646513c20f049602fc7d933be0d3f6360d300000000000000000000000000000000000000000000000000000000655cb84500000000000000000000000000000000000000000000000000000000655cb845000000000000000000000000000000000000000000000000000000000000138800000000000000000000000000000000000000000000000000000000000acecc00000000000000000000000000000000000000000000000000000000655e09c500000000000000000000000000000000000000000000006c93b6fd87c49ce77000000000000000000000000000000000000000000000000000000000000000020d863ed649850c8594447ccbc3cef931dfc7470f428dd744776a6f43715bf820954a32f1e31212d2f5ac4ca7bd595460e94a2ffab46e307e3aa3dbbde2d39ed3000000000000000000000000000000000000000000000000000000000000000274b8aeb3bb6beac1b3892f1cc962f34010bd8edaab8e347a8b9f53a210f1a4ed3c66b5cd980d98b89390093c01856d6008c968dd8fca075d941e5265d70a90ee00000000000000000000000000000000000000000000000000000000000001a0000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000400000000000000000000000000000000000000000000000000000000000000140000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000100000000000000000000000013cda86c223f9c2248f6321a6bdf3c321876ee870000000000000000000000000000000000000000000000000000000000000002000000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000003e8000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006c26b332e1bb924800 streams/streams.go:212 version=unset@unset
// handle v0.2 | ||
cfg, err := keeperRegistry21.GetUpkeepPrivilegeConfig(triggerCallOpts, upkeepID) | ||
// check if upkeep is allowed to use mercury v0.2 | ||
_, _, _, allowed, err := streams.AllowedToUseMercury(latestCallOpts, upkeepID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the call opts might be incorrect here - shouldn't we be using the triggerCallOpts
? Ex what happens if we debug a log from yesterday, when a user didn't have permission to use mercury, but today they do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is covered in my followup, here is branch for that code: https://github.com/smartcontractkit/chainlink/blob/AUTO-8170-debug-e2e/core/scripts/chaincli/handler/debug.go#L263
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we duplicate the change here or merge the two PRs? This PR effectively introduces a bug. I don't think we should introduce a bug here and then fix it somewhere else. Let's just not introduce the bug :)
core/services/ocr2/plugins/ocr2keeper/evmregistry/v21/mercury/streams/streams.go
Show resolved
Hide resolved
if err != nil { | ||
failCheckConfig("failed to do mercury request ", err) | ||
resolveIneligible("failed to DoMercuryRequest") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this indicate an error in communicating with the mercury server? I just want to confirm that if the user had a bad key or bad mercury server address.. do we resolveIneligible()
or failCheckConfig()
(it should be the latter!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! Updating to failCheckConfig()
if err != nil { | ||
s.lggr.Errorf("at block %d upkeep %s requested time %s DoMercuryRequest err: %s", lookup.Block, lookup.UpkeepId, lookup.Time, err.Error()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DoMercuryRequest()
already logs all of it's errors, so this is a double log for the same error. Is it possible to consolidate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest leaving this log and then just returning the error (not logging it) in DoMercuryRequest()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, updating.
} else if lookup.IsMercuryV03() { | ||
state, reason, values, retryable, retryInterval, err = s.v03Client.DoRequest(ctx, lookup, pluginRetryKey) | ||
if err := s.CheckCallback(ctx, values, lookup, checkResults, i); err != nil { | ||
s.lggr.Errorf("at block %d upkeep %s requested time %s CheckCallback err: %s", lookup.Block, lookup.UpkeepId, lookup.Time, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Errorf
might be appropriate here. I don't where a user issue would generate an error return from CheckCallback()
. I only see packing/unpacking and rpc issues as potential sources of error.
} else if lookup.IsMercuryV03() { | ||
state, reason, values, retryable, retryInterval, err = s.v03Client.DoRequest(ctx, lookup, pluginRetryKey) | ||
if err := s.CheckCallback(ctx, values, lookup, checkResults, i); err != nil { | ||
s.lggr.Errorf("at block %d upkeep %s requested time %s CheckCallback err: %s", lookup.Block, lookup.UpkeepId, lookup.Time, err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts on your log level question...
If the err only comes from user mistake --> use Debug()
If the err only comes from system problems --> use Error()
If the err could be the result of user or system problems --> we're logging in the wrong place and need more granularity
2eee0e9
to
7dc1ba6
Compare
// handle v0.2 | ||
cfg, err := keeperRegistry21.GetUpkeepPrivilegeConfig(triggerCallOpts, upkeepID) | ||
// check if upkeep is allowed to use mercury v0.2 | ||
_, _, _, allowed, err := streams.AllowedToUseMercury(latestCallOpts, upkeepID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we duplicate the change here or merge the two PRs? This PR effectively introduces a bug. I don't think we should introduce a bug here and then fix it somewhere else. Let's just not introduce the bug :)
7dc1ba6
to
2803c56
Compare
_, errBalance := k.client.BalanceAt(ctx, gethcommon.Address{}, big.NewInt(1)) | ||
if errBalance != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is very go-ish. Usually there is a better way to solve the linter issues without renaming the err variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran the linter locally and it looks like it only complained on 3 lines. Here's an example of how I would suggest to fix the linter issue:
// old code with shadow err
foo, err := someFunction()
// ...
if ... {
// this is line 134, and the error below shadows the one above
tmpCheckResult, err := keeperRegistry21.CheckUpkeep0(latestCallOpts, upkeepID)
}
// fixed code
foo, err := someFunction()
// ...
if ... {
// we get rid of the := in assignment and re-use the same err val declared outside the if block - no more shadow issue!
var tmpCheckResult iregistry21.CheckUpkeep0
tmpCheckResult, err = keeperRegistry21.CheckUpkeep0(latestCallOpts, upkeepID)
}
9b85d72
to
274c2f5
Compare
274c2f5
to
b17456b
Compare
All the lint issues in debug.go are fixed. Other chaincli lint issues are historical issues, unrelated to this PR and tracked in https://smartcontract-it.atlassian.net/browse/AUTO-8191 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
streams.go changes lgtm
* make streams lookup modular * polish * address comment to use pointer instead of array/map * rebase * get rid of slice * keep using array and index * remove duplicate logging * address err re-declarion lint issues
AUTO-8087
This PR is a follow up on this reverted PR: #11368
To have granular control in the debugging script and also to provide better code readability, we split the bigger functions in streams lookup to smaller functions, which have more clear boundaries. The smaller functions are used in debugging script to print staged debugging logs.
In this PR, we try to limit the number of refactoring, we used to remove the index and arrary, and use a reference of checkResult which caused a bug. I removed those unnecessary changes in this PR, those kind of improvements should be in separate PRs.
Test Plan:
go run main.go keeper debug 5591498142036749453487419299781783197030971023186134955311257372668222176389 0xdc6d0e547a5aa85fefa5b0f3a37e3493eafb5aeba8b5f3071ce53c9e9a539e9c 0
chaincli lint issues are not relevant to this PR.
since we touched the debugging script which doesnt have unit test, sonar test coverage complains.