-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CCIP-1496 Moving pruning to a separate loop and respecting paging there #12060
Conversation
I see that you haven't updated any README files. Would it make sense to do so? |
c0b0c5d
to
9e32937
Compare
7f5d5c9
to
e7ef51a
Compare
e7ef51a
to
cc01004
Compare
@@ -147,6 +147,7 @@ func NewLogPoller(orm ORM, ec Client, lggr logger.Logger, pollPeriod time.Durati | |||
backfillBatchSize: backfillBatchSize, | |||
rpcBatchSize: rpcBatchSize, | |||
keepFinalizedBlocksDepth: keepFinalizedBlocksDepth, | |||
logPrunePageSize: logsPrunePageSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this param as a backward compatibility layer. Products already using "delete all" queries will not be affected by this PR. Product has to explicitly specify that they need paging
b7b1552
to
9ddf94a
Compare
Since this adds a new toml config param, we should also mention it in docs/CHANGELOG.md |
e54bbb1
to
3bb5914
Compare
3bb5914
to
a20d485
Compare
core/chains/evm/logpoller/orm.go
Outdated
@@ -272,7 +272,7 @@ func (o *DbORM) DeleteExpiredLogs(limit int64, qopts ...pg.QOpt) (int64, error) | |||
GROUP BY evm_chain_id, address, event | |||
HAVING NOT 0 = ANY(ARRAY_AGG(retention)) | |||
) r ON l.evm_chain_id = $1 AND l.address = r.address AND l.event_sig = r.event | |||
AND l.block_timestamp <= STATEMENT_TIMESTAMP() - (r.retention / 10^9 * interval '1 second') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious to hear your reasoning for the block_timestamp -> created_at change.
I was thinking block_timestamp makes more sense, because how long we retain a log seems like it should be based on when the log was emitted rather than when we happened to find it. For example, if a log is a week old and the retention is 24 hours, but the node happens to have been down for a week... when we spin up the node it would backfill that log and then retain it for another 24 hours. Seems like it ought to just get rid of it immediately since it's been way more than 24 hours since it was emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That actually makes me think of another optimization I should add: if a log comes back from the rpc server and it's already past its retention time we shouldn't save it to the db in the first place; I don't think that check is in there yet, but it should be. (Unless we do change it to created_at, I'd like to hear if there is a use case for that.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first i wanted that to use block_timestamp #12060, but I'm not sure about replaying logs older than retention, they will be immediately wiped out from db
ca6a6e8
to
db8192e
Compare
Quality Gate passedIssues Measures |
## Motivation The goal of this PR is to reduce the number of logs and blocks we keep in the database by utilizing a built-in retention mechanism in LogPoller. Requires paging for smooth deployment smartcontractkit/chainlink#12060 ## Solution This PR enables retention for all the LogPoller's filters registered by CCIP. Additionally, to avoid pushing too much pressure during deletion (especially the first run will have a lot of logs to remove) we've updated `LogPrunePageSize` to 10k. Please see the original PR in the chainlink repo to learn more about paging and its impact on the database. `LogPrunePageSize` is altered in the `fallback.toml` to avoid the necessity of setting this value for every chain that CCIP is deployed on.
## Motivation The goal of this PR is to reduce the number of logs and blocks we keep in the database by utilizing a built-in retention mechanism in LogPoller. Requires paging for smooth deployment smartcontractkit/chainlink#12060 ## Solution This PR enables retention for all the LogPoller's filters registered by CCIP. Additionally, to avoid pushing too much pressure during deletion (especially the first run will have a lot of logs to remove) we've updated `LogPrunePageSize` to 10k. Please see the original PR in the chainlink repo to learn more about paging and its impact on the database. `LogPrunePageSize` is altered in the `fallback.toml` to avoid the necessity of setting this value for every chain that CCIP is deployed on.
Motivation
Current implementation of pruning expired logs has two major drawbacks:
Solution
Solution here is to remove data in batches instead of everything at once. Paging not only limits locking but also gives us finer control over the operation, allowing the database to handle other transactions more efficiently between batches. To increase isolation between LogPoller's main routine we've also moved pruning to a separate routine.
Besides that, we've added information to deletes about number of records deleted (it's also tracked with Prometheus metric). This should increase our visibility on how fast logs are created/deleted, based on this data we should be able to adjust page size/prune threshold if necessary.
To make it backward compatible, prune paging is enabled only if
LogPrunePageSize
is defined. That being said, once merged it should not impact any existing products untilLogPrunePageSize
is defined.