
CCIP-1496 Moving pruning to a separate loop and respecting paging there #12060

Merged: 11 commits into develop, Feb 23, 2024

Conversation

@mateusz-sekara (Collaborator) commented Feb 16, 2024

Motivation

The current implementation of pruning expired logs has two major drawbacks:

  • It removes all expired logs within a single database call. This can generate high pressure (especially IO) when removing a large number of logs at once (> 100k).
  • During long prunes/deletes, the regular PollAndSave is not executed in parallel, because there is a single LogPoller routine per chain. Long-running deletes therefore block polling and can impact a product's availability.

Solution

The solution is to remove data in batches instead of everything at once. Paging not only limits locking but also gives us finer control over the operation, allowing the database to handle other transactions between batches. To isolate pruning from LogPoller's main routine, we've also moved it to a separate routine.
Besides that, delete queries now report the number of records deleted (also tracked with a Prometheus metric). This improves our visibility into how fast logs are created and deleted, and based on this data we should be able to adjust the page size or prune threshold if necessary.

To keep backward compatibility, prune paging is enabled only when LogPrunePageSize is defined. Once merged, this should therefore not impact any existing products until LogPrunePageSize is set.

Contributor

I see that you haven't updated any README files. Would it make sense to do so?

@@ -147,6 +147,7 @@ func NewLogPoller(orm ORM, ec Client, lggr logger.Logger, pollPeriod time.Durati
backfillBatchSize: backfillBatchSize,
rpcBatchSize: rpcBatchSize,
keepFinalizedBlocksDepth: keepFinalizedBlocksDepth,
logPrunePageSize: logsPrunePageSize,
Collaborator (author):

I've added this param as a backward compatibility layer. Products already using "delete all" queries will not be affected by this PR; a product has to explicitly specify that it needs paging.

@reductionista (Contributor):

Since this adds a new toml config param, we should also mention it in docs/CHANGELOG.md

@mateusz-sekara mateusz-sekara changed the title Moving pruning to a separate loop and respecting paging there CCIP-1496 Moving pruning to a separate loop and respecting paging there Feb 21, 2024
@mateusz-sekara mateusz-sekara marked this pull request as ready for review February 21, 2024 15:31
@mateusz-sekara mateusz-sekara requested review from a team as code owners February 21, 2024 15:31
@@ -272,7 +272,7 @@ func (o *DbORM) DeleteExpiredLogs(limit int64, qopts ...pg.QOpt) (int64, error)
GROUP BY evm_chain_id, address, event
HAVING NOT 0 = ANY(ARRAY_AGG(retention))
) r ON l.evm_chain_id = $1 AND l.address = r.address AND l.event_sig = r.event
AND l.block_timestamp <= STATEMENT_TIMESTAMP() - (r.retention / 10^9 * interval '1 second')
Contributor:

Curious to hear your reasoning for the block_timestamp -> created_at change.

I was thinking block_timestamp makes more sense, because how long we retain a log seems like it should be based on when the log was emitted rather than when we happened to find it. For example, if a log is a week old and the retention is 24 hours, but the node happens to have been down for a week... when we spin up the node it would backfill that log and then retain it for another 24 hours. Seems like it ought to just get rid of it immediately since it's been way more than 24 hours since it was emitted.

@reductionista (Contributor) commented Feb 22, 2024

That actually makes me think of another optimization I should add: if a log comes back from the RPC server already past its retention time, we shouldn't save it to the db in the first place. I don't think that check is in there yet, but it should be. (Unless we do change it to created_at; I'd like to hear if there is a use case for that.)

Collaborator (author):

At first I wanted to use block_timestamp (#12060), but I'm not sure about replaying logs older than the retention period; they would be immediately wiped from the db.

@mateusz-sekara mateusz-sekara added this pull request to the merge queue Feb 22, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 22, 2024
@mateusz-sekara mateusz-sekara added this pull request to the merge queue Feb 23, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 23, 2024
@mateusz-sekara mateusz-sekara added this pull request to the merge queue Feb 23, 2024
Merged via the queue into develop with commit 5f212bb Feb 23, 2024
97 checks passed
@mateusz-sekara mateusz-sekara deleted the lp-deletin-in-batches branch February 23, 2024 07:49
mateusz-sekara added a commit to smartcontractkit/ccip that referenced this pull request Mar 20, 2024
## Motivation

The goal of this PR is to reduce the number of logs and blocks we keep
in the database by utilizing LogPoller's built-in retention mechanism.
It requires paging for smooth deployment:
smartcontractkit/chainlink#12060

## Solution

This PR enables retention for all the LogPoller's filters registered by
CCIP. Additionally, to avoid pushing too much pressure during deletion
(especially the first run will have a lot of logs to remove) we've
updated `LogPrunePageSize` to 10k. Please see the original PR in the
chainlink repo to learn more about paging and its impact on the
database. `LogPrunePageSize` is altered in the `fallback.toml` to avoid
the necessity of setting this value for every chain that CCIP is
deployed on.
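A fallback.toml fragment along these lines would set the page size once for every chain; the exact table name and placement depend on the chain config layout, so treat this as a hypothetical sketch rather than the actual file contents:

```toml
# Sketch only: section name assumed. A LogPrunePageSize of 0 would keep
# the old single-delete behaviour; 10k bounds each delete batch.
[EVM]
LogPrunePageSize = 10000
```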
asoliman92 pushed a commit to smartcontractkit/ccip that referenced this pull request Jul 31, 2024