
Add Topic1, Topic2, Topic3, LogsPerBlock fields to logpoller.Filter and log_poller_filters schema #11949

Merged
merged 6 commits into develop on Feb 22, 2024

Conversation

@reductionista (Contributor) commented Feb 7, 2024

Also in this PR:

  • Converted all places where a logpoller.Filter object is constructed to use field names instead of positional arguments (see the sketch after this list). This should make it much easier to add more fields in the future, and allows default values for uninteresting fields to be filled in automatically during tests.
  • Handle NULL HashArray and AddressArray reads from the db by setting dest to nil instead of erroring out.
  • Bug fix to SQL QUERY logging (handles NULL and empty array values for BYTEA[] now, instead of panicking).
  • Minor improvements to SQL QUERY logging, so it's closer to an actual SQL query you can run via cut and paste.
  • Add a test for writing filters to the db and reading them back, with various combinations of topic filters set.
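For illustration, here is a minimal sketch of the named-field construction style referenced in the first bullet. The Filter struct below is a stand-in, not the real logpoller.Filter (which uses evmtypes.AddressArray/HashArray wrappers); the field set is an assumption based on the fields discussed in this PR.

```go
package main

import (
	"fmt"
	"time"

	"github.com/ethereum/go-ethereum/common"
)

// Filter is an illustrative stand-in for logpoller.Filter, trimmed to the
// fields discussed in this PR.
type Filter struct {
	Name         string
	Addresses    []common.Address
	EventSigs    []common.Hash
	Topic2       []common.Hash // optional values for topic index 2
	Topic3       []common.Hash // optional values for topic index 3
	Topic4       []common.Hash // optional values for topic index 4
	Retention    time.Duration // how long to retain matching logs
	MaxLogsKept  uint64        // 0 = no limit
	LogsPerBlock uint64        // 0 = no limit
}

func main() {
	// With named fields, anything we don't care about (Topic3, Topic4,
	// LogsPerBlock, ...) is simply omitted and takes its zero value, so
	// adding new fields later doesn't break existing construction sites.
	f := Filter{
		Name:      "example-log-trigger-filter",
		Addresses: []common.Address{common.HexToAddress("0x1111111111111111111111111111111111111111")},
		EventSigs: []common.Hash{common.HexToHash("0x2222222222222222222222222222222222222222222222222222222222222222")},
		Topic2:    []common.Hash{common.HexToHash("0x3333333333333333333333333333333333333333333333333333333333333333")},
		Retention: 24 * time.Hour,
	}
	fmt.Printf("filter: %+v\n", f)
}
```

Positional construction would force every call site to change whenever a field is added; named fields avoid that, which is the point of the first bullet.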

github-actions bot commented Feb 7, 2024

I see that you haven't updated any README files. Would it make sense to do so?

@mateusz-sekara (Collaborator) left a comment

Correct me if I'm wrong, but I don't see any usage of these filters during the PollAndSave cycle. IIUC the goal of this PR is to support filters with an extended definition, but the code responsible for using them during PollAndSave will probably land in a separate PR, right?

@@ -0,0 +1,24 @@
-- +goose Up

ALTER TABLE evm.log_poller_filters
Collaborator:

Does Goose run all these DDLs in a single TX? If not, there is an edge case in which you drop the constraint, a malicious TX violating that constraint gets in, and then you bring back the new constraint. Not sure what the probability of this happening is, but something like this would probably be safer:

ALTER TABLE evm.log_poller_filters
    ADD COLUMN topic2 BYTEA CHECK (octet_length(topic2) = 32),
    ADD COLUMN topic3 BYTEA CHECK (octet_length(topic3) = 32),
    ADD COLUMN topic4 BYTEA CHECK (octet_length(topic4) = 32),
    ADD COLUMN max_logs_kept NUMERIC(78,0),
    ADD COLUMN logs_per_block NUMERIC(78,0);

CREATE UNIQUE INDEX evm_log_poller_filters_name_chain_address_event_topics_key ON evm.log_poller_filters (hash_record_extended((name, evm_chain_id, address, event, topic2, topic3, topic4), 0));

ALTER TABLE evm.log_poller_filters
    DROP CONSTRAINT evm_log_poller_filters_name_evm_chain_id_address_event_key;

@reductionista (author) replied:

I don't think the db migrations run concurrently with any other txs; they should just run in isolation before the node starts up. But sure, I can change the order; it seems like good practice.

SELECT * FROM
(SELECT :name, :evm_chain_id ::::NUMERIC, :retention ::::BIGINT, NOW()) x,
(SELECT :name, :evm_chain_id ::::NUMERIC, :retention ::::BIGINT, :max_logs_kept ::::NUMERIC, :logs_per_block ::::NUMERIC, NOW()) x,
Collaborator:

I totally understand the max_logs_kept argument. It might be useful for some CCIP filters for which we only care about the latest logs, so it's better to rely on the number of logs instead of duration-based retention. However, I wonder what the use case is.

@reductionista (author) replied:

All of the changes in this PR are aimed at addressing security concerns with Automation's new LogTriggers feature. LogTriggers allow external users to track any logs on any smart contract, so without this it would open the door to a DoS attack where someone tells us to save every log from an ultra-high-volume smart contract, filling up the disk and halting operation.

The LogsPerBlock limit is less important, but it will be there as a safeguard against overloading the network connection with a high burst of logs all in one block. (They verified, by writing a test contract, that there is no limit on the number of logs you can emit at once; they showed me an example where they emitted 16,000 logs in one tx.)
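Purely as an illustration of the kind of guard LogsPerBlock could enable (this is not the code from the follow-up PR; the Log type and the zero-means-unlimited convention are assumptions made for the sketch):

```go
package main

import "fmt"

// Log is a minimal stand-in for a polled log; only the block number matters here.
type Log struct {
	BlockNumber uint64
	Data        []byte
}

// capLogsPerBlock drops any logs beyond the per-block budget for one filter.
// A zero limit is treated as "unlimited" in this sketch.
func capLogsPerBlock(logs []Log, logsPerBlock uint64) []Log {
	if logsPerBlock == 0 {
		return logs
	}
	perBlock := make(map[uint64]uint64)
	kept := make([]Log, 0, len(logs))
	for _, l := range logs {
		if perBlock[l.BlockNumber] >= logsPerBlock {
			continue // over budget for this block: drop the log
		}
		perBlock[l.BlockNumber]++
		kept = append(kept, l)
	}
	return kept
}

func main() {
	logs := []Log{{BlockNumber: 100}, {BlockNumber: 100}, {BlockNumber: 100}, {BlockNumber: 101}}
	fmt.Println(len(capLogsPerBlock(logs, 2))) // prints 3: two kept from block 100, one from block 101
}
```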

ADD COLUMN topic2 BYTEA CHECK (octet_length(topic2) = 32),
ADD COLUMN topic3 BYTEA CHECK (octet_length(topic3) = 32),
ADD COLUMN topic4 BYTEA CHECK (octet_length(topic4) = 32),
ADD COLUMN max_logs_kept NUMERIC(78,0),
@mateusz-sekara (Collaborator) commented Feb 21, 2024:

10^77 logs? Does that make sense? Isn't a regular bigint good enough? I can't imagine a case in which someone defines that they want to keep 10^18 logs or something :D

Topic3 evmtypes.HashArray // list of possible values for topic3
Topic4 evmtypes.HashArray // list of possible values for topic4
Retention time.Duration // maximum amount of time to retain logs
MaxLogsKept *ubig.Big // maximum number of logs to retain
@mateusz-sekara (Collaborator) commented Feb 21, 2024:

I believe this is going to be used during PruneOldLogs, right? Counting during delete might be heavy; have you thought about the delete query that would respect that value?

@reductionista (author) replied:

I haven't thought about that yet, but I suspect we can find a way to make it efficient. If pruning all filters at once is a problem, one thing we could do is prune one filter at a time... so that only the rows for a single filter are locked while we count & prune them.
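One possible shape for such a per-filter delete is sketched below as a query string, keeping only the newest max_logs_kept rows for a single (chain, address, event) combination. This is an illustration only, not the query from any follow-up PR, and the column names on evm.logs (block_number, log_index, address, event_sig) are assumptions about the schema.

```go
package logprune

// pruneOneFilterQuery deletes everything beyond the newest $4 rows for one
// (evm_chain_id, address, event_sig) combination. Running it once per filter
// keeps the set of rows examined (and locked) per statement small.
const pruneOneFilterQuery = `
DELETE FROM evm.logs
WHERE evm_chain_id = $1 AND address = $2 AND event_sig = $3
  AND (block_number, log_index) IN (
      SELECT block_number, log_index FROM (
          SELECT block_number, log_index,
                 ROW_NUMBER() OVER (ORDER BY block_number DESC, log_index DESC) AS rn
          FROM evm.logs
          WHERE evm_chain_id = $1 AND address = $2 AND event_sig = $3
      ) ranked
      WHERE rn > $4  -- $4 = max_logs_kept for this filter
  )`
```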

@reductionista (author) replied:

> Correct me if I'm wrong, but I don't see any usage of these filters during the PollAndSave cycle. IIUC the goal of this PR is to support filters with an extended definition, but the code responsible for using them during PollAndSave will probably land in a separate PR, right?

That's right. There are two follow-up PRs to make use of these extra fields. One of them changes the way in which logs are requested (instead of a single request with a big combined filter, it's a batch of requests sent all at once, normally one batch element per filter but with some optimizations for combining requests). That one is in progress, and there will be another PR after that for implementing the rate limiting & MaxLogsKept feature. Since both of these PRs require extending the logpoller.Filter object and migrating the db, I figured it made sense to combine that part as a common precursor to both changes.

mateusz-sekara previously approved these changes Feb 22, 2024
And update evm.log_poller_filters schema to match
[]BYTEA was panicking for nil or empty arrays, which then got trapped and ignored... making it very difficult to figure out which query the error was coming from.

While fixing that bug, updated formatting of []BYTEA and TEXT to be closer to an actual SQL query you can run by cutting and pasting from the log.
- Add new index before removing old index to be safer
- Change MaxLogsKept & LogsPerBlock from big.Int to uint64
@mateusz-sekara (Collaborator) left a comment

lgtm!

@reductionista added this pull request to the merge queue Feb 22, 2024
Merged via the queue into develop with commit 63c286d Feb 22, 2024
97 checks passed
@reductionista deleted the BCF-2890-logpoller-filter-extension branch February 22, 2024 19:11