[Feature] Effect functions #810

Open
typedarray opened this issue Apr 17, 2024 · 11 comments

Comments

@typedarray
Collaborator

We should support an alternative kind of function that uses a similar registration / context API, but supports side effects.

@vimtor

vimtor commented Oct 31, 2024

Why can't you just call the effect on the indexer function? Would there be problems with block reorgs?

@typedarray
Collaborator Author

typedarray commented Oct 31, 2024

Why can't you just call the effect on the indexer function? Would there be problems with block reorgs?

A few reasons:

  1. Performance - Side effect workloads often take a long time (e.g. CPU-bound image processing, waiting for HTTP responses from other APIs, analytical SQL queries). If these effects run in the indexing "hot loop", it can make reindexing take forever, and for high-throughput use cases it prevents the app from keeping up with chain tip. Also, side effect workloads can often run in parallel/asynchronously (and indexing often can't).
  2. Errors - If an unrecoverable error occurs during indexing, the Ponder instance crashes. This makes sense because the only "approved" actions within an indexing function are 1) SQL operations and 2) RPC requests like readContract. If either of these fails despite many retries, something is very wrong and demands the developer's immediate attention (database or RPC provider is down). With side effects, errors are much more common and are often tolerable, because the functions usually call external services. So, side effect errors probably should not crash the instance.
  3. Reorgs - Ponder automatically handles reorgs for indexing, which is possible because we control the SQL layer and can ensure that every database operation is revertible. With side effects, there's no way for the framework to revert the action automatically, so we likely need to expose an optional "undo" function that runs on reorg.
  4. State - Today (without side effects) Ponder maintains no state between hot reloads / redeployments, because the database state can be deterministically reconstructed on demand. With side effects, you very likely want to keep track of which events have already been handled, and not re-run all side effects on each redeployment. So, we probably need new semantics for specifying when a specific side effect should run (full backfill, or only in realtime? only after finalization?). There are also some gnarly issues here for zero-downtime deployments.

With these requirements in mind, it becomes pretty clear that we need a job queue pattern for side effects (with some EVM/Ponder sauce to make it really intuitive).
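To make the four requirements concrete, here is a minimal sketch of what such a job queue could look like. Everything in it (`EffectJob`, `EffectQueue`, `drain`, `revert`) is a hypothetical name invented for illustration, not part of Ponder. It assumes an in-memory queue, so the "already handled" record would not survive a redeployment; a real version would need to persist it.

```typescript
// Hypothetical sketch: none of these names exist in Ponder.
type EffectJob = {
  id: string; // e.g. "<blockNumber>-<logIndex>"
  run: () => Promise<void>; // the side effect itself
  undo?: () => Promise<void>; // optional compensation, called on reorg
};

type JobOutcome = { id: string; status: "done" | "failed"; attempts: number };

class EffectQueue {
  private pending: EffectJob[] = [];
  private completed = new Map<string, EffectJob>();
  private readonly maxAttempts: number;

  constructor(maxAttempts: number) {
    this.maxAttempts = maxAttempts;
  }

  enqueue(job: EffectJob): void {
    // State: skip jobs that already ran (in memory only in this sketch).
    if (this.completed.has(job.id)) return;
    this.pending.push(job);
  }

  // Performance + errors: runs outside the indexing loop, and a job that
  // keeps failing is reported in the result instead of being thrown.
  async drain(): Promise<JobOutcome[]> {
    const outcomes: JobOutcome[] = [];
    while (this.pending.length > 0) {
      const job = this.pending.shift()!;
      let attempts = 0;
      let status: JobOutcome["status"] = "failed";
      while (attempts < this.maxAttempts) {
        attempts++;
        try {
          await job.run();
          status = "done";
          break;
        } catch {
          // Tolerated failure: retry until maxAttempts is reached.
        }
      }
      if (status === "done") this.completed.set(job.id, job);
      outcomes.push({ id: job.id, status, attempts });
    }
    return outcomes;
  }

  // Reorgs: drop pending jobs for the given ids and undo completed ones.
  // Returns the ids whose undo function was actually called.
  async revert(ids: string[]): Promise<string[]> {
    const idSet = new Set(ids);
    this.pending = this.pending.filter((job) => !idSet.has(job.id));
    const undone: string[] = [];
    for (const id of ids) {
      const job = this.completed.get(id);
      if (!job) continue;
      this.completed.delete(id);
      if (job.undo) {
        await job.undo();
        undone.push(id);
      }
    }
    return undone;
  }
}
```

The sketch runs jobs one at a time for simplicity; parallel execution and the backfill/realtime/finalization semantics are left open, as in the discussion above.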

@tk-o
Contributor

tk-o commented Dec 2, 2024

@typedarray, I'm keen to support the Ponder team in getting the Effects feature built 🙃

I appreciate all the thinking going into making things work correctly. The nature of blockchain networks requires a lot of intellectual gymnastics to get the right outcomes.

Running effects when the current state can soon be invalidated is a challenge. A challenge of applying probabilities.

Use case

Just the other day, I proposed this change:

In my use case, I need my web backend service to calculate rewards for users depending on their on-chain activity and staked token amount.

Context

Ponder indexing

I understand that indexing functions are currently triggered for unfinalized blocks. This means that some of the invocations will be discarded if a blockchain re-org is detected.

A re-org means that the hash of the last locally processed block is not equal to the parent hash of a just-fetched remote block.
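As a rough illustration of that definition, the check could be sketched like this. The `BlockHeader` shape and the function name are assumptions made for the example, not Ponder internals, and the comparison only makes sense when the fetched block directly follows the processed one.

```typescript
// Hypothetical types and names, for illustration only.
type BlockHeader = { number: number; hash: string; parentHash: string };

type Continuity = "extends" | "reorg" | "not-adjacent";

// Compares a just-fetched block against the last locally processed block.
function classifyIncoming(
  lastProcessed: BlockHeader,
  incoming: BlockHeader,
): Continuity {
  // The parent-hash comparison only applies to the direct successor.
  if (incoming.number !== lastProcessed.number + 1) return "not-adjacent";
  return incoming.parentHash === lastProcessed.hash ? "extends" : "reorg";
}
```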

Re-orgs defined this way are driven by an RPC that cannot keep up with the fastest RPCs on the network and, when asked for blocks, returns stale data.

RPC quality vs re-orgs count

For many months, I've been maintaining four Ponder indexers across multiple mainnets (Ethereum, Arbitrum). Each of them prints the following metric about reorgs:

# HELP ponder_indexing_function_error_total Total number of errors encountered during indexing function execution
# TYPE ponder_indexing_function_error_total counter

# HELP ponder_historical_start_timestamp Unix timestamp (ms) when the historical sync service started
# TYPE ponder_historical_start_timestamp gauge

I read this metrics output as saying that no reorgs have ever happened. Is that correct?

I was also running indexing on a couple of EVM testnets, and one of them produced re-orgs for almost every block fetched. I switched to another RPC and the re-orgs were gone.

Re-orgs

vs indexed data quality

From my observations, reorgs happen more often on low-quality RPCs, and might not happen at all for a very long time when using quality RPCs. It's a challenge while indexing the data, as the data source might keep changing.

Ponder does not support effects because the quality of the data source (the latest fetched block is always ahead of the latest processed block) cannot be guaranteed. However, the Ponder data API (GQL/REST API) allows fetching data about the current state of the indexed blockchain view, which might also be incorrect at a given time. There is only a very small probability that a client would ask the Ponder data API for data that later changes due to a re-org.

vs effects

The effects are only useful if there are clients subscribing to them. So either the client or the indexer needs to work around the possibility of an effect being invalidated. In short, someone needs to keep track of the confirmed effects.

We cannot say definitively that the next fetched block will not trigger a re-org. But we can describe how likely that is, for example by counting how many re-orgs have actually happened across all realtime-synced blocks so far.

For quality data sources (fast RPCs), the re-org likelihood should simply be zero. Sometimes it's going to be slightly above zero. And sometimes it's going to be much greater.

Using this predictive metric, we could delay fetching blocks to give the RPC a chance to catch up. We could also delay triggering the effect function for a certain block range, depending on that metric.
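One way to picture such a metric is a rolling window over recently synced blocks. The sketch below is only an assumption about how it could be computed: `ReorgRateTracker` and `blocksToDelay` are made-up names, and the linear mapping from rate to delay is an arbitrary policy chosen for the example.

```typescript
// Hypothetical sketch: tracks the share of recent blocks that caused a re-org.
class ReorgRateTracker {
  private outcomes: boolean[] = [];
  private readonly windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  // Record whether the newly synced block triggered a re-org.
  record(wasReorg: boolean): void {
    this.outcomes.push(wasReorg);
    // Keep only the most recent `windowSize` observations.
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  // Fraction of blocks in the window that triggered a re-org (0 when empty).
  rate(): number {
    if (this.outcomes.length === 0) return 0;
    const reorgs = this.outcomes.filter((wasReorg) => wasReorg).length;
    return reorgs / this.outcomes.length;
  }
}

// Assumed policy: scale the delay linearly with the observed rate,
// from 0 blocks (no re-orgs seen) up to maxDelayBlocks (every block re-orged).
function blocksToDelay(rate: number, maxDelayBlocks: number): number {
  const clamped = Math.min(1, Math.max(0, rate));
  return Math.ceil(clamped * maxDelayBlocks);
}
```

With a fast RPC the window stays all-false and effects fire immediately; with a flaky one the delay grows toward the configured maximum.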

@tk-o
Contributor

tk-o commented Dec 2, 2024

On the other hand, we could also make Ponder fetch blocks only by block tag (safe or finalized):

From what I see, the Ponder syncing engine allows fetching a block either by number or by the latest tag (which gives low confidence that the block will become finalized).

For example, we could keep track of the last safe/finalized block number, and only trigger effects for earlier blocks.
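A small sketch of that idea, using the standard `eth_getBlockByNumber` JSON-RPC method, which accepts the `safe` and `finalized` block tags. The helper names are invented for the example, nothing here reflects how Ponder's sync engine is written, and treating the finalized block itself as eligible is an assumption.

```typescript
type BlockTag = "latest" | "safe" | "finalized";

type BlockByTagRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "eth_getBlockByNumber";
  params: [BlockTag, boolean];
};

// Builds the JSON-RPC payload for fetching a block by tag.
// The second param `false` asks for transaction hashes only.
function blockByTagRequest(tag: BlockTag, id = 1): BlockByTagRequest {
  return { jsonrpc: "2.0", id, method: "eth_getBlockByNumber", params: [tag, false] };
}

// JSON-RPC returns quantities such as the block number as hex strings.
function parseBlockNumber(hex: string): number {
  return Number.parseInt(hex, 16);
}

// Assumption: an effect may run once its block is at or below the
// last known safe/finalized block number.
function canRunEffect(eventBlock: number, finalizedBlock: number): boolean {
  return eventBlock <= finalizedBlock;
}
```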

Any thoughts on that, @typedarray?

@tk-o
Contributor

tk-o commented Dec 2, 2024

Addressing the performance issue: perhaps running side-effects from inside a worker thread would be a good choice.

It's not going to block the indexing, and it will also allow queuing the side-effect callbacks within the same thread.

Have a look at: https://piscinajs.github.io/piscina/

@tk-o
Contributor

tk-o commented Dec 2, 2024

I created a module handling the side effects with re-orgs in mind. Here's the demo:
https://stackblitz.com/edit/evm-event-side-effects?file=lib%2Fevm-side-effects%2Fqueue.spec.ts

The module requires two block data providers: one fetching the latest block, and one fetching the finalized block.

Ponder indexing functions run each time the latest block arrives. At that point we don't yet know whether the block is final, but we can register effects for the given event (the event id being: block number, block hash, log index). The effect callback functions might or might not be called.

If a re-org occurs, the effects queue is rebuilt without the effects that were registered for the just-discarded blocks (block number, block hash).

Finally, the second block data provider kicks in: the safe/finalized block provider. It calls the effects queue and processes all remaining effects for the blocks before the current safe/finalized block.

Demo: https://stackblitz.com/edit/evm-event-side-effects?file=lib%2Fevm-side-effects%2Fqueue.spec.ts
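For readers who do not want to open the demo, the flow described above can be approximated in a few lines. This is not the code from the linked module; it is a simplified sketch with made-up names, synchronous callbacks, and the assumption that effects for the finalized block itself are also processed.

```typescript
// Hypothetical sketch of a re-org aware effects queue.
type PendingEffect = {
  blockNumber: number;
  blockHash: string;
  logIndex: number;
  run: () => void; // synchronous here to keep the sketch short
};

class ReorgAwareEffectQueue {
  private pending: PendingEffect[] = [];

  // Called from an indexing function when a "latest" block is processed.
  register(effect: PendingEffect): void {
    this.pending.push(effect);
  }

  // Called when a re-org is detected: rebuild the queue without the
  // effects registered for the discarded (number, hash) pairs.
  discardBlocks(discarded: { number: number; hash: string }[]): void {
    const gone = new Set(discarded.map((b) => `${b.number}:${b.hash}`));
    this.pending = this.pending.filter(
      (e) => !gone.has(`${e.blockNumber}:${e.blockHash}`),
    );
  }

  // Called by the safe/finalized provider: run remaining effects up to the
  // finalized block, ordered by block number and then log index.
  onFinalized(finalizedNumber: number): number {
    const ready = this.pending
      .filter((e) => e.blockNumber <= finalizedNumber)
      .sort((a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex);
    this.pending = this.pending.filter((e) => e.blockNumber > finalizedNumber);
    for (const effect of ready) effect.run();
    return ready.length;
  }

  size(): number {
    return this.pending.length;
  }
}
```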

@typedarray
Collaborator Author

typedarray commented Dec 2, 2024

I created a module handling the side effects with re-orgs in mind. Here's the demo:

Cool design, I can see this working for some side effect use cases.

But what if the user suspects that reorgs are very rare, and latency is critical? They would rather run the side effect as soon as possible, tolerating the ~1 in a million case where the event ends up getting reorg'd out. Or maybe they'd like to register an "undo" function which runs in that 1 in a million scenario.

What if the effect functions must run in order, similar to indexing functions? Should we allow this as an option, or not offer any ordering guarantees?

What should the coding experience look like, particularly when running the dev server? Effect functions will probably duplicate a lot of logic/work from indexing functions. Can you kick off a side effect job from within an indexing function? Or only register them at the top level as "siblings" to indexing functions, e.g. ponder.effect("Pool:Swap", ...)?

These are the types of design questions that we need to answer. Honestly, the implementation will probably be easy once we have clear answers to these (and more).

@tk-o
Contributor

tk-o commented Dec 2, 2024

Ok, so design wise, I'd go with:

  1. Opt-in to commit/rollback style for triggering effects and having them undone when a re-org occurs. Not everyone will need the undo step.
  2. Opt-in to using confidence levels when executing effect functions — the confidence would be related to a rolling average of reorgs happening over the past N blocks. Not everyone will need it, but some might want to keep some alerting on.
  3. The effects should be baked into the indexing functions, and effectively serve as a post-indexing hook during the realtime indexing phase. For example, an indexing function could return an effect object instance. That instance would know about the context of the specific event and its block. Also, the return is optional, so if void is returned, nothing happens.
  4. The effect functions should run in order by default, so all the client mutations can have a chance to occur in an ordered sequence. I don't see a case where the client doesn't care about the order of events. Or, if they don't, they can simply pull a batch of raw event data periodically from Ponder's GQL interface.
  5. The execution of the effect functions should happen in a worker thread, so we can keep indexing independently of executing effects. There is no need whatsoever to keep that execution on the main thread.
  6. Ponder should allow picking the confidence levels for effects to get executed. For example, some might want only the finalized blocks to trigger effects, while others are ok with effects being triggered for the latest blocks (my use cases would work like that). This should address adjusting latency according to client's needs.
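To show how points 3, 4 and 6 could fit together, here is a rough sketch. All types and functions below (`EffectSpec`, `IndexingHandler`, `runPostIndexingHook`) are hypothetical and do not exist in Ponder. The sketch models "void is returned" as `undefined`, runs effects synchronously rather than in a worker thread, and preserves order by deferring everything after the first effect that is not yet eligible.

```typescript
// Hypothetical sketch of effects returned from indexing functions.
type Confidence = "latest" | "safe" | "finalized";

type EffectSpec = {
  confidence: Confidence; // point 6: which chain head must cover the event
  run: () => void;
  undo?: () => void; // point 1: opt-in rollback
};

type ChainHeads = { latest: number; safe: number; finalized: number };

// Point 3: an indexing function may return an effect, or nothing.
type IndexingHandler<E> = (event: E) => EffectSpec | undefined;

function runPostIndexingHook<E extends { blockNumber: number }>(
  events: E[],
  handler: IndexingHandler<E>,
  heads: ChainHeads,
): { ran: number; deferred: EffectSpec[] } {
  let ran = 0;
  const deferred: EffectSpec[] = [];
  for (const event of events) {
    const spec = handler(event);
    if (spec === undefined) continue; // no effect requested for this event
    const eligible = event.blockNumber <= heads[spec.confidence];
    // Point 4: once one effect has to wait, later ones wait too,
    // so effects never run out of order.
    if (eligible && deferred.length === 0) {
      spec.run();
      ran++;
    } else {
      deferred.push(spec);
    }
  }
  return { ran, deferred };
}
```

A caller would retry the `deferred` effects whenever the safe or finalized head advances.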

@tk-o
Contributor

tk-o commented Dec 26, 2024

Hey @typedarray, I believe the Ponder team has been working on this issue recently. Are there any updates on side-effects management?

@typedarray
Collaborator Author

@tk-o no near-term update. We're still committed to solving the core indexing use case before expanding to effects. At the moment, this means improving the client/query story, direct SQL, working with offchain data, and performance for huge apps.

@tk-o
Contributor

tk-o commented Jan 9, 2025

@tk-o no near-term update. We're still committed to solving the core indexing use case before expanding to effects. At the moment, this means improving the client/query story, direct SQL, working with offchain data, and performance for huge apps.

Fair enough, sir. I will definitely keep my own solution running, but I'll also think about changing it to take the questions you've listed into account and to answer them with working solutions.
