[RFC][indexer-alt] Add pruning strategy #20799

Open
wants to merge 1 commit into
base: main

lxfind
Contributor

@lxfind lxfind commented Jan 7, 2025

Description

This is still a draft: an exploration of an alternative approach to pruning. Let me know how you feel about it:

  1. Adds a concept of pruning strategy, and every Handler must implement a function that returns it.
  2. There are two strategies at the moment: SimpleRange which prunes based on a key column for a range; and PerObjectPruning which prunes per object for a given checkpoint range.
  3. The strategy explicitly defines whether it uses processed checkpoint data, which feeds into the startup watermark.
  4. A side benefit is that it will then allow us to easily support simple pruning for all pipelines without implementing it for each one.

The only thing I don't quite like is that we have to pass table and column names to the strategy.
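
For readers skimming the thread, here is a minimal sketch of the shape described above. It is not the code in this PR: the trait methods `requires_processed_data` and `prune_sql`, the `key_column_name` field, the `cp_sequence_number` column, and the `CpSequenceNumbers` handler are all assumptions made for illustration, and the real trait most likely runs queries asynchronously against a connection rather than returning SQL text.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, simplified version of the strategy trait.
trait PruningStrategyTrait {
    /// Whether the pruner needs data written by the processor
    /// (which would then feed into the startup watermark).
    fn requires_processed_data(&self) -> bool;

    /// SQL that prunes checkpoints in `[range.start, range.end)`,
    /// or `None` if there is nothing to delete.
    fn prune_sql(&self, range: Range<u64>) -> Option<String>;
}

/// Prunes a table by a single key column over a checkpoint range.
struct SimpleRangePruning {
    table_name: String,
    key_column_name: String, // assumed field name
}

impl PruningStrategyTrait for SimpleRangePruning {
    fn requires_processed_data(&self) -> bool {
        false
    }

    fn prune_sql(&self, range: Range<u64>) -> Option<String> {
        if range.is_empty() {
            return None;
        }
        Some(format!(
            "DELETE FROM {t} WHERE {c} >= {from} AND {c} < {to}",
            t = self.table_name,
            c = self.key_column_name,
            from = range.start,
            to = range.end,
        ))
    }
}

/// Hypothetical slice of the handler trait: every handler returns a strategy.
trait Handler {
    const NAME: &'static str;
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait>;
}

struct CpSequenceNumbers;

impl Handler for CpSequenceNumbers {
    const NAME: &'static str = "cp_sequence_numbers";

    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(SimpleRangePruning {
            table_name: "cp_sequence_numbers".to_string(),
            key_column_name: "cp_sequence_number".to_string(), // assumed column
        })
    }
}
```

The table and column names appear as plain strings in the handler, which is the part of the design flagged above as unappealing.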

Test plan

How did you test the new or updated feature?


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • gRPC:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:

vercel bot commented Jan 7, 2025

The latest updates on your projects.

Name                  Status       Updated (UTC)
sui-docs              ✅ Ready     Jan 7, 2025 4:49am

3 Skipped Deployments

Name                  Status       Updated (UTC)
multisig-toolkit      ⬜️ Ignored   Jan 7, 2025 4:49am
sui-kiosk             ⬜️ Ignored   Jan 7, 2025 4:49am
sui-typescript-docs   ⬜️ Ignored   Jan 7, 2025 4:49am

@lxfind lxfind marked this pull request as ready for review January 7, 2025 04:48
@lxfind lxfind temporarily deployed to sui-typescript-aws-kms-test-env January 7, 2025 04:48 — with GitHub Actions Inactive
@lxfind lxfind requested review from amnn and wlmyng January 7, 2025 04:54
Contributor

@wlmyng wlmyng left a comment

What seems to be the main pain point with the current implementation is that an indexer provider would need to remember to have the processor write to some shared info that the pruner would know how to read from. But the benefit of this is that the framework doesn't force the indexer operator to conform to specific patterns and interfaces.

It feels a bit redundant today, since most of our pipelines do follow the simple pruning strategy, but then we needed to change things up with objinfo and coinbalancebuckets. Perhaps other indexer providers might conceive of yet more complex tables with advanced pruning strategies. This would be harder to work with if the framework is responsible for housing pruning strategies.


fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
    Arc::new(SimpleRangePruning {
        table_name: "cp_sequence_numbers".to_string(),
Contributor
This should be the same as the NAME, right? Or do we not want to bundle these things together? If we did, we'd have just one thing to pass to the strategy.

Contributor Author
Today, there is nothing that forces the name of the DB table to be identical to the name of the pipeline.

Comment on lines +41 to +52
    })
    .collect::<Vec<_>>()
    .join(",");
let query = format!(
    "
    WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
        VALUES {values}
    )
    DELETE FROM {table_name}
    USING to_prune_data
    WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
      AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive
Contributor
Feels like this is yet another argument for moving to sqlx in lieu of diesel.
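
To make the string-building concern concrete, here is a self-contained sketch of how the query in the hunk above might be assembled. The hunk does not show how `values` is produced, so the input type (a map from object ID bytes to an exclusive checkpoint bound), the hex `BYTEA` literal, the `BIGINT` cast, and the helper name `per_object_prune_sql` are all assumptions, not the PR's code.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: builds the per-object DELETE statement as text.
/// Returns `None` for an empty input, since `VALUES` with no rows is invalid SQL.
fn per_object_prune_sql(
    table_name: &str,
    object_id_column_name: &str,
    cp_sequence_number_column_name: &str,
    to_prune: &BTreeMap<Vec<u8>, u64>,
) -> Option<String> {
    if to_prune.is_empty() {
        return None;
    }

    // One `(object_id, cp_sequence_number_exclusive)` row per object.
    let values = to_prune
        .iter()
        .map(|(object_id, cp_exclusive)| {
            // Render the ID as a Postgres hex bytea literal (assumed encoding).
            let hex: String = object_id.iter().map(|b| format!("{b:02x}")).collect();
            format!("('\\x{hex}'::BYTEA, {cp_exclusive}::BIGINT)")
        })
        .collect::<Vec<_>>()
        .join(",");

    Some(format!(
        "
        WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
            VALUES {values}
        )
        DELETE FROM {table_name}
        USING to_prune_data
        WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
          AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive
        "
    ))
}
```

Because the table and column names are interpolated as raw text, this is only safe when they come from trusted, compile-time constants; identifiers cannot be passed as bind parameters in either library, so switching would mainly change how the row values are supplied.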

@lxfind
Contributor Author

lxfind commented Jan 7, 2025

Perhaps other indexer providers might conceive of yet more complex tables with advanced pruning strategies. This would be harder to work with if the framework is responsible for housing pruning strategies.

The pruning strategy is just a trait. So if a custom indexer needs a more complex pruning strategy, they could implement their own and use it.
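
As a rough illustration of that point, a custom indexer could define its own strategy next to its tables. Everything in this sketch is hypothetical: the trait is a simplified stand-in for the one in this PR, and `FilteredRangePruning` with its `extra_predicate` field is an invented example of a strategy the framework does not ship.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for the framework's strategy trait.
trait PruningStrategyTrait {
    fn requires_processed_data(&self) -> bool;
    /// SQL that prunes `[range.start, range.end)`, or `None` if nothing to do.
    fn prune_sql(&self, range: Range<u64>) -> Option<String>;
}

/// Hypothetical custom strategy defined by an indexer operator: range pruning
/// that only deletes rows matching an extra, operator-supplied predicate.
struct FilteredRangePruning {
    table_name: String,
    key_column_name: String,
    extra_predicate: String,
}

impl PruningStrategyTrait for FilteredRangePruning {
    fn requires_processed_data(&self) -> bool {
        false
    }

    fn prune_sql(&self, range: Range<u64>) -> Option<String> {
        if range.is_empty() {
            return None;
        }
        Some(format!(
            "DELETE FROM {t} WHERE {c} >= {from} AND {c} < {to} AND ({p})",
            t = self.table_name,
            c = self.key_column_name,
            from = range.start,
            to = range.end,
            p = self.extra_predicate,
        ))
    }
}
```

The framework would only ever see a `dyn PruningStrategyTrait`, so it would not need to know that this type exists.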

@lxfind
Contributor Author

lxfind commented Jan 7, 2025

Note that we could also trivially introduce a NoPruning strategy (which is basically the default when a pipeline is added without implementing the prune() function). I think it may be less error-prone when there is no default provided.
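
A sketch of what that could look like, again with a simplified, hypothetical trait rather than the real one: `NoPruning` is an explicit no-op strategy, and the handler method has no default body, so a pipeline that should not be pruned has to say so. `UnprunedPipeline` is an invented handler name.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, simplified stand-in for the strategy trait.
trait PruningStrategyTrait {
    fn requires_processed_data(&self) -> bool;
    /// SQL that prunes `[range.start, range.end)`, or `None` if nothing to do.
    fn prune_sql(&self, range: Range<u64>) -> Option<String>;
}

/// Explicit "do nothing" strategy.
struct NoPruning;

impl PruningStrategyTrait for NoPruning {
    fn requires_processed_data(&self) -> bool {
        false
    }

    fn prune_sql(&self, _range: Range<u64>) -> Option<String> {
        None // never deletes anything
    }
}

/// Hypothetical slice of the handler trait. The method has no default body,
/// so forgetting to choose a strategy is a compile error, not silent retention.
trait Handler {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait>;
}

struct UnprunedPipeline;

impl Handler for UnprunedPipeline {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(NoPruning)
    }
}
```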
