[RFC][indexer-alt] Add pruning strategy #20799

lxfind · 2025-01-07T04:47:58Z

Description

This is still a draft, exploring an alternative approach to pruning. Let me know how you feel about it:

Adds the concept of a pruning strategy; every Handler must implement a function that returns one.
There are two strategies at the moment: SimpleRange, which prunes a range based on a key column, and PerObjectPruning, which prunes per object for a given checkpoint range.
Each strategy explicitly declares whether it uses processed checkpoint data, which feeds into the startup watermark.
A side benefit is that it will let us easily support simple pruning for all pipelines without implementing it for each one.
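
To make the shape of the proposal concrete, here is a minimal sketch of how such a trait and its two strategies could look. Only `PruningStrategyTrait`, `SimpleRangePruning`, `table_name`, and `pruning_strategy()` appear in the diff; every other name (`PruneRange`, `uses_processed_data`, `describe`, the column fields, the `Handler` trait shape) is a hypothetical stand-in. Which strategy needs processed checkpoint data is also an assumption, and a real implementation would run queries against the DB rather than return strings.

```rust
use std::sync::Arc;

/// Half-open range `[from, to_exclusive)` to prune (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct PruneRange {
    pub from: u64,
    pub to_exclusive: u64,
}

/// Sketch of the strategy trait; method names are assumptions, not the PR's API.
pub trait PruningStrategyTrait: Send + Sync {
    /// Whether the pruner relies on data recorded while checkpoints were
    /// processed. In the proposal this feeds into the startup watermark.
    fn uses_processed_data(&self) -> bool;

    /// Describes what would be deleted for `range`.
    fn describe(&self, range: PruneRange) -> String;
}

/// Prunes a key-column range from one table.
pub struct SimpleRangePruning {
    pub table_name: String,
    pub key_column_name: String, // hypothetical field
}

impl PruningStrategyTrait for SimpleRangePruning {
    fn uses_processed_data(&self) -> bool {
        false // assumption: a plain range delete needs no extra data
    }

    fn describe(&self, range: PruneRange) -> String {
        format!(
            "DELETE FROM {t} WHERE {c} >= {lo} AND {c} < {hi}",
            t = self.table_name,
            c = self.key_column_name,
            lo = range.from,
            hi = range.to_exclusive,
        )
    }
}

/// Prunes superseded rows object by object within a checkpoint range.
pub struct PerObjectPruning {
    pub table_name: String,
}

impl PruningStrategyTrait for PerObjectPruning {
    fn uses_processed_data(&self) -> bool {
        true // assumption: needs to know which objects changed per checkpoint
    }

    fn describe(&self, range: PruneRange) -> String {
        format!(
            "prune {} per object for checkpoints [{}, {})",
            self.table_name, range.from, range.to_exclusive
        )
    }
}

/// Hypothetical handler trait: each handler states how its table is pruned.
pub trait Handler {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait>;
}

pub struct CpSequenceNumbers;

impl Handler for CpSequenceNumbers {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(SimpleRangePruning {
            table_name: "cp_sequence_numbers".to_string(),
            // The key column name is an assumption for illustration.
            key_column_name: "cp_sequence_number".to_string(),
        })
    }
}
```

With this shape the pruner only ever sees `Arc<dyn PruningStrategyTrait>`, so it can stay generic over pipelines.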

The only thing I don't quite like is that we have to pass table and column names to the strategy.

wlmyng

The main pain point with the current implementation seems to be that an indexer provider needs to remember to have the processor write to some shared info that the pruner knows how to read from. The benefit, though, is that the framework doesn't force the indexer operator to conform to specific patterns and interfaces.

It feels a bit redundant today, since most of our pipelines follow the simple pruning strategy, but we then needed to change things up with objinfo and coinbalancebuckets. Other indexer providers might conceive of yet more complex tables with advanced pruning strategies, which would be harder to work with if the framework is responsible for housing pruning strategies.

wlmyng · 2025-01-07T18:19:02Z

crates/sui-indexer-alt-framework/src/handlers/cp_sequence_numbers.rs

+
+    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
+        Arc::new(SimpleRangePruning {
+            table_name: "cp_sequence_numbers".to_string(),


This should be the same as the NAME, right? Or do we not want to bundle these things together? Because then we'd just have one thing to pass to the strategy.

Today, nothing forces the name of the DB table to be identical to the name of the pipeline.
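
Purely as an illustration of the trade-off in this thread, the sketch below lets the table name default to the pipeline name while still allowing an override. The `Pipeline` trait, `table_name()`, `RenamedPipeline`, and the `renamed_pipeline_v2` table are all hypothetical; nothing like this is in the diff.

```rust
/// Hypothetical pipeline trait; `NAME` identifies the pipeline.
pub trait Pipeline {
    const NAME: &'static str;

    /// Table the pipeline writes to. Defaults to `NAME`, but nothing forces
    /// the two to match, so implementors may override it.
    fn table_name() -> &'static str {
        Self::NAME
    }
}

pub struct CpSequenceNumbers;

impl Pipeline for CpSequenceNumbers {
    const NAME: &'static str = "cp_sequence_numbers";
    // Uses the default: table name equals pipeline name.
}

/// A made-up pipeline whose table was renamed independently of the pipeline.
pub struct RenamedPipeline;

impl Pipeline for RenamedPipeline {
    const NAME: &'static str = "renamed_pipeline";

    fn table_name() -> &'static str {
        "renamed_pipeline_v2"
    }
}
```

The common case then needs only one name, while the decoupling the reply describes is preserved.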

wlmyng · 2025-01-07T18:24:20Z

crates/sui-indexer-alt-framework/src/pipeline/concurrent/pruner/per_object_pruner.rs

+            })
+            .collect::<Vec<_>>()
+            .join(",");
+        let query = format!(
+            "
+            WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
+                VALUES {values}
+            )
+            DELETE FROM {table_name}
+            USING to_prune_data
+            WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
+              AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive


feel like this is yet another argument for moving to sqlx in lieu of diesel
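
For readers following the excerpt, here is a self-contained sketch of how a statement like the one above could be assembled. The function name, the `(Vec<u8>, u64)` input type, and the hex `BYTEA` encoding of object IDs are assumptions; the diff does not show how `values` is built.

```rust
/// Builds a per-object pruning statement from `(object_id, exclusive
/// checkpoint bound)` pairs. Returns `None` for empty input, since a
/// `VALUES` list with no rows is not valid SQL.
pub fn per_object_prune_query(
    table_name: &str,
    object_id_column_name: &str,
    cp_sequence_number_column_name: &str,
    to_prune: &[(Vec<u8>, u64)],
) -> Option<String> {
    if to_prune.is_empty() {
        return None;
    }

    // One `(object_id, bound)` row per object, joined with commas.
    let values = to_prune
        .iter()
        .map(|(object_id, cp)| {
            // Assumption: object IDs are stored as BYTEA, written as hex.
            let hex: String = object_id.iter().map(|b| format!("{b:02x}")).collect();
            format!("('\\x{hex}'::BYTEA, {cp}::BIGINT)")
        })
        .collect::<Vec<_>>()
        .join(",");

    // Table and column names are interpolated directly, so they must come
    // from code, never from untrusted input.
    Some(format!(
        "
        WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
            VALUES {values}
        )
        DELETE FROM {table_name}
        USING to_prune_data
        WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
          AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive
        "
    ))
}
```

Because the statement is a plain string, it bypasses the query builder's type checking, which is the friction the comment above points at.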

lxfind · 2025-01-07T18:46:49Z

Perhaps other indexer providers might conceive of yet more complex tables with advanced pruning strategies. This would be harder to work with if the framework is responsible for housing pruning strategies.

The pruning strategy is just a trait. So if a custom indexer needs a more complex pruning strategy, it can implement its own and use it.
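
As a sketch of that point, a custom indexer could define its own strategy outside the framework. The trait method `prune_statement`, the `PinnedAwarePruning` type, and the `my_table` and `pinned` names are all hypothetical and only illustrate the extension mechanism.

```rust
use std::sync::Arc;

/// Stand-in for the framework trait; the method name is an assumption.
pub trait PruningStrategyTrait: Send + Sync {
    /// Statement that prunes checkpoints in `[from, to_exclusive)`.
    fn prune_statement(&self, from: u64, to_exclusive: u64) -> String;
}

/// A made-up custom strategy: a range prune that skips rows flagged as pinned.
pub struct PinnedAwarePruning {
    pub table_name: String,
    pub cp_column_name: String,
    pub pinned_column_name: String,
}

impl PruningStrategyTrait for PinnedAwarePruning {
    fn prune_statement(&self, from: u64, to_exclusive: u64) -> String {
        format!(
            "DELETE FROM {t} WHERE {c} >= {from} AND {c} < {to_exclusive} AND NOT {p}",
            t = self.table_name,
            c = self.cp_column_name,
            p = self.pinned_column_name,
        )
    }
}

/// What a custom handler would return to the framework.
pub fn strategy_for_custom_pipeline() -> Arc<dyn PruningStrategyTrait> {
    Arc::new(PinnedAwarePruning {
        table_name: "my_table".to_string(),
        cp_column_name: "cp_sequence_number".to_string(),
        pinned_column_name: "pinned".to_string(),
    })
}
```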

lxfind · 2025-01-07T18:57:57Z

Note that we could also trivially introduce a NoPruning strategy (which is basically the default when a pipeline is added without implementing the prune() function). I think it may be less error-prone when there is no default provided.
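
A minimal sketch of that idea, assuming a trait method called `statements` and a `Handler` trait without a default body (both hypothetical): a pipeline that should never be pruned has to say so explicitly, and forgetting to choose becomes a compile error.

```rust
use std::sync::Arc;

/// Stand-in for the framework trait; the method name is an assumption.
pub trait PruningStrategyTrait: Send + Sync {
    /// Statements to run for `[from, to_exclusive)`; empty means nothing to prune.
    fn statements(&self, from: u64, to_exclusive: u64) -> Vec<String>;
}

/// Explicit opt-out: never deletes anything.
pub struct NoPruning;

impl PruningStrategyTrait for NoPruning {
    fn statements(&self, _from: u64, _to_exclusive: u64) -> Vec<String> {
        Vec::new()
    }
}

/// Hypothetical handler trait. There is deliberately no default body, so
/// every handler must pick a strategy, even if that strategy is `NoPruning`.
pub trait Handler {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait>;
}

/// A made-up pipeline that is meant to keep all of its rows.
pub struct AppendOnlyLog;

impl Handler for AppendOnlyLog {
    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(NoPruning)
    }
}
```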

[indexer-alt] Add pruning strategy

9db49d9

lxfind marked this pull request as ready for review January 7, 2025 04:48

lxfind requested review from amnn and wlmyng January 7, 2025 04:54
