[RFC][indexer-alt] Add pruning strategy #20799
The main pain point with the current implementation seems to be that an indexer provider has to remember to have the processor write to some shared info that the pruner knows how to read from. The benefit is that the framework doesn't force the indexer operator to conform to specific patterns and interfaces.
It feels a bit redundant today, since most of our pipelines follow the simple pruning strategy, but then we needed to change things up with objinfo and coinbalancebuckets. Other indexer providers might conceive of yet more complex tables with advanced pruning strategies, which would be harder to support if the framework is responsible for housing pruning strategies.
fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
    Arc::new(SimpleRangePruning {
        table_name: "cp_sequence_numbers".to_string(),
This should be the same as the NAME, right? Or do we not want to bundle these things together? Because then we'd just have one thing to pass to the strategy.
Today, nothing forces the name of the DB table to be identical to the name of the pipeline.
    })
    .collect::<Vec<_>>()
    .join(",");
let query = format!(
    "
    WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
        VALUES {values}
    )
    DELETE FROM {table_name}
    USING to_prune_data
    WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
      AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive
Feels like this is yet another argument for moving to sqlx in lieu of diesel.
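For readers following the diff, here is a minimal standalone sketch of how a batched DELETE like the one above could be assembled. The function name `build_prune_query`, its signature, and the `'\x…'::BYTEA` / `::BIGINT` value encoding are assumptions made for illustration; the diff does not show how `values` is actually built. Object IDs are taken as raw bytes and hex-encoded locally, so only the table and column names are interpolated as-is and are assumed to come from trusted code.

```rust
/// Hypothetical helper (not from the PR): builds a batched DELETE that removes,
/// for each object ID, every row below that object's exclusive checkpoint bound.
fn build_prune_query(
    table_name: &str,
    object_id_column_name: &str,
    cp_sequence_number_column_name: &str,
    entries: &[(Vec<u8>, u64)],
) -> Option<String> {
    // A `VALUES` list with zero rows is not valid SQL, so there is nothing to run.
    if entries.is_empty() {
        return None;
    }

    let values = entries
        .iter()
        .map(|(object_id, cp_exclusive)| {
            // Hex-encode the bytes ourselves; the literal format is an assumption.
            let hex: String = object_id.iter().map(|b| format!("{b:02x}")).collect();
            format!("('\\x{hex}'::BYTEA, {cp_exclusive}::BIGINT)")
        })
        .collect::<Vec<_>>()
        .join(",");

    Some(format!(
        "
        WITH to_prune_data (object_id, cp_sequence_number_exclusive) AS (
            VALUES {values}
        )
        DELETE FROM {table_name}
        USING to_prune_data
        WHERE {table_name}.{object_id_column_name} = to_prune_data.object_id
          AND {table_name}.{cp_sequence_number_column_name} < to_prune_data.cp_sequence_number_exclusive
        "
    ))
}
```

Building the statement by string formatting is what prompts the sqlx remark: the table and column names cannot be bound as query parameters, so whichever library executes the query, those identifiers still have to be spliced into the SQL text.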
The pruning strategy is just a trait, so if a custom indexer needs a more complex pruning strategy, they can implement their own and use it.
Note that we could also trivially introduce a NoPruning strategy (which is basically the default when a pipeline is added without implementing
Description
This is still a draft: an exploration of an alternative approach to pruning. Let me know how you feel about it:
The only thing I don't quite like is that we have to pass table and column names to the strategy.
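To make the trade-off concrete, here is a rough sketch of what a strategy-as-trait design might look like. Only the names `PruningStrategyTrait`, `SimpleRangePruning`, `NoPruning`, `pruning_strategy`, `table_name`, and `NAME` come from this PR and its discussion. The `Pipeline` trait shape, the `prune_statements` method, the `cp_sequence_number` column name, and the half-open range semantics are all assumptions for illustration; the real trait most likely executes against a database connection rather than returning SQL strings.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the PR's pruning trait.
trait PruningStrategyTrait {
    /// Returns the SQL needed to prune checkpoints in `[from, to_exclusive)`.
    fn prune_statements(&self, from: u64, to_exclusive: u64) -> Vec<String>;
}

/// Range-based pruning; the table and column names must be passed in explicitly.
struct SimpleRangePruning {
    table_name: String,
    cp_sequence_number_column_name: String,
}

impl PruningStrategyTrait for SimpleRangePruning {
    fn prune_statements(&self, from: u64, to_exclusive: u64) -> Vec<String> {
        vec![format!(
            "DELETE FROM {t} WHERE {c} >= {from} AND {c} < {to_exclusive}",
            t = self.table_name,
            c = self.cp_sequence_number_column_name,
        )]
    }
}

/// A strategy that never deletes anything.
struct NoPruning;

impl PruningStrategyTrait for NoPruning {
    fn prune_statements(&self, _from: u64, _to_exclusive: u64) -> Vec<String> {
        Vec::new()
    }
}

/// Hypothetical pipeline trait: pipelines that don't override
/// `pruning_strategy` fall back to `NoPruning`.
trait Pipeline {
    const NAME: &'static str;

    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(NoPruning)
    }
}

struct CpSequenceNumbers;

impl Pipeline for CpSequenceNumbers {
    const NAME: &'static str = "cp_sequence_numbers";

    fn pruning_strategy(&self) -> Arc<dyn PruningStrategyTrait> {
        Arc::new(SimpleRangePruning {
            // Happens to match NAME here, but nothing enforces that.
            table_name: "cp_sequence_numbers".to_string(),
            cp_sequence_number_column_name: "cp_sequence_number".to_string(),
        })
    }
}
```

In this sketch the table name is repeated alongside `NAME`, which is exactly the redundancy noted above. Defaulting `table_name` to `NAME` would remove it, at the cost of coupling pipeline names to table names.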
Test plan
How did you test the new or updated feature?
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.