Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(batching): add compute_batch_size to BatchStep #390

Merged
merged 3 commits into from
Nov 19, 2024

Conversation

MeredithAnya
Copy link
Member

@MeredithAnya MeredithAnya commented Nov 6, 2024

I wanted to be able to batch by an amount other than just the count of messages. I have this implement in getsentry/snuba#6510, but it's a lot of copying from arroyo, so it would be nice to have it baked into here. I'm not sure if we have a ton of other uses for it, but it also seems like it would be relatively harmless to add since by default the batching works the same as before

Looks like the rust implementation already has something similar (compute_batch_size) added in a5087b8

MeredithAnya added a commit to getsentry/snuba that referenced this pull request Nov 8, 2024
**context**:
The work for bulk deleting in snuba has so far included the following:
* Added the kafka schemas -
getsentry/sentry-kafka-schemas#347
* Adding the new endpoint including producing to the topic -
#6440
* Created the topic in production for all env -
getsentry/ops#12711

**what's left**:
Now that we have the topics created we can finish up the consumer side.
- [ ] Add the consumer logic to snuba (This PR)
- [ ] Add the consumer deployment to S4S region in the ops repository
- [ ] Set up datadog alerts/metrics and other observability

**this PR**:
It's a bit of a larger PR but it can be reviewed in a couple sections:
* The main consumer logic and strategy
* This has the logic to create the strategy factory for the consumer and
composes the strategy steps. The `strategy.py` file has the details for
actually executing the delete query
* The `batching.py` file - I have an [arroyo
PR](getsentry/arroyo#390) that makes this file
obsolete but in the meantime I don't think it needs to block this PR
since it will be easy to remove after
* The formatters are going to be the only code that someone will need to
write in the future when deploying the deletions consumer for a
different storage. How one formats the conditions for the `DELETE` query
is up to that logic.
Copy link
Member

@untitaker untitaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks ok. if i understood correctly it's just the parameter name that's different right? generally i'd prefer to name almost everything the same between rust and python, to reduce mental overhead

@MeredithAnya
Copy link
Member Author

looks ok. if i understood correctly it's just the parameter name that's different right? generally i'd prefer to name almost everything the same between rust and python, to reduce mental overhead

I think yes mostly the same, although it seems like in the rust implementation its required, whereas here it defaults to incrementing by 1, I can change the function name though to be consistent

@MeredithAnya MeredithAnya changed the title ref(batching): add increment_by to BatchStep ref(batching): add compute_batch_size to BatchStep Nov 19, 2024
@MeredithAnya MeredithAnya merged commit 0fecd4a into main Nov 19, 2024
14 checks passed
@MeredithAnya MeredithAnya deleted the meredith/11-6-24 branch November 19, 2024 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants