Optimize with IN Clause for UPDATE/DELETE Statements on Vindexes #15455

Merged: 6 commits, Apr 1, 2024

wangweicugw (Contributor) commented Mar 12, 2024

Description

When I run a SELECT statement with an IN clause on a Vindex column, the query is split according to the values in the IN list, so that each shard is queried only for the values that map to it. This is the expected behavior for optimized shard querying.

However, when I run an UPDATE or DELETE statement with an IN clause on a Vindex column, the SQL is not rewritten in the same way. Instead, the query is sent to the shard as is, without being split by the IN clause values.

I would like UPDATE/DELETE statements with an IN clause on a Vindex column to behave consistently with SELECT.

I would like the following UPDATE statement:

UPDATE test_vindex_signed_int SET id = id + 100 WHERE v_key IN (1, 100)

to be rewritten into separate statements, one for each value in the IN clause, like so:

UPDATE test_vindex_signed_int SET id = id + 100 WHERE v_key IN (1)
UPDATE test_vindex_signed_int SET id = id + 100 WHERE v_key IN (100)

This would make UPDATE/DELETE operations more efficient, especially for large batch updates, because each statement would target only the relevant shards.
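The requested rewrite can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: `shardFor` is a hypothetical stand-in for a Vindex lookup (it uses a plain modulo instead of a real keyspace ID mapping), and the shard names `-80` and `80-` are assumed for the example. It groups the IN-list values by destination shard and builds one statement per shard.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// shardFor is a HYPOTHETICAL stand-in for a Vindex lookup. A real Vindex
// maps a column value to a keyspace ID; a modulo is used here only to keep
// the sketch self-contained.
func shardFor(key int64, shards []string) string {
	return shards[int(uint64(key)%uint64(len(shards)))]
}

// splitIN groups the IN-list values by the shard they map to.
func splitIN(keys []int64, shards []string) map[string][]int64 {
	byShard := make(map[string][]int64)
	for _, k := range keys {
		s := shardFor(k, shards)
		byShard[s] = append(byShard[s], k)
	}
	return byShard
}

// rewrite builds one UPDATE per shard, each carrying only that shard's
// values. Shards are visited in sorted order so the output is stable.
func rewrite(byShard map[string][]int64) []string {
	names := make([]string, 0, len(byShard))
	for s := range byShard {
		names = append(names, s)
	}
	sort.Strings(names)

	out := make([]string, 0, len(names))
	for _, s := range names {
		vals := make([]string, len(byShard[s]))
		for i, k := range byShard[s] {
			vals[i] = fmt.Sprint(k)
		}
		out = append(out, fmt.Sprintf(
			"%s: UPDATE test_vindex_signed_int SET id = id + 100 WHERE v_key IN (%s)",
			s, strings.Join(vals, ", ")))
	}
	return out
}

func main() {
	shards := []string{"-80", "80-"} // assumed two-shard layout
	for _, q := range rewrite(splitIN([]int64{1, 100}, shards)) {
		fmt.Println(q)
	}
}
```

Values that map to the same shard stay together in one IN list, so the number of statements is bounded by the number of shards rather than by the number of values.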

Related Issue(s)

Checklist

  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

vitess-bot bot (Contributor) commented Mar 12, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Mar 12, 2024
github-actions bot added this to the v20.0.0 milestone Mar 12, 2024
wangweicugw changed the title from "Optimize update/delete processing of in" to "Optimize with IN Clause for UPDATE/DELETE Statements on Vindexes" Mar 12, 2024
systay added the Type: Enhancement and Component: Query Serving labels and removed the NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Mar 12, 2024

codecov bot commented Mar 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 65.76%. Comparing base (d5bd597) to head (4e5f087).
Report is 61 commits behind head on main.

❗ Current head 4e5f087 differs from pull request most recent head cf0e512. Consider uploading reports for the commit cf0e512 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15455      +/-   ##
==========================================
+ Coverage   65.64%   65.76%   +0.11%     
==========================================
  Files        1563     1561       -2     
  Lines      194389   194884     +495     
==========================================
+ Hits       127602   128157     +555     
+ Misses      66787    66727      -60     


Signed-off-by: wangweicugw <[email protected]>
vitess-bot bot (Contributor) commented Mar 18, 2024

Hello! 👋

This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked.

You can find the performance comparison on the arewefastyet website.

harshit-gangal removed the NeedsDescriptionUpdate label Mar 26, 2024

systay (Collaborator) commented Mar 26, 2024

@wangweicugw Nice stuff!

Did you check that we have relevant end-to-end tests for this?

Review comment on go/vt/vtgate/engine/dml.go (outdated, resolved)
wangweicugw (Contributor Author) commented

> @wangweicugw Nice stuff!
>
> Did you check that we have relevant end-to-end tests for this?

I haven't found relevant end-to-end tests for this; I will supplement them in go/test/endtoend/vtgate/queries/dml/dml_test.go.

harshit-gangal (Member) left a comment

I could not find any test showing that the __vals bind variable is split across the different shards.

harshit-gangal (Member) commented

> @wangweicugw Nice stuff!
> Did you check that we have relevant end-to-end tests for this?
>
> I haven't found relevant end-to-end tests for this; I will supplement them in go/test/endtoend/vtgate/queries/dml/dml_test.go.

You should also validate this in a unit test that shows what bind variables were sent down.

Signed-off-by: wangweicugw <[email protected]>
Signed-off-by: wangweicugw <[email protected]>
wangweicugw (Contributor Author) commented

> @wangweicugw Nice stuff!
> Did you check that we have relevant end-to-end tests for this?
>
> I haven't found relevant end-to-end tests for this; I will supplement them in go/test/endtoend/vtgate/queries/dml/dml_test.go.
>
> You should also validate this in a unit test that shows what bind variables were sent down.

I have added relevant test cases in:
go/vt/vtgate/engine/update_test.go
go/vt/vtgate/engine/delete_test.go

wangweicugw (Contributor Author) commented

@systay @harshit-gangal
Will this PR be backported to release-18.0?

harshit-gangal (Member) commented

> @systay @harshit-gangal Will this PR be backported to release-18.0?

This is an enhancement PR; therefore, it does not qualify for backport.

harshit-gangal (Member) commented

As this is an optimization PR, I think we should add a benchmark showing whether it reduces memory allocations.
For reference, see go/test/endtoend/vtgate/queries/benchmark/benchmark_test.go.
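For readers unfamiliar with how allocations are reported in Go benchmarks, here is a minimal sketch using only the standard `testing` package. It does not reproduce the benchmark in benchmark_test.go: `groupByShard` is a hypothetical stand-in for the routing step, and the key count and shard count are arbitrary. In a real `_test.go` file the function would be named `BenchmarkXxx` and run with `go test -bench . -benchmem`; `testing.Benchmark` is used here only so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"testing"
)

// groupByShard is a HYPOTHETICAL stand-in for the code under test: it
// groups keys by a modulo-based shard index.
func groupByShard(keys []int64, shardCount int) map[int][]int64 {
	groups := make(map[int][]int64, shardCount)
	for _, k := range keys {
		s := int(uint64(k) % uint64(shardCount))
		groups[s] = append(groups[s], k)
	}
	return groups
}

// benchGroup measures groupByShard and asks the framework to track
// allocations per operation.
func benchGroup(b *testing.B) {
	keys := make([]int64, 1000)
	for i := range keys {
		keys[i] = int64(i)
	}
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		_ = groupByShard(keys, 4)
	}
}

func main() {
	res := testing.Benchmark(benchGroup)
	fmt.Printf("%d iterations, %d allocs/op, %d B/op\n",
		res.N, res.AllocsPerOp(), res.AllocedBytesPerOp())
}
```

Comparing the allocs/op and B/op figures before and after a change is what shows whether an optimization actually reduces memory allocations.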

wangweicugw (Contributor Author) commented

> @systay @harshit-gangal Will this PR be backported to release-18.0?
>
> This is an enhancement PR; therefore, it does not qualify for backport.

Alright. We are still using version 18.0.

Signed-off-by: wangweicugw <[email protected]>
wangweicugw (Contributor Author) commented

> As this is an optimization PR, I think we should add a benchmark showing whether it reduces memory allocations. For reference, see go/test/endtoend/vtgate/queries/benchmark/benchmark_test.go.

I have attempted to add some benchmark tests. Could you help me review them?

systay merged commit add3887 into vitessio:main Apr 1, 2024
100 checks passed
wangweicugw deleted the dml_in branch April 3, 2024 08:24
Labels: Benchmark me, Component: Query Serving, Type: Enhancement
3 participants