Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evalengine: Add support for enum and set #15783

Merged

Conversation

dbussink
Copy link
Contributor

The evalengine currently doesn't handle enum and set types properly. The comparison function always returns 0 at the moment and we don't consider ordering of elements etc.

Here we add two new native types to the evalengine for set and enum and instantiate those appropriately. We also ensure we can compare them correctly.

In case we don't have the schema information with the values, we do a best effort case of depending on the string representation. This is not correct always of course, but at least makes equality comparison work for those cases and only ordering is off in that scenario.

Related Issue(s)

Fixes #15782

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

@dbussink dbussink added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Evalengine changes to the evaluation engine labels Apr 23, 2024
Copy link
Contributor

vitess-bot bot commented Apr 23, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Apr 23, 2024
@dbussink dbussink removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Apr 23, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Apr 23, 2024
The evalengine currently doesn't handle enum and set types properly.
The comparison function always returns 0 at the moment and we don't
consider ordering of elements etc.

Here we add two new native types to the evalengine for set and enum and
instantiate those appropriately. We also ensure we can compare them
correctly.

In case we don't have the schema information with the values, we do a
best effort case of depending on the string representation. This is not
correct always of course, but at least makes equality comparison work
for those cases and only ordering is off in that scenario.

Signed-off-by: Dirkjan Bussink <[email protected]>
@dbussink dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from 4cb5f7d to b0521ad Compare April 23, 2024 09:27
Copy link

codecov bot commented Apr 23, 2024

Codecov Report

Attention: Patch coverage is 71.28713% with 87 lines in your changes are missing coverage. Please review.

Project coverage is 68.43%. Comparing base (5e2a873) to head (89ffe52).
Report is 4 commits behind head on main.

Files Patch % Lines
go/vt/vtgate/evalengine/compiler_asm_push.go 0.00% 22 Missing ⚠️
go/vt/vtgate/evalengine/arena.go 9.09% 20 Missing ⚠️
go/vt/vtgate/evalengine/eval_numeric.go 0.00% 16 Missing ⚠️
go/vt/vtgate/evalengine/compiler.go 71.42% 8 Missing ⚠️
go/vt/vtgate/evalengine/expr_column.go 25.00% 6 Missing ⚠️
go/vt/vtgate/evalengine/eval.go 78.94% 4 Missing ⚠️
go/vt/vtgate/evalengine/api_hash.go 70.00% 3 Missing ⚠️
go/vt/vtgate/evalengine/translate.go 0.00% 3 Missing ⚠️
go/vt/wrangler/vdiff.go 83.33% 2 Missing ⚠️
go/vt/vtgate/engine/distinct.go 50.00% 1 Missing ⚠️
... and 2 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15783      +/-   ##
==========================================
+ Coverage   68.42%   68.43%   +0.01%     
==========================================
  Files        1556     1558       +2     
  Lines      195494   195822     +328     
==========================================
+ Hits       133758   134017     +259     
- Misses      61736    61805      +69     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice addition! I have one big piece of feedback, which is about the Values []string type that has been wired up throughout the codebase. It's heavy! It's super heavy with 24-bytes + backing storage + all the string backing storage.

Clearly, neither ENUM or SET is a commonly used feature (as evidenced by the fact that we've gotten away without implementing support for it so far). Hence, I think this warrants spending some time optimizing the storage of these values so they don't weight down the rest of the evalengine whenever they're not used (which will be 99% of the time). This is particularly important for the evalengine's internal and external Type, which are currently pretty well packed (yey!) and they would almost triple in size after adding the []string values.

I'm not 100% sure of what's the ideal way to optimize this. This is how I would try though: start with a type EnumSetValues string and store values *EnumSetValues everywhere where they must be kept. The full set of values is serialized into a single immutable string, and the EnumSetValues would have helpers to iterate each value individually / get their index/etc without having to understand how they're serialized.

This (or something close to this) feels like the cheapest way to pass along the set of values without requiring the 24 bytes of the slice. Give this a go, or otherwise hit me up and we can pair on it. 😉

@dbussink dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from 705487d to b0ba7f0 Compare April 23, 2024 14:47
Copy link
Collaborator

@vmg vmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing in Slack, the sweet middle ground is probably *[]string to hide the slice behind a pointer without having to bother with a packed serialization format. LGTM!

@dbussink dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from b0ba7f0 to f59e322 Compare April 23, 2024 15:17
@dbussink dbussink force-pushed the dbussink/add-evalengine-enum-set-support branch from f59e322 to 595ec8f Compare April 23, 2024 15:23
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll merge this into #15723 tomorrow.

One tangentially related thing that I also realized in working on that is that for ENUMs we can get the length of the column when building vevents in unit tests this way:

int64(len(col.Type.EnumValues) + 1)

There is no equivalent for Set in the parser.

go/sqltypes/value.go Outdated Show resolved Hide resolved
@@ -303,7 +303,7 @@ func (tp *TablePlan) isOutsidePKRange(bindvars map[string]*querypb.BindVariable,

rowVal, _ := sqltypes.BindVariableToValue(bindvar)
// TODO(king-11) make collation aware
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this TODO now if you want 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one still passes in Unknown for the collation, so I don't think we can remove it (yet)?

Signed-off-by: Dirkjan Bussink <[email protected]>
@dbussink
Copy link
Contributor Author

One tangentially related thing that I also realized in working on that is that for ENUMs we can get the length of the column when building vevents in unit tests this way:

int64(len(col.Type.EnumValues) + 1)

There is no equivalent for Set in the parser.

@mattlord A set should always be backed under the hood storage wise by a single uint64. A set is then stored as a bitmap. So you can always make that assumption then.

@dbussink dbussink merged commit 4c2df48 into vitessio:main Apr 24, 2024
104 checks passed
@dbussink dbussink deleted the dbussink/add-evalengine-enum-set-support branch April 24, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Evalengine changes to the evaluation engine Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Evalengine doesn't support enum and set properly
3 participants