Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement vectorized filters #5915

Merged
merged 1 commit into from
Sep 13, 2023
Merged

Conversation

akuzm
Copy link
Member

@akuzm akuzm commented Aug 1, 2023

Start with supporting "vector ? const" predicates for several arithmetic type pairs.

Disable-check: force-changelog-file

@akuzm akuzm force-pushed the vectorized-filters branch from 182d44b to ca60216 Compare August 1, 2023 09:53
@akuzm akuzm marked this pull request as ready for review August 1, 2023 11:27
@github-actions github-actions bot requested review from erimatnor and gayyappan August 1, 2023 11:28
@github-actions
Copy link

github-actions bot commented Aug 1, 2023

@erimatnor, @gayyappan: please review this pull request.

Powered by pull-review

@svenklemm
Copy link
Member

Since this is all about integer expressions, the numeric naming might be a bit misleading since there is also a numeric datatype for which none of this applies.

@akuzm
Copy link
Member Author

akuzm commented Aug 15, 2023

Since this is all about integer expressions, the numeric naming might be a bit misleading since there is also a numeric datatype for which none of this applies.

Floats as well. What options do we have, "numerical" or maybe "arithmetic"? The postgres numeric probably won't be vectorized because its data structure is not practical (varlena header, another header, two representations, several special values, and it's stored as decimal digits). In other computation engines they use fixed-point, fixed-width decimals stored as binary, which are easily vectorized.

@svenklemm
Copy link
Member

I'd like to see some more tests (even if we dont initially support them)

  • queries without aggregation
  • constraints on columns not selected
  • constraints on multiple columns
  • ORed constraints
  • ANDed constraints

@konskov konskov self-requested a review August 29, 2023 13:24
@jnidzwetzki
Copy link
Contributor

@akuzm Do we have some benchmark data for this PR?

if (OidIsValid(commutator_opno))
{
o->opno = commutator_opno;
o->opfuncid = InvalidOid;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need to be InvalidOid instead of get_opcode(commutator_opno);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also not sure about the opfuncid. However, in CommuteOpExpr the opfuncid is also set to InvalidOid.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think there's no particular reason, just repeating the CommuteOpExpr. I added a comment about this.

src/guc.c Show resolved Hide resolved
@akuzm
Copy link
Member Author

akuzm commented Aug 30, 2023

@akuzm Do we have some benchmark data for this PR?

I'll rerun it, because there were some changes in main and now the comparison looks weird.

@codecov
Copy link

codecov bot commented Aug 30, 2023

Codecov Report

Merging #5915 (c066d3f) into main (bd9d09e) will increase coverage by 0.02%.
The diff coverage is 87.66%.

@@            Coverage Diff             @@
##             main    #5915      +/-   ##
==========================================
+ Coverage   81.34%   81.37%   +0.02%     
==========================================
  Files         243      246       +3     
  Lines       55971    56193     +222     
  Branches    12395    12457      +62     
==========================================
+ Hits        45532    45726     +194     
  Misses       8095     8095              
- Partials     2344     2372      +28     
Files Changed Coverage Δ
tsl/src/import/ts_explain.c 78.57% <78.57%> (ø)
tsl/src/nodes/decompress_chunk/vector_predicates.c 80.00% <80.00%> (ø)
tsl/src/nodes/decompress_chunk/compressed_batch.c 89.96% <83.67%> (-2.61%) ⬇️
tsl/src/nodes/decompress_chunk/exec.c 94.16% <90.00%> (+0.25%) ⬆️
tsl/src/nodes/decompress_chunk/planner.c 87.46% <94.64%> (+1.74%) ⬆️
...mpress_chunk/pred_vector_const_arithmetic_single.c 96.42% <96.42%> (ø)
src/guc.c 96.66% <100.00%> (+0.03%) ⬆️
tsl/src/nodes/decompress_chunk/batch_array.c 92.98% <100.00%> (+0.12%) ⬆️

... and 41 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@akuzm
Copy link
Member Author

akuzm commented Aug 30, 2023

@akuzm Do we have some benchmark data for this PR?

I'll rerun it, because there were some changes in main and now the comparison looks weird.

https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=2683&var-run2=2684&var-threshold=0.05&var-use_historical_thresholds=false&chunkNotFound

There's 10-70% speedup on some analytical queries. The huge regression seems to be a flaky parallel plan.

@akuzm akuzm mentioned this pull request Aug 31, 2023
5 tasks
@akuzm
Copy link
Member Author

akuzm commented Aug 31, 2023

I'd like to see some more tests (even if we dont initially support them)

  • queries without aggregation
  • constraints on columns not selected
  • constraints on multiple columns
  • ORed constraints
  • ANDed constraints

Added some simple tests for these cases to decompress_vector_qual, although I think they are already mostly present in other files.


#include "pred_vector_const_arithmetic_type_pair.c"

/* int4. functions. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
/* int4. functions. */
/* int4... functions. */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all these semicolons.


OpExpr *o = castNode(OpExpr, qual);

if (list_length(o->args) != 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have a test case for this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a case with unary operator.

Copy link
Contributor

@jnidzwetzki jnidzwetzki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few minor comments, but overall it looks good.

@@ -0,0 +1,68 @@
-- This file and its contents are licensed under the Timescale License.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have explain output here currently it is not visible from test output whether vectorized filters are actually used here. also can we have some tests with NULL e.g. col < NULL, col <> NULL, col IS NULL, ...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • EXPLAIN would require versioned references, to avoid this I use the guc debug_require_vector_qual to check that the vectorized quals are used or not.
  • NullTest is not vectorized currently, I added cases for is null and is not null.
  • For null values in the column itself, I have tests with the metric4 column.
  • Not sure how to test col < NULL because all the functions we vectorize are strict, and we don't vectorize the initplan parameters yet.

@svenklemm svenklemm added this to the TimescaleDB 2.12 milestone Sep 12, 2023
src/guc.c Outdated
@@ -725,6 +736,22 @@ _guc_init(void)
/* assign_hook= */ NULL,
/* show_hook= */ NULL);

DefineCustomEnumVariable(/* name= */ "timescaledb.debug_require_vector_qual",
/* short_desc= */
"ensure that non-vectorized filters are used in DecompressChunk node",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be vectorized instead of non-vectorized?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does both, depending on the setting. I'll make it say vectorized or non-vectorized.

Start with supporting "vector ? const" predicates for several arithmetic
type pairs.
@akuzm akuzm force-pushed the vectorized-filters branch from 6ef942d to c066d3f Compare September 13, 2023 20:04
@akuzm akuzm enabled auto-merge (rebase) September 13, 2023 20:06
@akuzm akuzm merged commit 23b51c9 into timescale:main Sep 13, 2023
34 checks passed
@akuzm akuzm deleted the vectorized-filters branch September 13, 2023 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants