Performance: enable array allocation reuse (`ScalarFunctionArgs` gets owned `ColumnReference`) #13637

alamb · 2024-12-04T00:50:29Z

Which issue does this PR close?

Follow on to Add ScalarUDFImpl::invoke_with_args to support passing the return type created for the udf instance #13290
Closes Perf: Allow User defined functions to potentially reuse their argument arrays (to avoid new allocations) #13516

Rationale for this change

Previously, because functions received their input as `&[ColumnarValue]`, they did not own the arguments (the caller retained a reference). As a result they had to create new arrays even when the input array was not used anywhere else.

Now that we have changed the signature of invoke to take a struct, it is possible to avoid allocating a new array for each output.

I don't have time myself to apply this pattern to the built in DataFusion functions, but I wanted to make sure it was possible in case it comes up in the future or others want to really optimize expression evaluation
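The ownership argument above can be sketched with the standard library alone. This is a toy model under stated assumptions, not DataFusion's API: `Column` is a hypothetical stand-in for an Arrow array (a reference-counted buffer), and `pow_borrowed` / `pow_owned` are illustrative names that mirror the old `&[ColumnarValue]` and the new `Vec<ColumnarValue>` argument styles.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a reference-counted f64 buffer.
type Column = Arc<Vec<f64>>;

/// Borrowed arguments: the caller still holds a handle to every column,
/// so this function can only read the input and must allocate the output.
fn pow_borrowed(base: f64, args: &[Column]) -> Column {
    Arc::new(args[0].iter().map(|exp| base.powf(*exp)).collect())
}

/// Owned arguments: if this function ends up holding the last handle to the
/// buffer, it can overwrite the buffer in place and return it.
/// (Panics if `args` is empty; acceptable for a sketch.)
fn pow_owned(base: f64, mut args: Vec<Column>) -> Column {
    let mut exp = args.remove(0);
    if let Some(values) = Arc::get_mut(&mut exp) {
        // Unique handle: mutate in place, no new allocation.
        for v in values.iter_mut() {
            *v = base.powf(*v);
        }
        return exp;
    }
    // Someone else still references the buffer: allocate a new one.
    Arc::new(exp.iter().map(|e| base.powf(*e)).collect())
}
```

Calling `pow_owned(2.0, vec![col])` with a `col` that has no other handles returns the same allocation that was passed in, while `pow_borrowed` always returns a fresh one.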

What changes are included in this PR?

Change ScalarFunctionArgs to pass by value
Update documentation
Update tests
Update advanced_udf.rs example to show how to reuse values

Are these changes tested?

Yes, both by CI (that it compiles) as well as by the example (which is also run in CI)

I could pull out the example into its own specific test case too if people thought that was valuable

Are there any user-facing changes?

alamb · 2024-12-04T00:51:51Z

datafusion-examples/examples/advanced_udf.rs

-    /// This is the same way that functions built into DataFusion are invoked,
-    /// which permits important special cases when one or both of the arguments
-    /// are single values (constants). For example `pow(a, 2)`
+    ///1. When one or both of the arguments are single values (constants).


I updated the advanced_udf example to show how to update the arrays in place

FYI @findepi who asked about this I think

alamb · 2024-12-04T00:52:24Z

datafusion-examples/examples/advanced_udf.rs

@@ -191,6 +199,48 @@ impl ScalarUDFImpl for PowUdf {
    }
 }

+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {


Here is the example of how to write a function that reuses the input argument's allocation

alamb · 2024-12-04T00:53:17Z

datafusion/expr/src/udf.rs

-    pub args: &'a [ColumnarValue],
-    // The number of rows in record batch being evaluated
+    /// The evaluated arguments to the function
+    pub args: Vec<ColumnarValue>,


this is an API change, but since this struct has not been released yet, it isn't a user facing change.

alamb · 2024-12-04T00:53:40Z

datafusion/functions/src/datetime/to_local_time.rs

@@ -562,7 +562,7 @@ mod tests {
    fn test_to_local_time_helper(input: ScalarValue, expected: ScalarValue) {
        let res = ToLocalTimeFunc::new()
            .invoke_with_args(ScalarFunctionArgs {
-                args: &[ColumnarValue::Scalar(input)],
+                args: vec![ColumnarValue::Scalar(input)],


Most of the rest of this PR is changes for the new signature

That's interesting, Clippy was suggesting using arrays instead of vectors for statically sized immutable collections

In this case I actually changed ScalarFunctionArgs to get an owned Vec

Perhaps clippy suggested [ColumnarValue::Scalar(input)], instead of &vec![ColumnarValue::Scalar(input)] 🤔

alamb · 2024-12-04T00:55:40Z

datafusion/physical-expr/src/scalar_function.rs

@@ -134,20 +134,20 @@ impl PhysicalExpr for ScalarFunctionExpr {
    }

    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {


I think we could potentially increase the number of places arrays could be reused by changing this particular API to take batch (owned) and threading that more fully through the physical exprs.

Until there is some compelling use case to do so, however, I don't think we should go mess with the APIs again
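A toy model of why an owned batch would allow more reuse, assuming a hypothetical `Batch` type that simply holds reference-counted columns (this is not the `RecordBatch` or `PhysicalExpr` API): reading a column out of a borrowed batch has to clone the handle, so the buffer is always shared with the batch, whereas an owned batch can give its column away.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array.
type Column = Arc<Vec<f64>>;

/// Hypothetical stand-in for a record batch.
struct Batch {
    columns: Vec<Column>,
}

/// With `&Batch`, the batch keeps its handle, so the returned column is
/// shared (strong count >= 2) and cannot be mutated in place.
fn column_from_borrowed(batch: &Batch, i: usize) -> Column {
    Arc::clone(&batch.columns[i])
}

/// With an owned `Batch`, the column can be moved out. If nothing else
/// references it, the caller now holds the only handle.
fn column_from_owned(mut batch: Batch, i: usize) -> Column {
    batch.columns.swap_remove(i)
}
```

In this model `Arc::get_mut` fails on the result of `column_from_borrowed` for as long as the batch is alive, and succeeds on the result of `column_from_owned` when the batch held the only handle.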

comphead · 2024-12-04T18:59:11Z

datafusion-examples/examples/advanced_udf.rs

+        .as_primitive::<Float64Type>()
+        // non-obviously, clone (which increments ref counts, it
+        // doesn't clone the data) to get an typed own array
+        // so we drop the original exp_array (untyped) reference


does it clone the reference?

It clones the Float64Array

Underneath this clones several references (like arrow Buffers). I will try and make this clearer in the comments.

The code is not cloning the Arc<ArrayRef>
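The point that cloning a typed array copies handles rather than data can be illustrated with a std-only sketch. `FloatArray` here is a hypothetical stand-in for `Float64Array` (a small struct wrapping a shared buffer); the real Arrow type holds several such buffers, but the reference-counting behavior being modeled is the same idea.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a typed Arrow array: a cheap handle around a
/// shared, reference-counted buffer.
#[derive(Clone)]
struct FloatArray {
    values: Arc<Vec<f64>>,
}

/// Clones the array and reports whether the clone points at the same buffer,
/// together with the buffer's strong count while the clone is alive.
fn clone_shares_buffer(array: &FloatArray) -> (bool, usize) {
    let copy = array.clone();
    let same_allocation = Arc::ptr_eq(&array.values, &copy.values);
    (same_allocation, Arc::strong_count(&array.values))
}
```

For an array whose buffer has a single handle, the function reports that the clone shares the allocation and that the count is 2 during the clone's lifetime; once the clone is dropped the count returns to 1.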

comphead

lgtm thanks @alamb
I wonder whether we should create a small benchmark to see the performance gain?

I can take care of that if needed

alamb · 2024-12-05T03:37:39Z

Thank you very much for the review @comphead

If we wanted to see a performance gain, I think we would need to:

have a query where expression evaluation with a function is a significant portion of the query time
rewrite the function in question to use this array reuse strategy

My motivation for changing this API now (without a realistic driving case) was to avoid releasing a version of ScalarFunctionArgs that passed a reference (rather than a Vec)

Then @findepi recommended ensuring that an owned API could actually be used to avoid allocating during execution

comphead · 2024-12-05T04:20:59Z

I'm keen to test it and get some numbers. Reusing an already allocated region is presumably cheaper, but it would be nice to have numbers for the same method implemented with both approaches so we can share this knowledge. I'll try to run it soon

alamb · 2024-12-05T04:23:30Z

Thank you 🙏
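A comparison of the two approaches could start from something like the following std-only sketch. It is only a rough timing harness under stated assumptions: `pow_alloc`, `pow_reusing`, and `time_it` are hypothetical names, plain `Vec<f64>` stands in for Arrow arrays, and a real measurement would use a proper benchmark harness on actual queries. No timings are claimed here.

```rust
use std::time::{Duration, Instant};

/// Allocating variant: reads the input and builds a new output buffer.
fn pow_alloc(base: f64, exps: &[f64]) -> Vec<f64> {
    exps.iter().map(|e| base.powf(*e)).collect()
}

/// Reusing variant: takes ownership of the input and overwrites it in place.
fn pow_reusing(base: f64, mut exps: Vec<f64>) -> Vec<f64> {
    for e in exps.iter_mut() {
        *e = base.powf(*e);
    }
    exps
}

/// Runs `f` once and returns its result together with the elapsed wall time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}
```

Both variants should produce identical values, so a benchmark can assert equality of the outputs before comparing the two `Duration`s.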

findepi · 2024-12-05T13:11:26Z

datafusion-examples/examples/advanced_udf.rs

+/// Evaluate `base ^ exp` *without* allocating a new array, if possible
+fn pow_in_place(base: f64, exp_array: ArrayRef) -> Result<ArrayRef> {


so it's "pow maybe in place", not "pow in place"
worth reflecting in the function name (eg with addition of "maybe")? otherwise i'd call this just pow

Yes, I agree -- renamed to maybe_pow_in_place

findepi · 2024-12-05T13:12:06Z

datafusion-examples/examples/advanced_udf.rs

+    // These kernels can only be used if there are no other references to
+    // the arrays (exp_array has to be the last remaining reference).
+    let owned_array = exp_array
+        // as before we downcast to Float64Array


what does "before" refer to here? code above, or previous version of the code?

The code above (aka the previous example) -- I have clarified

findepi · 2024-12-05T13:16:20Z

datafusion-examples/examples/advanced_udf.rs

+        // Once we have the owned Float64Array we can drop the original
+        // exp_array (untyped) reference


This function is called with ArrayRef.
How did we ensure this was the only reference?

I tried to clarify this in the comments -- the call to unary_mut fails if there are other outstanding references.

findepi · 2024-12-05T13:20:13Z

datafusion-examples/examples/advanced_udf.rs

+        Err(_orig_array) => {
+            // unary_mut will return the original array if there are other
+            // references into the underlying buffer (and thus reuse is
+            // impossible)
+            //
+            // In a real implementation, this case should fall back to
+            // calling `unary` and allocate a new array; In our example
+            // we will return an error for demonstration purposes
+            exec_err!("Could not reuse array for pow_in_place")


What if my pow udf is called in a query like

select pow(10, f), f from ..

in such case, re-use shouldn't be possible, unless there was some (redundant) data copying before the pow call.

yes, that is correct to the best of my understanding

findepi · 2024-12-05T13:21:36Z

datafusion-examples/examples/advanced_udf.rs

+    // is the last remaining reference
+    drop(exp_array);
+
+    // at this point, exp_array is the only reference in this function. However,


at this point, exp_array doesn't exist anymore.
and we still don't know whether we have only reference, or shared.
we know it only based on return result from compute::unary_mut

Yes, sorry -- I think this was meant to say that "owned_array" is the only reference. I have tried to clarify the wording. This is a good catch
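The sequence discussed in this thread (clone the typed array, drop the untyped handle, and only then learn whether the buffer is unique) can be modeled with the standard library. This is a sketch, not the example's actual code: `FloatArray`, `ArrayHandle`, and `maybe_pow_in_place_model` are hypothetical stand-ins for `Float64Array`, `ArrayRef`, and the example function, and `Arc::get_mut` plays the role that the thread attributes to `unary_mut`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a typed Arrow array.
#[derive(Clone)]
struct FloatArray {
    values: Arc<Vec<f64>>,
}

/// Hypothetical stand-in for `ArrayRef`.
type ArrayHandle = Arc<FloatArray>;

/// Returns `Ok` with the buffer overwritten in place when it was unique,
/// or `Err` with the untouched array when the buffer is still shared.
fn maybe_pow_in_place_model(
    base: f64,
    exp_array: ArrayHandle,
) -> Result<FloatArray, FloatArray> {
    // Cloning the typed array adds a second reference to the buffer.
    let mut owned_array: FloatArray = exp_array.as_ref().clone();
    // Dropping the outer handle releases its buffer reference only if no
    // other outer handle exists.
    drop(exp_array);
    // Uniqueness is only known at this point, from the result of the call.
    if let Some(values) = Arc::get_mut(&mut owned_array.values) {
        for v in values.iter_mut() {
            *v = base.powf(*v);
        }
        Ok(owned_array)
    } else {
        Err(owned_array)
    }
}
```

If the caller keeps another `ArrayHandle` to the same array, the function returns `Err` and the shared data is left unchanged.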

findepi

LGTM % code comments in datafusion-examples/examples/advanced_udf.rs

findepi · 2024-12-05T13:23:16Z

datafusion-examples/examples/advanced_udf.rs

+    // You can also invoke pow_in_place by passing a constant base and a
+    // column `a` as the exponent . If there is only a single
+    // reference to `a` the code works well
+    ctx.sql("SELECT pow(2, a) FROM t").await?.show().await?;
+
+    // However, if there are multiple references to `a` in the evaluation
+    // the array storage can not be reused
+    let err = ctx
+        .sql("SELECT pow(2, a), pow(3, a) FROM t")
+        .await?
+        .show()
+        .await
+        .unwrap_err();
+    assert_eq!(
+        err.to_string(),
+        "Execution error: Could not reuse array for pow_in_place"
+    );


These are good examples

Went to test if this will work currently, found a bug in existing code :)

> SELECT pow(2, a), pow(3, a) FROM t; Arrow error: Arithmetic overflow: Overflow happened on: 3 ^ 41

... and the example sql does indeed work currently (as expected) so this would be a breaking change if it were to fail now.

cool -- I will respond to these comments later today. Thank you

alamb · 2024-12-08T13:28:11Z

Thanks everyone for your comments 🙏

alamb added 3 commits December 3, 2024 19:49

Improve documentation

5bc2a90

Pass owned args to ScalarFunctionArgs

810ff17

Update advanced_udf with example of reusing arrays

bedef3d

github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates functions Changes to functions implementation labels Dec 4, 2024

alamb commented Dec 4, 2024

alamb mentioned this pull request Dec 4, 2024

Perf: Allow User defined functions to potentially reuse their argument arrays (to avoid new allocations) #13516

Closed

comphead reviewed Dec 4, 2024

comphead approved these changes Dec 4, 2024

alamb added 2 commits December 4, 2024 22:40

Merge remote-tracking branch 'apache/main' into alamb/improve_scalar_…

c605a6b

…udf_invoke

clarify rationale for cloning

fa0c3c2

findepi reviewed Dec 5, 2024

findepi approved these changes Dec 5, 2024

alamb mentioned this pull request Dec 6, 2024

ScalarUDFImpl invoke improvements #13507

Closed

alamb added 3 commits December 7, 2024 14:31

Merge remote-tracking branch 'apache/main' into alamb/improve_scalar_…

6786469

…udf_invoke

clarify comments

b875e19

fix expected output

89e674f

alamb merged commit 3ee9b3d into apache:main Dec 8, 2024
25 checks passed

alamb deleted the alamb/improve_scalar_udf_invoke branch December 8, 2024 13:28

alamb mentioned this pull request Dec 13, 2024

Dec 13, 2024: This week(s) in DataFusion #13760

Closed

5 tasks
