-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement trait based API for defining ScalarUDFs #8568
Comments
Ideally, i'd prefer something like this for implementing scalar udfs. pub trait ScalarUdf: Send + Sync {
/// The name for this function
fn name(&self) -> &'static str;
/// Return the signature for this function
fn signature(&self) -> Signature;
/// Return type of the function
fn return_type(&self) -> DataType;
/// udf implementation
fn func(&self, args: &[&ColumnarValue]) -> Result<ColumnarValue, DataFusionError>
} |
I'm also curious. Are there plans to ever allow udfs to support return types based off the input value instead of the type? I know there is currently special handling for |
@universalmind303 -- (Edit: I see you did in fact look at that PR) pub trait ScalarUDFImpl {
/// Returns this object as an [`Any`] trait object
fn as_any(&self) -> &dyn Any;
/// Returns this function's name
fn name(&self) -> &str;
/// Returns the function's [`Signature`] for information about what input
/// types are accepted and the function's Volatility.
fn signature(&self) -> &Signature;
/// What [`DataType`] will be returned by this function, given the types of
/// the arguments
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
/// Invoke the function on `args`, returning the appropriate result
///
/// The function will be invoked passed with the slice of [`ColumnarValue`]
/// (either scalar or array).
///
/// # Zero Argument Functions
/// If the function has zero parameters (e.g. `now()`) it will be passed a
/// single element slice which is a a null array to indicate the batch's row
/// count (so the function can know the resulting array size).
///
/// # Performance
///
/// For the best performance, the implementations of `invoke` should handle
/// the common case when one or more of their arguments are constant values
/// (aka [`ColumnarValue::Scalar`]). Calling [`ColumnarValue::into_array`]
/// and treating all arguments as arrays will work, but will be slower.
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue>;
/// Returns any aliases (alternate names) for this function.
///
/// Aliases can be used to invoke the same function using different names.
/// For example in some databases `now()` and `current_timestamp()` are
/// aliases for the same function. This behavior can be obtained by
/// returning `current_timestamp` as an alias for the `now` function.
///
/// Note: `aliases` should only include names other than [`Self::name`].
/// Defaults to `[]` (no aliases)
fn aliases(&self) -> &[String] {
&[]
}
}
We discussed it in #7657 and I just filed #8624 Once we have a trait based API I believe we will be in a much better position to support this feature |
Is your feature request related to a problem or challenge?
The current way a user implements a ScalarUDF is awkward:
They must wade through several
Arc<dyn<...>
typedefs to figure out how to provide the type signature and implementationThe create_udf is somewhat easier to use, but it still requires Arc's of anonymous functions
Describe the solution you'd like
I am not sure why this API is implemented like it is. If I were a user I would expect to be able to use a trait object
like
Describe alternatives you've considered
No response
Additional context
I want to make ScalarUDFs easy to define for two reasons:
ScalarUDF
[EPIC] Unify Function Interface (removeBuiltInScalarFunction
) #8045 the easer it is to make UDFs the easier it is to complete that taskThe text was updated successfully, but these errors were encountered: