Is your feature request related to a problem or challenge?
Inspired by #10268. I have an idea to improve the current type signature and coercion design.
What is the current status
Given the function arguments, we check them against the function's defined TypeSignature. get_valid_types is the function that computes the possible valid types based on the TypeSignature. Once we have all the possible valid types, we pick one of them. The core coercion rule is coerced_from: a candidate in which every type is coercible from the corresponding argument type is the one we take.
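The flow described above can be sketched roughly as follows. This is a simplified model, not DataFusion's code: `ToyType` and `ToySignature` are hypothetical stand-ins for `DataType` and `TypeSignature`, and the bodies of `get_valid_types` and `coerced_from` are assumptions that only mirror the role each function plays.

```rust
// Toy stand-in for Arrow's DataType (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

// Toy stand-in for TypeSignature with just two variants.
enum ToySignature {
    // `n` arguments, all of the same type, taken from the given list.
    Uniform(usize, Vec<ToyType>),
    // Exactly these argument types, in this order.
    Exact(Vec<ToyType>),
}

// Assumed analogue of get_valid_types: enumerate candidate argument-type lists.
fn get_valid_types(sig: &ToySignature) -> Vec<Vec<ToyType>> {
    match sig {
        ToySignature::Uniform(n, types) => types.iter().map(|t| vec![*t; *n]).collect(),
        ToySignature::Exact(types) => vec![types.clone()],
    }
}

// Assumed analogue of coerced_from: can a value of type `from` be coerced to `to`?
fn coerced_from(to: ToyType, from: ToyType) -> bool {
    use ToyType::*;
    match (to, from) {
        (a, b) if a == b => true,
        (Int64, Int32) => true,
        (Float64, Int32 | Int64) => true,
        _ => false,
    }
}

// Pick the first candidate for which every argument type is coercible.
fn coerce_arguments(sig: &ToySignature, current: &[ToyType]) -> Option<Vec<ToyType>> {
    get_valid_types(sig).into_iter().find(|candidate| {
        candidate.len() == current.len()
            && candidate
                .iter()
                .zip(current)
                .all(|(to, from)| coerced_from(*to, *from))
    })
}
```

In this model, all of the type knowledge ends up inside `coerced_from`, which is the maintenance problem the next section describes.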
What is the issue of the current approach
Since the signature is not well supported, we rely heavily on the coercion rule to get the expected types. We end up with complex coercion logic inside the coerced_from function. That makes it hard to maintain (removing or changing a rule might cause unknown issues in other functions), and it also duplicates logic similar to the binary::coercion rule, which is really confusing.
There are also cases where a coercion rule lives inside a function's return_type, which is not the expected place to deal with coercion.
How to fix this
I think it is possible to improve the design of TypeSignature so that we can find the single set of valid types given the current types. The valid types we get are already coercible, so we don't need the coerced_from function anymore!
After the change, we can eliminate the coerced_from function, and only the binary::coercion rule remains.
Additional context
Problematic examples
array_concat has the signature variadic any, and we have the coercion rule inside return_type.
nullif also has a coercion rule inside return_type.
coerced_from has numeric coercion, list coercion, timestamp coercion, and even comparison_binary_numeric_coercion (which will be removed in #10268).
For types,
We have two styles, Exact and Coercion. Exact rejects if the type does not match; Coercion rejects if the type is not coercible to the target type.
The combinations of these are
Uniform(Vec<DataType>) // Exact Number with Exact type
UniformCoercion(Vec<DataType>) // Exact Number with coercion
Variadic(DataType)
VariadicCoercion(DataType)
...
For signatures with non-uniform length and more than one data type, we could use UserDefined.
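A minimal sketch of the two styles, applied to the variadic case, might look like this. Everything here is hypothetical: `ToyType` stands in for `DataType`, `Style` models the Exact/Coercion distinction, and `can_coerce` is an assumed widening rule rather than DataFusion's actual coercion logic.

```rust
// Toy stand-in for Arrow's DataType (hypothetical, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

// The two checking styles described above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Style {
    Exact,
    Coercion,
}

// Assumed widening rule, used only by the Coercion style.
fn can_coerce(from: ToyType, to: ToyType) -> bool {
    use ToyType::*;
    from == to || matches!((from, to), (Int32, Int64) | (Int32, Float64) | (Int64, Float64))
}

// Exact rejects on any mismatch; Coercion rejects only if `arg` is not coercible to `target`.
fn accepts(style: Style, arg: ToyType, target: ToyType) -> bool {
    match style {
        Style::Exact => arg == target,
        Style::Coercion => can_coerce(arg, target),
    }
}

// Models Variadic(DataType) and VariadicCoercion(DataType): any number of
// arguments, each checked against the same target type. Returns the resolved
// argument types, or None if the call is rejected.
fn resolve_variadic(style: Style, target: ToyType, args: &[ToyType]) -> Option<Vec<ToyType>> {
    args.iter()
        .all(|arg| accepts(style, *arg, target))
        .then(|| vec![target; args.len()])
}
```

Because the signature itself says whether coercion is allowed, the resolved types come straight out of the signature check and no separate pass over a list of candidates is needed.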
The trickier part is the DataType. Many functions expect a Numeric type, which includes integer, float, ....
For functions that expect a string, there are Utf8, LargeUtf8, Utf8View.
For type checking, it would be nice to have a more general enum that covers more than one DataType to check against.
enum ArgumentType {
    Numeric,
    Integer,
    Float,
    List,
    ...
}
Now, we have
Uniform(Vec<FunctionType>) // Exact Number with Exact type
UniformCoercion(Vec<FunctionType>) // Exact Number with coercion
Variadic(FunctionType)
VariadicCoercion(FunctionType)
TypeSignature::Numeric is one of the ideas that came out of this.
For other kinds of complex type checks or length checks, we fall back to UserDefined.
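One way to picture such a general enum is as a set of type classes, each matching several concrete types. The sketch below is only an assumption about how it could look: `ToyType` is a hypothetical stand-in for `DataType`, the `ArgumentType` variants follow the names listed above, and the `matches` and `check_variadic` helpers are invented for illustration.

```rust
// Toy stand-in for Arrow's DataType (hypothetical, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    List(Box<ToyType>),
}

// A type class that covers more than one concrete type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgumentType {
    Numeric,
    Integer,
    Float,
    List,
}

impl ArgumentType {
    // Hypothetical helper: does the concrete type belong to this class?
    fn matches(self, t: &ToyType) -> bool {
        match self {
            ArgumentType::Integer => matches!(t, ToyType::Int32 | ToyType::Int64),
            ArgumentType::Float => matches!(t, ToyType::Float32 | ToyType::Float64),
            ArgumentType::Numeric => {
                ArgumentType::Integer.matches(t) || ArgumentType::Float.matches(t)
            }
            ArgumentType::List => matches!(t, ToyType::List(_)),
        }
    }
}

// A variadic check against a type class instead of a single concrete type.
fn check_variadic(class: ArgumentType, args: &[ToyType]) -> bool {
    args.iter().all(|arg| class.matches(arg))
}
```

With a class such as `Numeric`, a signature no longer has to enumerate every integer and float type, which is the motivation behind TypeSignature::Numeric.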
#10268 is the first step! 🚀
Describe the solution you'd like
coerced_from function.
Describe alternatives you've considered
I assume it is possible to find a unique set of valid types given the argument types. If that assumption is false, we need to find another solution.
Additional context
No response