Strengthen TypeSignature and Coercion rule. #10507

Open
jayzhan211 opened this issue May 14, 2024 · 1 comment
Labels
enhancement New feature or request

Comments


jayzhan211 commented May 14, 2024

Is your feature request related to a problem or challenge?

Inspired by #10268, I have an idea for improving the current type signature and coercion design.

What is the current status

Given the function arguments, we check them against the function's defined TypeSignature. get_valid_types is the function that computes the possible valid types from the TypeSignature. Once we have all the candidates, we pick one of them. The core coercion rule is coerced_from: if every type in a candidate is coercible from the corresponding argument type, that candidate is the one we take.
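The selection step described above can be sketched roughly as follows. This is a simplified illustration, not the actual DataFusion code: the `DataType` enum is a toy stand-in for the Arrow type, the body of `coerced_from` is a made-up subset of rules, and `pick_valid_types` is a hypothetical helper name. Taking the first matching candidate is also an assumption of this sketch.

```rust
// Toy stand-in for Arrow's DataType; only a few variants for illustration.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

// Simplified stand-in for the coercion rule: can `from` be coerced into `into`?
// The rules listed here are illustrative, not the real rule set.
fn coerced_from(into: &DataType, from: &DataType) -> bool {
    match (into, from) {
        (a, b) if a == b => true,
        (DataType::Int64, DataType::Int32) => true,
        (DataType::Float64, DataType::Int32 | DataType::Int64) => true,
        _ => false,
    }
}

// Hypothetical helper: walk the candidates produced from the signature and
// return the first one where every type is coercible from the matching
// current argument type.
fn pick_valid_types(
    candidates: &[Vec<DataType>],
    current: &[DataType],
) -> Option<Vec<DataType>> {
    candidates
        .iter()
        .find(|cand| {
            cand.len() == current.len()
                && cand
                    .iter()
                    .zip(current)
                    .all(|(into, from)| coerced_from(into, from))
        })
        .cloned()
}
```

With candidates `[Int64, Int64]` and `[Float64, Float64]`, arguments `(Int32, Float64)` skip the first candidate and resolve to the second, which shows how much of the outcome depends on what `coerced_from` allows.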

What is the issue of the current approach

Because the signature is not well supported, we rely heavily on the coercion rule to get the expected types. We end up with complex coercion logic inside the coerced_from function. This not only makes it hard to maintain (removing or changing a rule might cause unknown issues in other functions), it also duplicates logic similar to the binary::coercion rule, which is really confusing.
There are also cases where the coercion rule lives inside a function's return_type, which is not the expected place to deal with coercion.

How to fix this

I think it is possible to improve the design of TypeSignature so that we can find the single valid set of types given the current types. The valid types we get are already coercible, so we don't need the coerced_from function anymore!

After the change we can eliminate the coerced_from function, and only the binary::coercion rule remains.

Additional context

Problematic examples

array_concat has the signature variadic any, and its coercion rule is inside return_type.
nullif has its coercion rule inside return_type.

coerced_from has numeric coercion, list coercion, timestamp coercion, and even comparison_binary_numeric_coercion (which will be removed in #10268).

#10268 is the first step! 🚀

Describe the solution you'd like

  1. Support / improve TypeSignature so we can get the only possible valid types given the argument types we have.
  2. Remove the coerced_from function.

Describe alternatives you've considered

I assume it is possible to find the only valid types given the argument types. If that assumption is false, we need to find another solution.

Additional context

No response


jayzhan211 commented Aug 8, 2024

The current ideal state in my mind

Signature does 3 things

  1. Length check
  2. Type check
  3. Coercion

For length,
the common length checks are

  1. Exact number
  2. Variadic (Any number)
  3. VariadicNonZero (Any number but at least one)
  4. VariadicEven (Less common, e.g. Map)
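As a rough sketch, the four length checks above could be modeled as one small enum. The name `Arity` and the method `accepts` are hypothetical and not part of DataFusion. Treating zero arguments as valid for VariadicEven is an assumption of this sketch.

```rust
// Hypothetical enum for the length check only; names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Arity {
    /// Exactly this many arguments.
    Exact(usize),
    /// Any number of arguments, including zero.
    Variadic,
    /// Any number of arguments, but at least one.
    VariadicNonZero,
    /// An even number of arguments (e.g. key/value pairs for Map).
    VariadicEven,
}

impl Arity {
    /// Returns true if `n_args` arguments satisfy this length rule.
    fn accepts(self, n_args: usize) -> bool {
        match self {
            Arity::Exact(n) => n_args == n,
            Arity::Variadic => true,
            Arity::VariadicNonZero => n_args >= 1,
            // Assumption: zero counts as even here.
            Arity::VariadicEven => n_args % 2 == 0,
        }
    }
}
```

Keeping the length check separate from the type check means each can be tested and reported on independently, for example with a dedicated "wrong number of arguments" error.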

For types,
We have two styles, Exact and Coercion. Exact rejects if the types mismatch; Coercion rejects if the type is not coercible to the target type.

The combinations of these are

  • Uniform(Vec<DataType>) // Exact Number with Exact type
  • UniformCoercion(Vec<DataType>) // Exact Number with coercion
  • Variadic(DataType)
  • VariadicCoercion(DataType)
    ...

For signatures with non-uniform length and more than one data type, we could use UserDefined.
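One way to read the four combinations listed above is sketched below. Everything here is an assumption made for illustration: `DataType` is a toy stand-in, `can_coerce` is a made-up rule set, `resolve` is a hypothetical method, and `Uniform(Vec<DataType>)` is interpreted as a positional list of expected types. The point is only that each variant yields at most one resolved list of types, so no separate candidate search is needed.

```rust
// Toy stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

// Hypothetical coercion check with a deliberately tiny rule set.
fn can_coerce(from: &DataType, to: &DataType) -> bool {
    from == to
        || matches!(
            (from, to),
            (DataType::Int32, DataType::Int64)
                | (DataType::Int32 | DataType::Int64, DataType::Float64)
        )
}

// Sketch of the proposed combinations (length rule x type-check style).
enum Signature {
    Uniform(Vec<DataType>),         // exact number, exact types
    UniformCoercion(Vec<DataType>), // exact number, coercible types
    Variadic(DataType),             // any number, exact type
    VariadicCoercion(DataType),     // any number, coercible type
}

impl Signature {
    /// Returns the single resolved argument type list, or None if rejected.
    fn resolve(&self, args: &[DataType]) -> Option<Vec<DataType>> {
        match self {
            Signature::Uniform(expected) => {
                (args == expected.as_slice()).then(|| expected.clone())
            }
            Signature::UniformCoercion(expected) => {
                let ok = args.len() == expected.len()
                    && args.iter().zip(expected).all(|(a, e)| can_coerce(a, e));
                ok.then(|| expected.clone())
            }
            Signature::Variadic(t) => {
                args.iter().all(|a| a == t).then(|| args.to_vec())
            }
            Signature::VariadicCoercion(t) => args
                .iter()
                .all(|a| can_coerce(a, t))
                .then(|| vec![t.clone(); args.len()]),
        }
    }
}
```

For example, under these assumptions `VariadicCoercion(Float64)` applied to `(Int32, Float64, Int64)` resolves to three `Float64` arguments, while `Variadic(Utf8)` rejects `(Utf8, Int32)` outright.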

The trickier part is the DataType. Many functions expect a Numeric type, which includes integer, float, ....
For functions that expect a string, there are Utf8, LargeUtf8, Utf8View.

For type checking, it would be nice to have a more general enum that covers more than one DataType to check against.

enum ArgumentType {
    Numeric,
    Integer,
    Float,
    List,
    // ...
}
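A minimal sketch of how such an enum could be checked against concrete types is shown below. The `DataType` enum is a toy stand-in for the Arrow type, and the `matches` method is a hypothetical name. Only the four variants named above are covered, and which concrete types belong to each class is an assumption.

```rust
// Toy stand-in for Arrow's DataType, with one nested variant for lists.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
    List(Box<DataType>),
}

// The type classes from the enum above (only the variants it names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArgumentType {
    Numeric,
    Integer,
    Float,
    List,
}

impl ArgumentType {
    /// Returns true if the concrete type `dt` belongs to this class.
    fn matches(self, dt: &DataType) -> bool {
        match self {
            ArgumentType::Integer => {
                matches!(dt, DataType::Int32 | DataType::Int64)
            }
            ArgumentType::Float => {
                matches!(dt, DataType::Float32 | DataType::Float64)
            }
            // Numeric is treated as the union of Integer and Float.
            ArgumentType::Numeric => {
                ArgumentType::Integer.matches(dt) || ArgumentType::Float.matches(dt)
            }
            ArgumentType::List => matches!(dt, DataType::List(_)),
        }
    }
}
```

A signature can then name one class instead of enumerating every concrete DataType, which is the motivation given for TypeSignature::Numeric.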

Now, we have

  • Uniform(Vec<ArgumentType>) // Exact Number with Exact type
  • UniformCoercion(Vec<ArgumentType>) // Exact Number with coercion
  • Variadic(ArgumentType)
  • VariadicCoercion(ArgumentType)

TypeSignature::Numeric is one of the ideas that came out of this.
For other kinds of complex type checks or length checks, we fall back to UserDefined.
