Infer union types from multiple assignments #18568

Open
JukkaL opened this issue Jan 29, 2025 · 4 comments

Comments

@JukkaL
Collaborator

JukkaL commented Jan 29, 2025

These long-standing issues can be solved at least partially by making type inference more flexible:

This would be an alternative to the current --allow-redefinition flag which uses renaming to allow somewhat flexible redefinitions. This would also be an alternative to the more general renaming pass proposed in #18516.

The idea is simply to allow each assignment to refine the type of a variable in the scope where it's being inferred. Example:

def f() -> None:
    if cond():
        x = 0 # Infer initial "int" type for "x"
        reveal_type(x) # int
    else:
        x = "" # Refine inferred type to "int | str"
        reveal_type(x) # str (narrowed)
    reveal_type(x) # int | str
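
For comparison, here is a minimal sketch of how the same function has to be written today, where the union is declared up front instead of being inferred. `cond` is a hypothetical stand-in (the example above doesn't define it), and the return value is added only so the result can be observed:

```python
import random
from typing import Union


def cond() -> bool:
    # Hypothetical stand-in for the undefined cond() in the example above.
    return random.random() < 0.5


def f() -> Union[int, str]:
    # Explicit declaration of the union. Without it, current mypy infers
    # "int" from the first assignment and rejects the str assignment in
    # the else branch. Union[int, str] is the same type as "int | str".
    x: Union[int, str]
    if cond():
        x = 0
    else:
        x = ""
    return x
```

With the proposal, the `x: Union[int, str]` line would no longer be needed for this kind of code.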

I've done some prototyping and the implementation seems fairly simple, but there could be some edge cases that are tricky. The prototype also seems to be reasonably compatible with existing behavior, though there will likely be some differences. I'm not sure yet if we'd have to wait for 2.0 to enable this by default.

This has some benefits over a renaming-based solution (#18516):

It also has some drawbacks:

This would only be enabled for simple variables (x = ...), not for attributes (self.x = ...), at least initially.

It's not clear if we can support general redefinitions which involve generic types. Initially code like this would still generate errors:

x = [1]
f(x)
x = ["a"]  # Error: int item expected, not str (due to type context)
f(x)

The renaming-based approach would support the above example without issues. However, the approach proposed here could be used together with --allow-redefinition to support the above use case (with limitations, since --allow-redefinition isn't very general).
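
To illustrate why renaming sidesteps the problem, here is a hand-written sketch of roughly what a renaming pass amounts to for the example above: each redefinition becomes its own variable, so the second list literal is inferred without any type context from the first. The names `x_ints` and `x_strs` and the body of `f` are assumptions made up for illustration, not what mypy generates internally.

```python
from typing import Sequence


def f(items: Sequence[object]) -> int:
    # Hypothetical stand-in for the undefined f() in the example above.
    return len(items)


x_ints = [1]    # Inferred as list[int]
f(x_ints)
x_strs = ["a"]  # A separate variable: inferred as list[str], no int context
f(x_strs)
```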

Thanks to @ilevkivskyi for #18538 which will make this approach feasible, and for suggesting that we don't need to support redefinitions involving partial types initially.

@JukkaL
Collaborator Author

JukkaL commented Jan 29, 2025

The generics use case shown above seems important for NumPy, in particular. We could first try inferring the rvalue with an empty type context, and if that doesn't produce a useful type, fall back to using the current variable type as the type context. So this would work correctly:

x = [1]
reveal_type(x) # list[int]
x = ["a"]
reveal_type(x) # list[str]

We'd still use the type context here, unless we add some special support for partial types:

x = [1]
reveal_type(x) # list[int]
x = [] # Fall back to type context for item type
reveal_type(x) # list[int]
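
The two-step strategy described above can be sketched as a toy model. This is not mypy code: `infer_list_literal` is a hypothetical helper that represents types as plain strings, handles only list literals, and does not model mixed item types at all.

```python
from typing import Optional, Sequence


def infer_list_literal(
    items: Sequence[object], current_type: Optional[str] = None
) -> str:
    """Toy model of inferring a type for a list literal assignment."""
    # Step 1: infer from the rvalue alone (empty type context).
    item_names = sorted({type(item).__name__ for item in items})
    if len(item_names) == 1:
        return f"list[{item_names[0]}]"
    # Step 2: nothing useful was inferred from an empty literal, so fall
    # back to the current variable type as the type context.
    if not item_names and current_type is not None:
        return current_type
    # Mixed item types and first-time empty literals are not modelled here.
    return "list[<unknown>]"


print(infer_list_literal([1]))               # first assignment: x = [1]
print(infer_list_literal(["a"], "list[int]"))  # redefinition: x = ["a"]
print(infer_list_literal([], "list[int]"))     # x = [] falls back to context
```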

@JukkaL
Collaborator Author

JukkaL commented Jan 30, 2025

I implemented the idea of trying an empty type context first and it seems good so far. My prototype supports a good subset of expected use cases, and I'm currently getting only 4 new errors in self check, all because of more precise inferred types. I still need to figure out how to support loops and deferrals properly, but overall I'm pretty optimistic about the approach.

@ilevkivskyi
Member

@JukkaL

> The generics use case shown above seems important for NumPy, in particular.

Just curious, did you ask them about this, or did you find this in their code? I doubt they use regular lists, or do you mean they have a similar problem with ndarray?

Btw, note that we already have some special-casing w.r.t. empty context for optional types, see here: https://github.com/python/mypy/blob/master/mypy/checker.py#L4471-L4494

@JukkaL
Collaborator Author

JukkaL commented Jan 30, 2025

> I doubt they use regular lists, or do you mean they have a similar problem with ndarray?

The same issue arises with any call that returns a generic type with a type variable, including ndarray[...], as the lvalue context can cause an invalid type to be inferred that doesn't match the argument types. List isn't special; it's just the most common type that has this issue.
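
A minimal sketch of the same problem without lists, assuming a made-up generic class `Box` and factory `make_box` (both hypothetical, standing in for something like an ndarray-returning call):

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    # Hypothetical generic container, used only for illustration.
    def __init__(self, item: T) -> None:
        self.item = item


def make_box(item: T) -> Box[T]:
    # A call whose return type depends on a type variable.
    return Box(item)


x = make_box(1)    # Inferred as Box[int]
# On redefinition, the declared type Box[int] is used as the type context,
# so T is solved as int and the str argument is expected to be rejected
# by current mypy, even though the code runs fine.
x = make_box("a")
```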
