Infer union types from multiple assignments #18568

JukkaL · 2025-01-29T14:00:48Z

These long-standing issues can be solved at least partially by making type inference more flexible:

This would be an alternative to the current --allow-redefinition flag, which uses renaming to allow somewhat flexible redefinitions. It would also be an alternative to the more general renaming pass proposed in #18516.

The idea is simply to allow each assignment to refine the type of a variable in the scope where it's being inferred. Example:

def f() -> None:
    if cond():
        x = 0  # Infer initial "int" type for "x"
        reveal_type(x)  # int
    else:
        x = ""  # Refine inferred type to "int | str"
        reveal_type(x)  # str (narrowed)
    reveal_type(x)  # int | str
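
To make the refinement rule concrete, here is a toy model of it. This is only a sketch and not mypy's implementation: the `ToyVariable` class and its string-based types are hypothetical names invented for illustration. Each assignment widens the inferred type with a new union member and narrows the variable to the type just assigned.

```python
from typing import List, Optional


class ToyVariable:
    """Toy model (hypothetical) of a variable whose inferred type grows per assignment."""

    def __init__(self) -> None:
        # Union members of the inferred type, in order of first appearance.
        self.declared: List[str] = []
        # Narrowed type that holds right after the most recent assignment.
        self.narrowed: Optional[str] = None

    def assign(self, rvalue_type: str) -> None:
        # Widen the inferred type only if this member is new.
        if rvalue_type not in self.declared:
            self.declared.append(rvalue_type)
        # Directly after the assignment the variable is narrowed to the rvalue type.
        self.narrowed = rvalue_type

    def declared_type(self) -> str:
        return " | ".join(self.declared)


x = ToyVariable()
x.assign("int")            # if branch: x = 0
after_if = x.narrowed      # "int"
x.assign("str")            # else branch: x = ""
after_else = x.narrowed    # "str"
merged = x.declared_type() # "int | str" after the branches join
print(after_if, after_else, merged)
```

The model ignores control flow entirely; in a real checker the merge after the if statement would be done by the binder.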

I've done some prototyping and the implementation seems fairly simple, but there could be some tricky edge cases. The prototype also seems to be reasonably compatible with existing behavior, though there will likely be some differences. I'm not sure yet if we'd have to wait for 2.0 to enable this by default.

This has some benefits over a renaming-based solution (#18516):

The implementation seems much simpler (but it needs "Use union types instead of join in binder" (#18538) to be merged first)
We can make it mostly backward compatible
Performance impact should be minimal

It also has some drawbacks:

It may be less flexible than "Make each assignment to define a distinct variable with independent type" (#18516).
It doesn't help mypyc. We'd need a separate renaming/SSA pass in mypyc for performance improvements.
Interactions with deferrals / forward references could be tricky.

This would only be enabled for simple variables (x = ...), not for attributes (self.x = ...), at least initially.

It's not clear if we can support general redefinitions which involve generic types. Initially code like this would still generate errors:

x = [1]
f(x)
x = ["a"]  # Error: int item expected, not str (due to type context)
f(x)

The renaming-based approach would support the above example without issues. However, the approach proposed here could be used together with --allow-redefinition to support the above use case (with limitations, since --allow-redefinition isn't very general).
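
As a rough illustration of why renaming handles this case, the example can be rewritten by hand so that the second assignment targets a fresh name. This is only a sketch of the idea: `f` is a hypothetical stand-in for the undefined function in the example, and the name `x_1` is invented here, not something a renaming pass is known to produce.

```python
from typing import Sequence


def f(items: Sequence[object]) -> int:
    # Hypothetical stand-in for the undefined f in the example above.
    return len(items)


x = [1]        # inferred as list[int]
first = f(x)
x_1 = ["a"]    # a fresh variable, so list[str] is inferred with no type context from x
second = f(x_1)
```

Because `x_1` has no earlier type, there is no type context that could force an `int` item type onto the second list.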

Thanks to @ilevkivskyi for #18538 which will make this approach feasible, and for suggesting that we don't need to support redefinitions involving partial types initially.


JukkaL · 2025-01-29T14:25:21Z

The generics use case shown above seems important for NumPy, in particular. We could first try inferring the rvalue with an empty type context, and if that doesn't produce a useful type, fall back to using the current variable type as the type context. So this would work correctly:

x = [1]
reveal_type(x) # list[int]
x = ["a"]
reveal_type(x) # list[str]

We'd still use the type context here, unless we add some special support for partial types:

x = [1]
reveal_type(x) # list[int]
x = [] # Fall back to type context for item type
reveal_type(x) # list[int]
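
The two-step strategy described above can be sketched with a toy model. This is an assumption-laden illustration, not mypy code: `infer_without_context` and `infer_assignment` are hypothetical helpers, types are plain strings, and only list literals are modeled.

```python
from typing import List, Optional


def infer_without_context(items: List[object]) -> Optional[str]:
    """Infer a list type from the items alone; None means no useful type was found."""
    names = sorted({type(item).__name__ for item in items})
    if not names:
        # An empty list literal gives no information about the item type.
        return None
    return "list[" + " | ".join(names) + "]"


def infer_assignment(items: List[object], current_type: Optional[str]) -> Optional[str]:
    """Try an empty type context first, then fall back to the variable's current type."""
    inferred = infer_without_context(items)
    if inferred is not None:
        return inferred
    return current_type


t1 = infer_assignment([1], None)           # x = [1]
t2 = infer_assignment(["a"], t1)           # x = ["a"]
t3 = infer_assignment([], "list[int]")     # x = [] falls back to the type context
print(t1, t2, t3)
```

The returned string corresponds to the narrowed type that reveal_type would show right after the assignment, not to the full inferred union for the variable.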

JukkaL · 2025-01-30T12:25:59Z

I implemented the idea of trying an empty type context first and it seems good so far. My prototype supports a good subset of expected use cases, and I'm currently getting only 4 new errors in self check, all because of more precise inferred types. I still need to figure out how to properly support loops and deferrals, but overall I'm pretty optimistic about the approach.

ilevkivskyi · 2025-01-30T13:12:03Z

@JukkaL

> The generics use case shown above seems important for NumPy, in particular.

Just curious, did you ask them about this, or did you find this in their code? I doubt they use regular lists, or do you mean they have a similar problem with ndarray?

Btw note that we already have some special-casing w.r.t. empty context for optional types, see here: https://github.com/python/mypy/blob/master/mypy/checker.py#L4471-L4494

JukkaL · 2025-01-30T14:54:30Z

> I doubt they use regular lists, or do you mean they have a similar problem with ndarray?

The same issue affects any call that returns a generic type with a type variable, including ndarray[...], since the lvalue context can cause an invalid type to be inferred that doesn't match the argument types. List isn't special; it's just the most common type that has this issue.
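
A minimal sketch of the same problem with a user-defined generic, to show that nothing here is specific to lists. `Box` and `make_box` are hypothetical names made up for this illustration. The code runs fine at runtime; the comments describe what a checker that uses the variable's existing type as type context would be expected to do.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    """Hypothetical generic container, standing in for list or ndarray."""

    def __init__(self, item: T) -> None:
        self.item = item


def make_box(item: T) -> Box[T]:
    # The return type depends on the type variable solved from the argument.
    return Box(item)


b = make_box(1)     # inferred as Box[int]
# With Box[int] as the lvalue type context, T is solved as int here, so the
# str argument is expected to be flagged even though Box[str] would be valid.
b = make_box("a")
```

With the empty-context-first strategy discussed earlier in the thread, the second call would instead be inferred on its own as Box[str].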

JukkaL added feature priority-0-high labels Jan 29, 2025

JukkaL mentioned this issue Jan 29, 2025

Make each assignment to define a distinct variable with independent type #18516

Open

JukkaL mentioned this issue Jan 30, 2025

Allow variable redefinition if lifetimes don't overlap #6232

Open
