Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Type annotations overhaul, part 1 #257

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

crusaderky
Copy link
Contributor

@crusaderky crusaderky commented Feb 27, 2025

Review all type annotations.

  • dtypes were called a mix of DType and Dtype. I picked DType in accordance with numpy.typing.
  • arrays were called a mix of Array, ndarray, or array. I picked Array as it is more idiomatic.
  • Added Namespace annotation
  • Removed many TYPE_CHECKING guards, which were doing nothing but reduce readability (performance impact at runtime is absolutely inconsequential)
  • Add py.typed. Note that this will typically not work for array_api_compat.<namespace>. modules as they are typically pulled in by array_namespace, so for most people it is exclusively about array_api_compat.common._helpers.

Out of scope

  • Add type checkers to CI
  • Make common._typing public. This will be in a later PR.

@crusaderky crusaderky marked this pull request as draft February 27, 2025 13:38
@ev-br
Copy link
Member

ev-br commented Feb 27, 2025

PSA: CI failing left and right is a transient artifact, will get back to reasonable after gh-250 is in and 1.11 is out.

This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12.

@crusaderky
Copy link
Contributor Author

This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12.

Then we should temporarily remove py.typed introduced in #254 CC @NeilGirdhar

@ev-br
Copy link
Member

ev-br commented Feb 27, 2025

Then we should temporarily remove py.typed

Yes. I'll do this in gh-250

@crusaderky crusaderky force-pushed the annotations branch 2 times, most recently from 30ed6ef to 607d9bb Compare February 27, 2025 14:09
@crusaderky
Copy link
Contributor Author

Discussion

The Array API documentation calls the types array, dtype, and device all lowercase.
I'm not a fan for obvious reasons. Should we change them here to be all lowercase, or should we propose changing them to uppercase in array-api?

@crusaderky crusaderky marked this pull request as ready for review February 27, 2025 14:58
@ev-br ev-br added this to the 1.12 milestone Feb 27, 2025
@crusaderky crusaderky force-pushed the annotations branch 3 times, most recently from 756a3fa to 07e34bb Compare February 27, 2025 17:00
@ev-br
Copy link
Member

ev-br commented Feb 28, 2025

The Array API documentation calls the types array, dtype, and device all lowercase. ... Should we change them here to be all lowercase, or should we propose changing them to uppercase in array-api?

Great idea, would be best to coordinate with the spec indeed. Consortium meetings are weekly on Mondays (hint hint).

@ev-br
Copy link
Member

ev-br commented Mar 2, 2025

Also would be best to coordinate with the array-api-typing initiative, cc
@jorenham @nstarman

Copy link

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, typing! 🎉

Apart from the inline comments, there's also some generally applicable opportunities for improvement

  • Using TypeAlias when defining things like DType: TypeAlias = np.dtype[...], so that type-checkers know how to treat it
  • Avoid dynamic __all__ construction, so that static type-checkers know what's exported
  • Since Python 3.9 typing.Tuple is deprecated in favor of using tuple directly
  • Optional and Union are not needed when annotating things if you use __future__.annotations, even if you use Python 3.9.
  • all *args and **kwargs I've seen are missing annotations. If unsure what these should be, then you should use : object for both (Any is not needed in input positions like these).
  • bool | int simplifies to int, int | float simplifies to float, and float | complex simplifies to complex. Shorter annotations helps to avoid type-checker error message spaghetti.


SupportsBufferProtocol = Any
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In NumPy, we've been using this in the stubs:

@runtime_checkable
class _Buffer(Protocol):
    def __buffer__(self, flags: int, /) -> memoryview: ...

if sys.version_info >= (3, 12):
    _SupportsBuffer: TypeAlias = _Buffer
else:
    import array as _array
    import mmap as _mmap

    _SupportsBuffer: TypeAlias = (
        _Buffer | bytes | bytearray | memoryview | _array.array[Any] | _mmap.mmap | NDArray[Any] | generic
    )

https://github.com/numpy/numtype/blob/main/src/numpy-stubs/__init__.pyi#L469-L483

This very same code should also work in a .py.

And in case you're not comfortable with the <3.12 workaround, you could in that case use Any instead, so e.g.

@runtime_checkable
class _Buffer(Protocol):
    def __buffer__(self, flags: int, /) -> memoryview: ...

if sys.version_info >= (3, 12):
    _SupportsBuffer: TypeAlias = _Buffer
else:
    _SupportsBuffer: TypeAlias = _Buffer | Any

Note that _Buffer | Any is not the same as Any:

The union T | Any is not reducible to a simpler form. It represents an unknown static type with lower bound T. That is, it represents an unknown set of objects which may be as large as object, or as small as T, but no smaller.

https://typing.readthedocs.io/en/latest/spec/concepts.html#union-types

And even before 3.12, _Buffer will already work in the stdlib, because typeshed conveniently chose to lie about the existence of __buffer__ in the stdlib stubs (even though it's very type-unsafe).
But you shouldn't expect other stub packages to also lie about this: In the current numpy stubs, for example, annotates it correctly: https://github.com/numpy/numpy/blob/e1e985cd0e2a8d40ec84fc6aed1ee98934baeea8/numpy/__init__.pyi#L2066-L2067


tldr; this type could be meaningfully narrowed using a Protocol implementing __buffer__, that will even help before Python 3.12 in some cases.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and don't forget about the : TypeAlias

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming-wise, this is rather verbose, and in a typing context the term Protocol already has a special meaning, that's from the "protocol" that's used in "buffer protocol".

For protocols with a single (dunder) method, I personally prefer the Can* prefix over the Supports* prefix. Not only because it's much shorter, but also because it's more meaningful.
From the optype docs:

optype.Can{} types describe what can be done with it. For instance, any CanAbs[T] type can be used as argument to the abs() builtin function with return type T. Most Can{} implement a single special method, whose name directly matched that of the type. CanAbs implements __abs__, CanAdd implements __add__, etc.

So I propose to instead name this CanBuffer, as I've done so myself with the equivalent optype.Buffer protocol (docs).

Copy link
Contributor

@NeilGirdhar NeilGirdhar Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol to array-api-typing?

I was going to suggest adding a comment that says that that group of annotations (Array, DType, etc.) are just placeholders until array-api-typing exports the full definitions?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol to array-api-typing?

My point is that there's no need to wait, because it can already be done without much effort

import inspect
from typing import Any, NamedTuple, Optional, Sequence, Tuple, Union
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the __future__.annotations import, you can already use A | B instead of Union[A, B], and T | None instead of Optional[T], even before 3.10.

So on Python 3.9, this is valid:

from __future__ import annotations

def ensure_str(x: str | bytes) -> str:
    return x if isinstance(x, str) else x.decode()

Which can equivalently we written __future__.annotations as

def ensure_str(x: "str | bytes") -> "str":
    return x if isinstance(x, str) else x.decode()

Note that this will only work within annotations. In case of type aliases, you need to manually wrap the type expression as a string:

from __future__ import annotations
from typing import TypeAlias

# This will raise (and doesn't care about the future)
# Stringy: TypeAlias = str | bytes 

# Take the future into your own hands (apply string theory)
Stringy: TypeAlias = "str | bytes"

The TypeAlias is very mandatory here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Tuple type should not be used; tuple itself should be used instead (which is supported as generic type since 3.9).

Comment on lines +43 to +50
x: Array,
/,
xp: Namespace,
*,
dtype: Optional[DType] = None,
device: Optional[Device] = None,
**kwargs,
) -> Array:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When dtype and device are None, then it's certain that the returned array is of the same type x. This can be annotated by adding an additional generic overload:

# additional required imports
from typing import TypeVar, overload

# [...]

# this is the recommended naming scheme for (invariant) 
# type parameters like this one, i.e. as `_{}T`
# TODO: add `bound=Array` once `Array` is defined (never set `bound` to `Any`)
_ArrayT = TypeVar("_ArrayT") 

@overload
def empty_like(
    x: _ArrayT,
    /,
    xp: Namespace,
    *,
    dtype: None = None, 
    device: None = None,
    **kwargs: object,
) -> _ArrayT: ...
@overload
def empty_like(
    x: Array,
    /,
    xp: Namespace,
    *,
    dtype: DType | None = None, 
    device: Device | None = None,
    **kwargs: object,
) -> Array: ...
def empty_like(
    x: Array,
    /,
    xp: Namespace,
    *,
    dtype: DType | None = None, 
    device: DType | None = None,
    **kwargs: object,  # don't forget about these ones
) -> Array:
    # do the thing

This way, xp.empty_like(np.zeros(1, np.int8)) will result in a np.ndarray[tuple[int], np.dtype[np.int8]] type, instead of Any.

_check_device(xp, device)
return xp.full(shape, fill_value, dtype=dtype, **kwargs)

def full_like(
x: ndarray,
x: Array,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment for empty_like also applies here

SupportsBufferProtocol,
],
/,
*,
dtype: Optional[Dtype] = None,
dtype: Optional[DType] = None,
device: Optional[Device] = None,
copy: "Optional[Union[bool, np._CopyMode]]" = None,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for quotes here; np._CopyMode even exists at runtime.

@ev-br
Copy link
Member

ev-br commented Mar 2, 2025

A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing?
Then, it probably means the typing stubs live somewhere not inline---similar to scipy-stubs?

I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end.

@jorenham
Copy link

jorenham commented Mar 2, 2025

A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing? Then, it probably means the typing stubs live somewhere not inline---similar to scipy-stubs?

I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end.

Stubs target a specific package. What your talking about sounds more similar to optype, which is a collection of re-usable protocols, aliases, and other typing utilities, that's heavy used by scipy-stubs.

It's indeed that role that array-api-typing is intended to fulfill; a collection of types and typing-related utilities to help annotate array-api-related code.

At the moment I'm busy rewriting the numpy stubs, and building an accompanying optype-like (optional) typing package for NumPy. Once that's off the ground, then I plan to redirect my focus towards array-api-typing.

In the meantime, I don't see any need for holding off on writing Protocols yourself, because such a protocol might be provided by array-api-typing in the future. In fact, it will probably help speed up the development of array-api-typing that way, because that would give us a better idea of what works, and what doesn't.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 2, 2025

Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.)

Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!)

@jorenham
Copy link

jorenham commented Mar 2, 2025

Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.)

Array-api-compat is still going to need annotations. But array-api-typing will make it easier to do this in a more accurate way.

Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!)

Haha, it's good to see that I'm not the only one that thinks typing things is exciting.

@ev-br ev-br modified the milestones: 1.11.1, 1.12 Mar 4, 2025
@crusaderky
Copy link
Contributor Author

crusaderky commented Mar 4, 2025

@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up.
There are places where I added deliberately obsolete notation that is consistent with the notation style that is extensively used across the repo, specifically because I didn't want to update the whole repo in a single PR.

I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking.

@crusaderky crusaderky changed the title ENH: Type annotations overhaul ENH: Type annotations overhaul, part 1 Mar 4, 2025
@jorenham
Copy link

jorenham commented Mar 4, 2025

@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up. There are places where I added deliberately obsolete notation that is consistent with the notation style that is extensively used across the repo, specifically because I didn't want to update the whole repo in a single PR.

I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking.

Yea of course; most of my suggestions fall into the category of dotting the "i"'s. Specifically the stuff about overloads, the **kwargs: object and simplifications for float | int and the NestedSequence unions, are perfectly fine to ignore, and perhaps reconsider at a later point.

It's just that when I see some opportunity for improving annotations, I get enthusiastic, which sometimes leads to out-of-scope nitpicking or mostly off-topic rants. But I can see how that could be experienced as overwhelming 😅.

@crusaderky
Copy link
Contributor Author

bool | int simplifies to int, int | float simplifies to float, and float | complex simplifies to complex. Shorter annotations helps to avoid type-checker error message spaghetti.

I can't find any mention that bool | int simplifies to int?
https://typing.readthedocs.io/en/latest/spec/special-types.html#special-cases-for-float-and-complex

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 5, 2025

I can't find any mention that bool | int simplifies to int?

bool is already a subtype of int (which is unfortunate, in my opinion), so bool | int simplifies to int by subsumption.

For the others, here's the link to the spec, and this is a discussion about it.

@crusaderky
Copy link
Contributor Author

I can't find any mention that bool | int simplifies to int?

bool is already a subtype of int (which is unfortunate, in my opinion), so bool | int simplifies by subsumption.

For the others, is this a reference?

That thread, while interesting, makes no mention of bool?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Mar 5, 2025

bool is already a subtype of int in Python, so int | bool is int because int subsumes bool.

@jorenham
Copy link

jorenham commented Mar 5, 2025

bool | int simplifies to int, int | float simplifies to float, and float | complex simplifies to complex. Shorter annotations helps to avoid type-checker error message spaghetti.

I can't find any mention that bool | int simplifies to int? typing.readthedocs.io/en/latest/spec/special-types.html#special-cases-for-float-and-complex

TLDR;

in case of bool <: int, it reflects the runtime behaviour:

>>> issubclass(bool, int)
True

but in case of int <: float and float <: complex, it doesn't:

>>> issubclass(int, float)
False
>>> issubclass(float, complex)
False

@crusaderky
Copy link
Contributor Author

@jorenham I believe I addressed all comments within the scope of this PR, e.g. remarks on what this PR changes.
Did I miss anything?

@jorenham
Copy link

jorenham commented Mar 5, 2025

@jorenham I believe I addressed all comments within the scope of this PR, e.g. remarks on what this PR changes. Did I miss anything?

I believe that about covers it 👌🏻.

Do you want me to work on a follow-up that includes my remaining suggestions, once this is merged?

@crusaderky
Copy link
Contributor Author

Do you want me to work on a follow-up that includes my remaining suggestions, once this is merged?

That would be great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants