-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Type annotations overhaul, part 1 #257
base: main
Are you sure you want to change the base?
Conversation
a7a848e
to
807ce41
Compare
PSA: CI failing left and right is a transient artifact, will get back to reasonable after gh-250 is in and 1.11 is out. This is too large to sneak into 1.11 unless we want to delay it (I don't). So this PR is the first large feature of 1.12. |
Then we should temporarily remove |
Yes. I'll do this in gh-250 |
30ed6ef
to
607d9bb
Compare
DiscussionThe Array API documentation calls the types |
756a3fa
to
07e34bb
Compare
Great idea, would be best to coordinate with the spec indeed. Consortium meetings are weekly on Mondays (hint hint). |
Also would be best to coordinate with the array-api-typing initiative, cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay, typing! 🎉
Apart from the inline comments, there's also some generally applicable opportunities for improvement
- Using
TypeAlias
when defining things likeDType: TypeAlias = np.dtype[...]
, so that type-checkers know how to treat it - Avoid dynamic
__all__
construction, so that static type-checkers know what's exported - Since Python 3.9
typing.Tuple
is deprecated in favor of usingtuple
directly Optional
andUnion
are not needed when annotating things if you use__future__.annotations
, even if you use Python 3.9.- all
*args
and**kwargs
I've seen are missing annotations. If unsure what these should be, then you should use: object
for both (Any
is not needed in input positions like these). bool | int
simplifies toint
,int | float
simplifies tofloat
, andfloat | complex
simplifies tocomplex
. Shorter annotations helps to avoid type-checker error message spaghetti.
|
||
SupportsBufferProtocol = Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In NumPy, we've been using this in the stubs:
@runtime_checkable
class _Buffer(Protocol):
def __buffer__(self, flags: int, /) -> memoryview: ...
if sys.version_info >= (3, 12):
_SupportsBuffer: TypeAlias = _Buffer
else:
import array as _array
import mmap as _mmap
_SupportsBuffer: TypeAlias = (
_Buffer | bytes | bytearray | memoryview | _array.array[Any] | _mmap.mmap | NDArray[Any] | generic
)
https://github.com/numpy/numtype/blob/main/src/numpy-stubs/__init__.pyi#L469-L483
This very same code should also work in a .py
.
And in case you're not comfortable with the <3.12 workaround, you could in that case use Any
instead, so e.g.
@runtime_checkable
class _Buffer(Protocol):
def __buffer__(self, flags: int, /) -> memoryview: ...
if sys.version_info >= (3, 12):
_SupportsBuffer: TypeAlias = _Buffer
else:
_SupportsBuffer: TypeAlias = _Buffer | Any
Note that _Buffer | Any
is not the same as Any
:
The union
T | Any
is not reducible to a simpler form. It represents an unknown static type with lower boundT
. That is, it represents an unknown set of objects which may be as large as object, or as small asT
, but no smaller.
https://typing.readthedocs.io/en/latest/spec/concepts.html#union-types
And even before 3.12, _Buffer
will already work in the stdlib, because typeshed conveniently chose to lie about the existence of __buffer__
in the stdlib stubs (even though it's very type-unsafe).
But you shouldn't expect other stub packages to also lie about this: In the current numpy
stubs, for example, annotates it correctly: https://github.com/numpy/numpy/blob/e1e985cd0e2a8d40ec84fc6aed1ee98934baeea8/numpy/__init__.pyi#L2066-L2067
tldr; this type could be meaningfully narrowed using a Protocol
implementing __buffer__
, that will even help before Python 3.12 in some cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and don't forget about the : TypeAlias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming-wise, this is rather verbose, and in a typing context the term Protocol
already has a special meaning, that's from the "protocol" that's used in "buffer protocol".
For protocols with a single (dunder) method, I personally prefer the Can*
prefix over the Supports*
prefix. Not only because it's much shorter, but also because it's more meaningful.
From the optype docs:
optype.Can{}
types describe what can be done with it. For instance, anyCanAbs[T]
type can be used as argument to theabs()
builtin function with return typeT
. MostCan{}
implement a single special method, whose name directly matched that of the type.CanAbs
implements__abs__
,CanAdd
implements__add__
, etc.
So I propose to instead name this CanBuffer
, as I've done so myself with the equivalent optype.Buffer
protocol (docs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was a placeholder? Aren't they ultimately going to point SupportsBufferProtocol
to array-api-typing
?
I was going to suggest adding a comment that says that that group of annotations (Array
, DType
, etc.) are just placeholders until array-api-typing
exports the full definitions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this was a placeholder? Aren't they ultimately going to point
SupportsBufferProtocol
toarray-api-typing
?
My point is that there's no need to wait, because it can already be done without much effort
array_api_compat/common/_aliases.py
Outdated
import inspect | ||
from typing import Any, NamedTuple, Optional, Sequence, Tuple, Union |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the __future__.annotations
import, you can already use A | B
instead of Union[A, B]
, and T | None
instead of Optional[T]
, even before 3.10.
So on Python 3.9, this is valid:
from __future__ import annotations
def ensure_str(x: str | bytes) -> str:
return x if isinstance(x, str) else x.decode()
Which can equivalently we written __future__.annotations
as
def ensure_str(x: "str | bytes") -> "str":
return x if isinstance(x, str) else x.decode()
Note that this will only work within annotations. In case of type aliases, you need to manually wrap the type expression as a string:
from __future__ import annotations
from typing import TypeAlias
# This will raise (and doesn't care about the future)
# Stringy: TypeAlias = str | bytes
# Take the future into your own hands (apply string theory)
Stringy: TypeAlias = "str | bytes"
The TypeAlias
is very mandatory here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Tuple
type should not be used; tuple
itself should be used instead (which is supported as generic type since 3.9).
x: Array, | ||
/, | ||
xp: Namespace, | ||
*, | ||
dtype: Optional[DType] = None, | ||
device: Optional[Device] = None, | ||
**kwargs, | ||
) -> Array: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When dtype
and device
are None
, then it's certain that the returned array is of the same type x
. This can be annotated by adding an additional generic overload:
# additional required imports
from typing import TypeVar, overload
# [...]
# this is the recommended naming scheme for (invariant)
# type parameters like this one, i.e. as `_{}T`
# TODO: add `bound=Array` once `Array` is defined (never set `bound` to `Any`)
_ArrayT = TypeVar("_ArrayT")
@overload
def empty_like(
x: _ArrayT,
/,
xp: Namespace,
*,
dtype: None = None,
device: None = None,
**kwargs: object,
) -> _ArrayT: ...
@overload
def empty_like(
x: Array,
/,
xp: Namespace,
*,
dtype: DType | None = None,
device: Device | None = None,
**kwargs: object,
) -> Array: ...
def empty_like(
x: Array,
/,
xp: Namespace,
*,
dtype: DType | None = None,
device: DType | None = None,
**kwargs: object, # don't forget about these ones
) -> Array:
# do the thing
This way, xp.empty_like(np.zeros(1, np.int8))
will result in a np.ndarray[tuple[int], np.dtype[np.int8]]
type, instead of Any
.
_check_device(xp, device) | ||
return xp.full(shape, fill_value, dtype=dtype, **kwargs) | ||
|
||
def full_like( | ||
x: ndarray, | ||
x: Array, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment for empty_like
also applies here
SupportsBufferProtocol, | ||
], | ||
/, | ||
*, | ||
dtype: Optional[Dtype] = None, | ||
dtype: Optional[DType] = None, | ||
device: Optional[Device] = None, | ||
copy: "Optional[Union[bool, np._CopyMode]]" = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for quotes here; np._CopyMode
even exists at runtime.
A naive question. Ideally, we'd use the same annotations for array api stubs, array-api-compat wrappers and array-api-strict objects. Can we actually have this kind of reuse? If yes in principle, what are the blockers and what's missing? I suppose it doesn't matter all that much whether it matures here or in array-api-typing or somewhere else, as long as it does mature and gets reused. Wherever is more convenient, as long as we have some sort of a rough plan for the grand unification in the end. |
Stubs target a specific package. What your talking about sounds more similar to optype, which is a collection of re-usable protocols, aliases, and other typing utilities, that's heavy used by It's indeed that role that At the moment I'm busy rewriting the numpy stubs, and building an accompanying In the meantime, I don't see any need for holding off on writing |
Awesome, so ultimately most of these typing definitions will be lifted into array-api-typing, and this repo will point to it and depend on it? (And therefore, most of this PR will be deleted from this repo in the long run.) Sounds like a great plan to me. (Sorry if I'm so excited about this. Can't wait to have my Array API code be type checked!) |
Array-api-compat is still going to need annotations. But array-api-typing will make it easier to do this in a more accurate way.
Haha, it's good to see that I'm not the only one that thinks typing things is exciting. |
@jorenham, I really do appreciate your extensive review, but it goes much beyond the intended scope of my PR. Most of your comments look like great material for a follow-up. I must confess that I'm struggling to pick out which of your comments request changes within the original scope of the PR, e.g. where you believe that what the PR changes needs tweaking. |
Yea of course; most of my suggestions fall into the category of dotting the "i"'s. Specifically the stuff about overloads, the It's just that when I see some opportunity for improving annotations, I get enthusiastic, which sometimes leads to out-of-scope nitpicking or mostly off-topic rants. But I can see how that could be experienced as overwhelming 😅. |
I can't find any mention that bool | int simplifies to int? |
For the others, here's the link to the spec, and this is a discussion about it. |
That thread, while interesting, makes no mention of bool? |
|
TLDR; in case of >>> issubclass(bool, int)
True but in case of >>> issubclass(int, float)
False
>>> issubclass(float, complex)
False |
@jorenham I believe I addressed all comments within the scope of this PR, e.g. remarks on what this PR changes. |
I believe that about covers it 👌🏻. Do you want me to work on a follow-up that includes my remaining suggestions, once this is merged? |
That would be great! |
Review all type annotations.
DType
andDtype
. I pickedDType
in accordance withnumpy.typing
.Array
,ndarray
, orarray
. I pickedArray
as it is more idiomatic.TYPE_CHECKING
guards, which were doing nothing but reduce readability (performance impact at runtime is absolutely inconsequential)py.typed
. Note that this will typically not work forarray_api_compat.<namespace>.
modules as they are typically pulled in byarray_namespace
, so for most people it is exclusively aboutarray_api_compat.common._helpers
.Out of scope
common._typing
public. This will be in a later PR.