make mypy more strict for prototype datasets #4513

Merged: 18 commits, Oct 21, 2021
Changes from 3 commits
16 changes: 16 additions & 0 deletions mypy.ini
@@ -4,6 +4,22 @@ files = torchvision
show_error_codes = True
pretty = True

[mypy-torchvision.prototype.*]

; untyped definitions and calls
disallow_untyped_defs = True

; None and Optional handling
no_implicit_optional = True

; warnings
warn_unused_ignores = True
warn_return_any = True
warn_unreachable = True

; miscellaneous strictness flags
allow_redefinition = True
Comment on lines +11 to +23
Member:

Are these the default values for those options?
If they're not the default, do we have a strong reason to use them instead of the defaults? Is this going to be clearly beneficial to the code-base and to us as developers?

Collaborator Author:

> Are these the default values for those options?

Nope.

> If they're not the default, do we have a strong reason to use them instead of the defaults? Is this going to be clearly beneficial to the code-base and to us as developers?

Let's go through them one by one:

- `disallow_untyped_defs`: By default mypy simply accepts untyped functions and uses `Any` for the input and output annotations. If our ultimate goal is to declare torchvision typed, we should make sure that we don't miss some functions. This flag enforces that.

- `no_implicit_optional`: By default mypy allows this:

  ```python
  def foo(bar: int = None) -> int:
      pass
  ```

  With this option enabled, it has to be

  ```python
  def foo(bar: Optional[int] = None) -> int:
      pass
  ```

  Given that `None` is a valid input, we should also explicitly mention it in the annotation.

- `warn_unused_ignores`: Sometimes we use `# type: ignore` directives on stuff that is actually wrong in other libraries. For example, "fix annotation for Demultiplexer" (pytorch#65998) will make some ignore directives obsolete that are needed now. Without this flag, we would never know.

- `warn_return_any`: If a function does something with dynamic types, mypy usually falls back to treating the output as `Any`. This flag warns us if that happened even though we specified a more concrete output type.

- `warn_unreachable`: This is more of a test functionality, as mypy will now warn us if some code is unreachable. For example, with this flag set, mypy will warn that the `if` branch here is unreachable:

  ```python
  def foo(bar: str) -> str:
      if isinstance(bar, int):
          bar = str(bar)
      return bar
  ```

- `allow_redefinition`: See "Set allow_redefinition = True for mypy" (#4531). If we have this globally, we can of course remove it here.

Apart from `warn_return_any` and `warn_unreachable` I think these flags are clearly beneficial. Those two were beneficial for me in the past, but I can see how others might object to them.
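To make the `warn_return_any` case concrete, here is a minimal sketch (hypothetical code, not from this PR): `json.loads` is annotated to return `Any`, so returning its result directly from a function declared to return `int` trips the warning, while narrowing with `isinstance` first satisfies mypy.

```python
import json


def parse_count(raw: str) -> int:
    # json.loads is typed as returning Any, so mypy only sees an Any
    # flowing into a declared int return slot. With warn_return_any = True
    # this line is flagged.
    return json.loads(raw)


def parse_count_checked(raw: str) -> int:
    value = json.loads(raw)
    if not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    # After the isinstance check the type is narrowed to int,
    # so the warning goes away.
    return value
```

Both functions behave identically at runtime on valid input; the flag only changes what mypy accepts statically.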


[mypy-torchvision.io._video_opt.*]

ignore_errors = True
13 changes: 8 additions & 5 deletions torchvision/prototype/datasets/_builtin/caltech.py
@@ -82,7 +82,10 @@ def _anns_key_fn(self, data: Tuple[str, Any]) -> Tuple[str, str]:
return category, id

def _collate_and_decode_sample(
- self, data, *, decoder: Optional[Callable[[io.IOBase], torch.Tensor]]
+ self,
+ data: Tuple[Tuple[str, str], Tuple[str, io.IOBase], Tuple[str, io.IOBase]],
+ *,
+ decoder: Optional[Callable[[io.IOBase], torch.Tensor]],
) -> Dict[str, Any]:
key, image_data, ann_data = data
category, _ = key
@@ -117,11 +120,11 @@ def _make_datapipe(
images_dp, anns_dp = resource_dps

images_dp = TarArchiveReader(images_dp)
- images_dp = Filter(images_dp, self._is_not_background_image)
+ images_dp: IterDataPipe = Filter(images_dp, self._is_not_background_image)
images_dp = Shuffler(images_dp, buffer_size=INFINITE_BUFFER_SIZE)

anns_dp = TarArchiveReader(anns_dp)
- anns_dp = Filter(anns_dp, self._is_ann)
+ anns_dp: IterDataPipe = Filter(anns_dp, self._is_ann)

dp = KeyZipper(
images_dp,
@@ -136,7 +139,7 @@
def generate_categories_file(self, root: Union[str, pathlib.Path]) -> None:
dp = self.resources(self.default_config)[0].to_datapipe(pathlib.Path(root) / self.name)
dp = TarArchiveReader(dp)
- dp = Filter(dp, self._is_not_background_image)
+ dp: IterDataPipe = Filter(dp, self._is_not_background_image)
dir_names = {pathlib.Path(path).parent.name for path, _ in dp}
create_categories_file(HERE, self.name, sorted(dir_names))

@@ -185,7 +188,7 @@ def _make_datapipe(
) -> IterDataPipe[Dict[str, Any]]:
dp = resource_dps[0]
dp = TarArchiveReader(dp)
- dp = Filter(dp, self._is_not_rogue_file)
+ dp: IterDataPipe = Filter(dp, self._is_not_rogue_file)
dp = Shuffler(dp, buffer_size=INFINITE_BUFFER_SIZE)
return Mapper(dp, self._collate_and_decode_sample, fn_kwargs=dict(decoder=decoder))

6 changes: 3 additions & 3 deletions torchvision/prototype/datasets/_folder.py
@@ -25,7 +25,7 @@ def _collate_and_decode_data(
*,
root: pathlib.Path,
categories: List[str],
- decoder,
+ decoder: Optional[Callable[[io.IOBase], torch.Tensor]],
) -> Dict[str, Any]:
path, buffer = data
data = decoder(buffer) if decoder else buffer
@@ -49,8 +49,8 @@ def from_data_folder(
root = pathlib.Path(root).expanduser().resolve()
categories = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
masks: Union[List[str], str] = [f"*.{ext}" for ext in valid_extensions] if valid_extensions is not None else ""
- dp: IterDataPipe = FileLister(str(root), recursive=recursive, masks=masks)
- dp = Filter(dp, _is_not_top_level_file, fn_kwargs=dict(root=root))
+ dp = FileLister(str(root), recursive=recursive, masks=masks)
+ dp: IterDataPipe = Filter(dp, _is_not_top_level_file, fn_kwargs=dict(root=root))
dp = Shuffler(dp, buffer_size=INFINITE_BUFFER_SIZE)
dp = FileLoader(dp)
return (
3 changes: 2 additions & 1 deletion torchvision/prototype/datasets/decoder.py
@@ -1,4 +1,5 @@
import io
from typing import cast

import PIL.Image
import torch
@@ -8,4 +9,4 @@


def pil(buffer: io.IOBase, mode: str = "RGB") -> torch.Tensor:
- return pil_to_tensor(PIL.Image.open(buffer).convert(mode.upper()))
+ return cast(torch.Tensor, pil_to_tensor(PIL.Image.open(buffer).convert(mode.upper())))
Member:

Do we need to call cast because pil_to_tensor is not typed?

Collaborator Author:

Correct. For untyped functions mypy assumes Any and then complains because we return the more specific torch.Tensor here. I've added a warn_redundant_casts = True option that will emit a warning that this cast can be removed as soon as pil_to_tensor is typed.
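The mechanics can be sketched with a stand-in untyped function (hypothetical names, not torchvision code):

```python
from typing import cast


def untyped_helper(x):
    # No annotations: mypy treats the parameter and return value as Any.
    return x * 2


def doubled(x: int) -> int:
    # Without the cast, warn_return_any would flag this line, because an
    # Any flows into a declared int return. Once untyped_helper gains
    # annotations, warn_redundant_casts flags the cast itself as removable.
    return cast(int, untyped_helper(x))
```

`cast` is purely a static-analysis hint; at runtime it returns its argument unchanged, so the decoder's behavior is unaffected.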

12 changes: 7 additions & 5 deletions torchvision/prototype/datasets/utils/_dataset.py
@@ -15,6 +15,8 @@
NoReturn,
Iterable,
Tuple,
Iterator,
cast,
)

import torch
@@ -27,7 +29,7 @@
from ._resource import OnlineResource


- def make_repr(name: str, items: Iterable[Tuple[str, Any]]):
+ def make_repr(name: str, items: Iterable[Tuple[str, Any]]) -> str:
def to_str(sep: str) -> str:
return sep.join([f"{key}={value}" for key, value in items])

@@ -46,18 +48,18 @@ def to_str(sep: str) -> str:


class DatasetConfig(Mapping):
- def __init__(self, *args, **kwargs):
+ def __init__(self, *args: Any, **kwargs: Any) -> None:
data = dict(*args, **kwargs)
self.__dict__["__data__"] = data
self.__dict__["__final_hash__"] = hash(tuple(data.items()))

def __getitem__(self, name: str) -> Any:
return self.__dict__["__data__"][name]

- def __iter__(self):
+ def __iter__(self) -> Iterator[Any]:
return iter(self.__dict__["__data__"].keys())

- def __len__(self):
+ def __len__(self) -> int:
return len(self.__dict__["__data__"])

def __getattr__(self, name: str) -> Any:
@@ -79,7 +81,7 @@ def __delattr__(self, item: Any) -> NoReturn:
raise RuntimeError(f"'{type(self).__name__}' object is immutable")

def __hash__(self) -> int:
- return self.__dict__["__final_hash__"]
+ return cast(int, self.__dict__["__final_hash__"])

def __eq__(self, other: Any) -> bool:
if not isinstance(other, DatasetConfig):
2 changes: 1 addition & 1 deletion torchvision/prototype/datasets/utils/_resource.py
@@ -8,7 +8,7 @@


# FIXME
- def compute_sha256(_) -> str:
+ def compute_sha256(path: pathlib.Path) -> str:
Member:

lol I'm afraid to ask

Collaborator Author:

This file needs heavy refactoring as soon as the torchdata download API is stable-ish. Adding the type was just faster than adding an ignore.

return ""
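For reference, once the download API settles, the FIXME body would presumably become a real checksum. A hashlib-based sketch of what that might look like (hypothetical, not part of this PR):

```python
import hashlib
import pathlib


def compute_sha256(path: pathlib.Path) -> str:
    # Stream the file in 1 MiB chunks so large archives are not read
    # into memory all at once.
    sha256 = hashlib.sha256()
    with open(path, "rb") as file:
        for chunk in iter(lambda: file.read(1024 * 1024), b""):
            sha256.update(chunk)
    return sha256.hexdigest()
```

The chunked `update()` loop is the standard incremental-hashing pattern from the hashlib docs and matches the `pathlib.Path -> str` signature this PR introduces.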

