Compute resulting types of groups and groupdicts methods of Match objects from fullmatch #9871

Closed
djasa opened this issue Mar 12, 2023 · 1 comment

djasa commented Mar 12, 2023

re.fullmatch() either matches the whole string or returns None. When it matches, the tuple returned by .groups() and the dict returned by .groupdict() have a fixed length, equal to the number of groups (for .groupdict(), named groups) defined in the regex. Elements corresponding to mandatory groups always have the same type as the input; elements corresponding to optional groups can also be None if the group did not take part in the match.
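
The runtime behaviour described above can be checked with a short sketch. It assumes the same pattern as the script in this report; the names `pattern`, `groups` and `named` are only illustrative.

```python
import re

# Same pattern as in the report: two mandatory groups, one optional (repeated) group.
pattern = re.compile(r"(?P<proto>[^:]+)://(?P<host>[^/]+)(?P<rest>/[^/]+)*")

m = pattern.fullmatch("https://hello")
assert m is not None  # the whole string matched

groups = m.groups()
named = m.groupdict()

# The tuple always has one slot per group in the pattern.
print(len(groups) == pattern.groups)  # True
# The optional group did not take part in the match, so its slot is None.
print(groups)  # ('https', 'hello', None)
print(named)   # {'proto': 'https', 'host': 'hello', 'rest': None}
```

The length of the tuple depends only on the pattern, not on the input string; only the optional slot varies between `str` and `None`.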

Hence, type checking should accept the example script below as correct, because:

  • the regex is known at type-checking time
  • the return statement of the parse_url() function is only reached when re.fullmatch() succeeds

$ cat mypy_fullmatch_groups.py
#!/usr/bin/env python3
import re
import sys
from typing import Tuple, TypedDict, TYPE_CHECKING, Union


ParsedTuple = Tuple[str, str, str|None]
ParsedDict = TypedDict("ParsedDict", {"proto": str, "host": str, "rest": Union[str, None]})


def parse_url(url: str) -> Tuple[ParsedTuple, ParsedDict]:
    fm = re.fullmatch(r"(?P<proto>[^:]+)://(?P<host>[^/]+)(?P<rest>/[^/]+)*", url)
    if not fm:
        raise Exception(f"Failed to parse the URL: {url}")

    return (fm.groups(), fm.groupdict())


def main() -> None:
    if len(sys.argv) > 1:
        urls = sys.argv[1:]
    else:
        urls = ["https://hello", "https://hello/1", "https://hello/1/2", "http//spam"]

    for u in urls:
        try:
            p = parse_url(u)
            print(f"Parsed URL: {u}")
            print(f"  * as a tuple: {p[0]=}")
            print(f"  * as a dict:  {p[1]=}\n")
        except Exception as e:
            print(e)

if __name__ == '__main__':
    main()

However, mypy rejects it, because it treats the results of .groups() and .groupdict() on the fullmatch() result as variable-length collections of loosely typed elements (Union[str, Any] in the output below), like those of other Match objects, where we indeed can't tell the length of the sequences or whether their members are None:

$ mypy mypy_fullmatch_groups.py 
mypy_fullmatch_groups.py:16: error: Incompatible return value type (got "Tuple[Tuple[Union[str, Any], ...], Dict[str, Union[str, Any]]]", expected "Tuple[Tuple[str, str, Optional[str]], ParsedDict]")
Found 1 error in 1 file (checked 1 source file)

Environment: Fedora 37 with:

$ python3 --version
Python 3.11.1
$ mypy --version
mypy 0.982 (compiled: no)

AlexWaygood (Member) commented

Everything you say is correct, but there's no way to express this in typeshed's stubs. More precise inference along the lines of what you're describing would have to be provided via special-cased logic in the type checker.

python/mypy#7803, a long-open PR, proposes to add such support to mypy, but work on that PR appears to have stalled.
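
Until a type checker special-cases this, the fixed shape can be spelled out by hand. The following is only a sketch of one possible workaround, not something proposed in this thread: it reads each named group individually and builds the tuple and dict explicitly. `URL_RE` is an illustrative name, and the assumption that mypy accepts the three annotated assignments rests on the stubs returning `str | Any` for group access, which has not been verified here.

```python
import re
from typing import Optional, Tuple, TypedDict

ParsedTuple = Tuple[str, str, Optional[str]]


class ParsedDict(TypedDict):
    proto: str
    host: str
    rest: Optional[str]


# Same pattern as in the report, compiled once.
URL_RE = re.compile(r"(?P<proto>[^:]+)://(?P<host>[^/]+)(?P<rest>/[^/]+)*")


def parse_url(url: str) -> Tuple[ParsedTuple, ParsedDict]:
    fm = URL_RE.fullmatch(url)
    if fm is None:
        raise ValueError(f"Failed to parse the URL: {url}")

    # Read each group by name so the checker sees three fixed fields
    # instead of a variable-length tuple or a plain dict.
    proto: str = fm["proto"]
    host: str = fm["host"]
    rest: Optional[str] = fm["rest"]  # None when the optional group is absent

    return (proto, host, rest), {"proto": proto, "host": host, "rest": rest}
```

The cost is that the knowledge of which groups are mandatory lives in the annotations rather than being derived from the regex, so the two have to be kept in sync manually.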

AlexWaygood closed this as not planned on Mar 14, 2023