Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match.group can return None #3902

Closed
rmorshea opened this issue Apr 3, 2020 · 14 comments · Fixed by #5557
Closed

Match.group can return None #3902

rmorshea opened this issue Apr 3, 2020 · 14 comments · Fixed by #5557
Labels
stubs: false negative Type checkers do not report an error, but should

Comments

@rmorshea
Copy link
Contributor

rmorshea commented Apr 3, 2020

All usages of Match.group appear to return strings however, for optional named groups it can return None:

import re
assert re.match("(?P<numbers>[1-9]+)?", "abc").group("numbers") is None
@srittau
Copy link
Collaborator

srittau commented Apr 5, 2020

There is a note in the stubs:

# TODO: The return for a group may be None, except if __group is 0 or not given.

This needs to be amended about it being potentially None in case of strings. Overloads and literals should be able to narrow this down. PR welcome.

@srittau srittau added size-medium stubs: false negative Type checkers do not report an error, but should labels Apr 5, 2020
@djb4ai
Copy link
Contributor

djb4ai commented Apr 7, 2020

@rmorshea Are you working on it? I can submit a patch if you want

@djb4ai
Copy link
Contributor

djb4ai commented Apr 7, 2020

@srittau When I am trying to replicate it since return type of re.match is Optional[Match[AnyStr]], mypy complains about calling .group on None is there any way to handle this scenario?

@srittau
Copy link
Collaborator

srittau commented Apr 7, 2020

@lladhibhutall This should use overloads, e.g. the return type should depend on whether an argument is given, it is 0, another int or a str.

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Apr 7, 2020

Note it can return None for (non-zero) numbered groups as well: assert re.match("(x)?y", "y").group(1) is None. This change, while correct, might hurt ergonomics in the happy case, pretty much all uses of numbered groups I've seen assume if there's a match their groups will be not None.

@rmorshea
Copy link
Contributor Author

rmorshea commented Apr 7, 2020

IMO having a false positive (i.e. the return type is Optional[str] for required match groups) is better than a false negative, but if the maintainers want to keep as-is to avoid inconvenience I'd understand that as well.

@hauntsaninja
Copy link
Collaborator

typeshed generally prefers false negatives over false positives (https://github.com/python/typeshed/blob/master/CONTRIBUTING.md#what-to-include), but maybe Sebastian has an opinion on whether to make Optional the return type of zero, one or both of named and (non-zero) numbered groups.

@hauntsaninja
Copy link
Collaborator

It looks like the change was made, and then reverted in #3190

@rmorshea
Copy link
Contributor Author

rmorshea commented Apr 7, 2020

Ok, seems like false-positive caused problems. Thanks @hauntsaninja.

@rmorshea rmorshea closed this as completed Apr 7, 2020
@hauntsaninja
Copy link
Collaborator

There's also some hope a plugin could improve results here, e.g.: python/mypy#7803

@mr-c
Copy link
Contributor

mr-c commented May 13, 2020

This is quite needed for mypyc users. Can this issue be re-opened?

@srittau
Copy link
Collaborator

srittau commented May 13, 2020

Reopening. I believe we should strive towards correctness here and we can ensure this at least somewhat using overloads and literals.

@srittau srittau reopened this May 13, 2020
@hauntsaninja
Copy link
Collaborator

An ideal solution for mypy users here is a plugin (python/mypy#7803).

The no argument case and the Literal[0] case should return str; this should be uncontroversial. The multiple argument case should return a tuple of whatever we decide the single argument case should return.

However everything else, which comprises most of the usage of group, can potentially return None. If you assume that non-optional capture groups are more common than optional capture groups, this is potentially a lot of false positives.

As discussed in mypyc/mypyc#676, one option here is to mark the return type as Union[Any, str]. This would sidestep the false positive problem, and since mypyc was mentioned, fix things for mypyc users. This option looks increasingly good to me!

Anecdotally, I literally wrote code today that would trigger a false positive if we instead went with Optional[str]. But a quick scan of https://grep.app/search?q=.group%28&filter[lang][0]=Python seems to indicate at least half of all usage of group would get flagged incorrectly.

@KotlinIsland
Copy link
Contributor

Basedmypy works correctly in these cases:

import re

s: str
if m := re.match("a(b)?(c)", s):
    reveal_type(m.groups())  #  (str | None, str)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stubs: false negative Type checkers do not report an error, but should
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants