Python type hints and migration to Python 3
See these Google Slides for our plan overview for migrating to Python 3 and adding mypy type hints to help catch problems in the migration and elsewhere (esp. with the string to unicode vs. bytes change).
| Name | Python 2 | Python 2+3 | Python 3 |
|---|---|---|---|
| `basestring` | `basestring` | -- | -- |
| `unicode` | `unicode` | -- | -- |
| `typing.Text` | `unicode` | `typing.Text` | `str` |
| `typing.AnyStr` (any type of string but not mixed) | `typing.AnyStr` | `typing.AnyStr` | `typing.AnyStr` |
| `six.string_types` (for `isinstance()`) | `(basestring,)` | `six.string_types` | `(str,)` |
| `six.text_type` | `unicode` | `six.text_type` | `str` |
| `String = Union[str, Text]` (type alias) | `unicode` or `str` [or `bytes`] | a text string | `str` |

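
As a rough illustration of the table above, the sketch below uses `typing.Text` and `typing.AnyStr` in type comments that read the same under Python 2 and Python 3. The helper names `to_text` and `join_pair` are hypothetical and not part of the project.

```python
from typing import AnyStr, Text, Union


def to_text(value, encoding='utf-8'):
    # type: (Union[bytes, Text], str) -> Text
    """Return `value` as text, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def join_pair(a, b):
    # type: (AnyStr, AnyStr) -> AnyStr
    """Concatenate two strings of the same kind: both text or both bytes."""
    return a + b
```

`AnyStr` tells the type checker that both arguments and the result share one string type, so a call that mixes text and bytes should be reported.
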
| Operation | Python 2 | Python 2+3 | Python 3 |
|---|---|---|---|
| test a key | `key in d`<br>`d.has_key(key)` | `key in d` | `key in d` |
| snapshot as a list | `list(d)`<br>`d.keys()` | `list(d)` | `list(d)`<br>`list(d.keys())` |
| | `d.values()`<br>`list(d.values())  # extra list copy` | `list(six.viewvalues(d))`<br>`list(d.values())` | `list(d.values())` |
| | `d.items()`<br>`list(d.items())  # extra list copy` | `list(six.viewitems(d))`<br>`list(d.items())` | `list(d.items())` |
| iterable view | `d.viewkeys()` | `six.viewkeys(d)` | `d.keys()`<br>`d  # if the context will call iter() on it` |
| | `d.viewvalues()` | `six.viewvalues(d)` | `d.values()` |
| | `d.viewitems()` | `six.viewitems(d)` | `d.items()` |
| iterator | `for key in d: ...`<br>`iter(d)`<br>`d.iterkeys()` | `for key in d: ...`<br>`iter(d)`<br>`six.iterkeys(d)` | `for key in d: ...`<br>`iter(d)`<br>`iter(d.keys())` |
| | `d.itervalues()` | `six.itervalues(d)` | `iter(d.values())` |
| | `d.iteritems()` | `six.iteritems(d)` | `iter(d.items())` |

- "Snapshot as a list" takes more RAM but isn't always slower and it lets you modify the dict while iterating through the snapshot. Even when it is slower, that might not matter in a unit test or a development utility.
- A dictionary "View" has set operations, `in`, `iter`, and `reversed` iteration. It's Iterable, which means it can construct an iterator on demand, and in that sense it can be iterated multiple times although each iterator is one-shot. While iterating, it can handle dict value changes but not size changes.
- Stop using the `d.iterxyz()` methods. They aren't in Python 3 since the View methods fill that role and more. Most code (like `for...in`) that needs an Iterator will accept an Iterable. If you really need to pass an Iterator to a function, then call e.g. `iter(a_view)`.
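
The notes above on snapshots, views, and iterators can be sketched as follows. This is a minimal example that assumes Python 3 dict semantics; the dict contents are made up for illustration.

```python
d = {'a': 1, 'b': 2, 'c': 3}

# A view is live: it reflects later changes to the dict.
keys_view = d.keys()
# A snapshot is a separate list, frozen at the time of the copy.
keys_snapshot = list(d)

d['d'] = 4
view_sees_new_key = 'd' in keys_view          # the view tracks the dict
snapshot_sees_new_key = 'd' in keys_snapshot  # the snapshot does not

# Key views support set operations.
common = keys_view & {'a', 'z'}

# A view is Iterable: each iter() call makes a fresh one-shot iterator.
it = iter(keys_view)
first_pass = list(it)
second_pass = list(it)  # empty, because the iterator is already used up

# Iterating over a snapshot lets the loop delete entries safely.
for key in list(d):
    if d[key] % 2 == 0:
        del d[key]

# Changing the dict's size while iterating a view raises RuntimeError.
size_change_failed = False
try:
    for key in d.keys():
        d[key + '_copy'] = 0
except RuntimeError:
    size_change_failed = True
```
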
- style-guide.md#type-hints -- our notes on using Python type hints including numpy stubs.
- Porting Python 2 Code to Python 3.
- What's New in Python.
- The "six" compatibility library.
- Python's "Future" conversion tool.
- Why Python 3 exists
  - Text vs. binary data in Python 2 is error prone. Mixing encoded and unencoded text is unreliable and confusing, e.g. both `str` and `unicode` types have `.encode()` and `.decode()` methods. Python predates the Unicode standard. Unicode handling is also inconsistent, e.g. in a script vs. the interactive interpreter, and in `open().read()`.
  - Python 3 fixes that problem by distinguishing unicode text from binary bytes as separate types. The new approach is known as the "Unicode sandwich": "use bytes in I/O; unicode in all the app code in between." However, the Python team threw in a lot of other incompatible changes. Big mistake!
- The Unicode HOWTO.
- Nick Coghlan's Python Notes.
- The Story of Python 2 and 3.
- Ned Batchelder’s Pragmatic Unicode talk/essay
- Supporting Python 3: An in-depth guide.
- differences.
- @twouters’s old TransitionToPython3 wiki.
- What's New in Python for really comprehensive and exhaustive documentation about all language changes since Python 2.7.
- Conservative Python 3 Porting Guide.
- From Dropbox, Incrementally migrating over one million lines of code from Python 2 to Python 3
  - Don't use unicode literals.
  - Use Mypy type checks, unit tests, and `Py2 -3` in CI to check for backsliding esp. on unicode/bytes types.
- Python 3 for Scientists.
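
The "Unicode sandwich" mentioned above can be sketched roughly like this: decode bytes once at the input edge, work only with text in the middle, and encode once at the output edge. The function `shout_lines` is a hypothetical example, and UTF-8 is an assumed default encoding.

```python
def shout_lines(raw_bytes, encoding='utf-8'):
    # type: (bytes, str) -> bytes
    """Uppercase each line, working on text between the I/O edges."""
    # Input edge: bytes -> text.
    text = raw_bytes.decode(encoding)
    # Middle: text only, so non-ASCII characters are handled correctly.
    shouted = u'\n'.join(line.upper() for line in text.splitlines())
    # Output edge: text -> bytes.
    return shouted.encode(encoding)
```

Keeping the decode and encode calls at the edges means the code in between never has to guess whether a value is encoded.
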
- Adopt all the `__future__` imports.
  - Division is the challenging one. It's mostly in use already, with the big exception that `wholecell/utils/units.py` has truediv turned off for its callers due to Issue #433.
- Adopt Python 3 compatible libraries.
  - The pips should now be Python 3 compatible, but they aren't all clearly marked that way.
  - Finish adopting `subprocess32` in place of `subprocess`. It's a back-port of the Python 3 `subprocess` with improvements and bug fixes in process launching.
- Incrementally convert to Python 3 compatible syntax and semantics. Use a tool like "future" to do much of the conversion. As we ratchet up the Python 3 compatibility, let everyone know and update the checker tool configuration.
- Use a checker tool in CI to catch backsliding on Python 3 compatibility changes.
- Add type hints, esp. for the `str`, `bytes`, `unicode`, and `basestring` types and the `AnyStr` type hint.
  - Add a type checker in CI, most likely pytype (see below).
- Drop support for Python 2.
- Phase out use of the "six" compatibility library.
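
As a small sketch of the first step in the list above, the `__future__` imports go at the top of a module, after which `/` is true division in both Python 2 and Python 3. The function names here are hypothetical.

```python
from __future__ import absolute_import, division, print_function


def mean(values):
    """Average of a non-empty sequence; `/` is true division here."""
    return sum(values) / len(values)


def midpoint_index(length):
    """Middle index of a sequence; `//` keeps floor division explicit."""
    return length // 2
```

Writing `//` wherever an integer result is intended makes the code behave the same before and after the division import is adopted.
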
Type hints look like this:

    def emphasize(message):
        # type: (str) -> str
        """Construct an emphatic message."""
        return message + '!'

A few type hints -- esp. one per function definition -- can go a long way to catching problems and documenting types.
PyCharm checks types interactively, while you edit. You don't need any other tools to check types. See Python Type Checking (Guide).
Batch programs mypy and pytype are other ways to check types, particularly in Continuous Integration builds (CI).
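
To illustrate the kind of mistake a type checker can catch, here is a minimal sketch. The function `find_index` and the sample names are hypothetical.

```python
from typing import List, Optional


def find_index(names, target):
    # type: (List[str], str) -> Optional[int]
    """Return the position of `target` in `names`, or None if absent."""
    for i, name in enumerate(names):
        if name == target:
            return i
    return None


index = find_index(['dnaA', 'recA'], 'recA')

# A type checker would flag a call like the one below, since 42 is not a str,
# even though it runs without error and just returns None:
#     find_index(['dnaA', 'recA'], 42)
```

The `Optional[int]` return type also prompts callers to handle the `None` case before using the result as a number.
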
Typeshed is a repository for "stub" files that associate type definitions with existing libraries. It's bundled with PyCharm, mypy, and pytype. It does not have types for Numpy.
There are experimental type stubs in the numpy repo numpy-stubs that define types for `dtype` and `ndarray`. It's not fancy but it does catch some mistakes and it improves PyCharm autocompletion. The numpy team might improve these stubs but numpy, scipy, and matplotlib use types more flexibly than type checker tools can handle.
With this stub file, you can write type hints like `np.ndarray`. It has no way to express the element type or array shape (as in `np.ndarray[int]`), so use docstrings for that.

    import numpy as np

    def f(a):
        # type: (np.ndarray) -> np.ndarray
        return np.asarray(a, dtype=int)

The wcEcoli project includes numpy-stubs.
To install more stub files:
- Copy them into the `stubs/` directory in the project. Mark the `stubs/` directory as a source root in PyCharm.
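
For orientation, a stub file holds signatures only, written in Python 3 annotation syntax with `...` in place of each body. The sketch below shows what a stub might contain; the module name `fastmath` and both functions are hypothetical, not real project files.

```python
# Hypothetical contents of stubs/fastmath.pyi: signatures only, no logic.
from typing import List, Sequence


def scale(values: Sequence[float], factor: float) -> List[float]: ...


def clamp(value: float, low: float, high: float) -> float: ...
```

The type checker reads these signatures in place of the library's source, so the stub only has to match the library's public names and argument types.
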