You are writing code that you and other people (and future you!) will read and modify multiple times, so there are a few things to keep in mind that will simplify your life:
- Keep it simple, clean, and documented. Your code and your documentation are your best methods section! Write it so that your colleagues can understand it easily. This may mean the difference between being able to easily understand and reuse your own code six months from now and having to rewrite everything from scratch.
- Make it robust and extensible. Writing robust code means you will spend less time debugging or, even worse, analyzing results from incorrect code. Writing extensible code means you will spend less time modifying it.
This article tries to offer practical suggestions on how to achieve these two (sometimes clashing) goals:
- Use a modern environment
- Keep clean and adhere to PEP8
- Document your code
- Find bugs before they find you
- OOP design principles can make your project sustainable
- Start with Python 3! Python 3 should be your default interpreter. Many scientific projects signed on to terminate support for Python 2 in 2020. If you start with, or only support, Python 2.x, you will be left in the dust.
- Use a good editor. Until you try, you have no idea how much it can speed you up.
  - It will catch syntactic (and some semantic) errors for you.
  - Autocompletion will save you from misspelling variable/function names.
  - The "jump to definition" (and documentation) feature will spare you a lot of file browsing looking for other code.
Great examples of editors are PyCharm and Atom, but any editor with those features will do. If you are still on plain gedit, emacs, or vi, bite the bullet and move to any editor made for the job. It is a time investment with high returns.
PEP8 is a document defining a stylistic standard to make Python code more readable/understandable. Almost the entirety of Python packages out there adopt it. The document is not the most exciting read, but the good news is you don't have to read it if you are using a good editor that will highlight PEP8 violations. PyCharm has this feature enabled by default. For Atom, simply install the linter-pep8 package.
Black is "the uncompromising Python code formatter". It reformats your code to comply with most PEP8 requirements, although it does not help you with naming conventions. In a team context, it enforces a common, PEP8-compliant coding style.
isort is a Python utility and library that sorts imports alphabetically and automatically separates them into sections by type. It keeps your import section tidy and readable. In a team context, it makes code review easier: because there is a single right place for each new import statement, isort will deduplicate the import when two people add the same one in different locations of a source file.
PEP8 is a standard set of guidelines, not rules. If you are working on a codebase adopting other conventions, stick with them rather than writing new code in a dissimilar style; just use a consistent style throughout your project. If you are writing new code, there is no really good reason to not adhere to PEP8.
It is much easier to understand the code if you can immediately distinguish functions from classes just based on their name.
Public | Internal usage only |
---|---|
modulename and packagename | _modulename and _packagename |
ClassName (and ExceptionName) | _ClassName (and _ExceptionName) |
variable_name | _variable_name |
function_name() | _function_name() |
CONSTANT_NAME | _CONSTANT_NAME |
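As a rough illustration of the table above, a small module following these conventions might look like the sketch below. Every name in it (`MAX_RETRIES`, `Length`, `_format_quantity`, and so on) is invented for the example.

```python
"""Hypothetical module illustrating the naming conventions."""

MAX_RETRIES = 3        # public constant: CONSTANT_NAME
_DEFAULT_UNIT = "nm"   # internal constant: _CONSTANT_NAME


class LengthError(Exception):
    """Public exception: named like a class (ExceptionName)."""


class Length:
    """Public class: ClassName."""

    def __init__(self, value, unit=_DEFAULT_UNIT):
        self.value = value   # public attribute: variable_name
        self.unit = unit

    def describe(self):
        """Public method: function_name()."""
        return _format_quantity(self.value, self.unit)


def _format_quantity(value, unit):
    """Internal helper: _function_name()."""
    return "{} {}".format(value, unit)
```

With these conventions, a reader can tell at a glance that `Length` is a class, that `describe()` is part of the public interface, and that `_format_quantity()` is an implementation detail that may change.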
Follow the standard rules for spaces: consistent spacing makes it easier to search and browse your code.
spam(ham[1], {eggs: 2}) # Yes
spam( ham[ 1 ], { eggs: 2 } ) # No
x = 1; y <= x; x and y # Yes
x=1; y <=x; x and y # No
function(arg1=3, arg2=4.0) # Exception: no spaces for keyword arguments
if x == 4: print(x, y); x, y = y, x # Yes
if x == 4 : print(x , y) ; x , y = y , x # No
ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:] # Exception: when using colon as slice operator
Keeping lines short (PEP8 recommends at most 79 characters) allows you to have two columns of code side by side, which is handy when reviewing diffs in version control. You can use parentheses or \ to go to a new line.
my_long_string = ("can be divided over multiple "
                  "lines using parentheses.")
Docstrings are special strings used to document your code that are internally supported by Python. The general guidelines are defined by PEP257, but there are specific formats that can be parsed by automatic documentation generators, most commonly Sphinx:
- reStructuredText: The most common format. Very flexible and natively supported by Sphinx.
- Numpydoc: Generally more readable, but harder to comply with. A plugin adds Numpydoc support to Sphinx.
- Google style: The Numpydoc syntax is based on this.
- Epydoc: Based on Javadoc. It is now discontinued (last release was 2008), but it can still be found in a few projects.
See this tutorial on documentation to get started.
We highly recommend adopting the Numpy docstring standard. It is an excellent compromise between human- and machine-readability.
Here is a minimal example:
If you have a module called mymodule.py
"""Docstring for mymodule.py.

All docstrings start and end with triple double quotes.
"""

def add_numbers(number1, number2):
    """For obvious functions a single-line description is enough."""
    return number1 + number2

class MyClass:
    """One-line summary of MyClass.

    For more complex code, give a one-line summary followed by a
    further description that can go over multiple lines.

    Attributes
    ----------
    my_float : float
        Description of attribute.

    """

    def __init__(self, my_float):
        self.my_float = my_float

    def multiply(self, number):
        """Return number * self.my_float.

        Parameters
        ----------
        number : float
            This is multiplied by self.my_float.

        Returns
        -------
        float
            Result of multiplication.

        """
        return self.my_float * number
you can use
>>> import mymodule
>>> help(mymodule)
>>> help(mymodule.MyClass.multiply)
to obtain information on how to use the code in mymodule.py.
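The sketch below shows a Numpydoc-style docstring for a function with an optional parameter, and how the same text that help() displays can be retrieved programmatically with inspect.getdoc() from the standard library. The function `scale` and its parameters are hypothetical and exist only for this example.

```python
import inspect


def scale(values, factor=2.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplicative factor (default is 2.0).

    Returns
    -------
    list of float
        The scaled values, in the same order as the input.

    """
    return [factor * value for value in values]


# The raw docstring lives in scale.__doc__; inspect.getdoc() returns it
# with the common leading indentation removed.
docstring = inspect.getdoc(scale)

# By convention, the first line is a self-contained one-line summary.
summary_line = docstring.splitlines()[0]
```

Because the docstring is ordinary data attached to the object, editors and documentation generators can read it without running any of your functions.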
Nothing wastes your time like a bug that does not raise an error. You can spend several weeks on some results before you realize your code is incorrect. It is even worse when you realize it after publication. There are a few practices that can help you avoid introducing bugs.
First and foremost, write tests! The paradigm of test-driven development is simple:
- Create a test that fails, so that you know you are testing the right thing.
- Implement only the code that makes you pass the test.
- Clean up your code (refactor, update docs, etc.) and repeat.
Several Python tools (pytest, nose2, unittest) allow you to easily run tests automatically. This will lower the chance of introducing regressions when you modify your code. See this tutorial on testing for more information, and to get started.
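As a minimal sketch of what the end of one such cycle might look like, the tests below were (in this imagined scenario) written first and failed until `mean` was implemented. The function `mean` and both test names are hypothetical, and the tests use plain assert statements in the style that pytest collects.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    if len(values) == 0:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # A case whose answer is easy to verify by hand.
    assert mean([1, 2, 3, 4]) == 2.5


def test_mean_rejects_empty_input():
    # An edge case: the function should fail loudly, not return nonsense.
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```

If this is saved in a file such as test_mean.py, running pytest on that file collects and runs every function whose name starts with test_.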
Do not compare floats for equality. Because computers represent real numbers in binary with finite precision, you can obtain unexpected results. For example, 0.1 has a periodic binary representation, and it cannot be expressed by a finite number of bits.
>>> 0.1 + 0.1 + 0.1 == 0.3
False
It is generally fine to compare floats for equality up to some finite precision
>>> round(0.1 + 0.1 + 0.1, 10) == round(0.3, 10)
True
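An alternative to rounding, not mentioned above, is math.isclose() from the standard library (available since Python 3.5), which compares two floats within a relative and, optionally, an absolute tolerance. A minimal sketch, with variable names chosen only for illustration:

```python
import math

total = 0.1 + 0.1 + 0.1

exactly_equal = (total == 0.3)            # False, as shown above
close_enough = math.isclose(total, 0.3)   # True with the default rel_tol=1e-09

# A relative tolerance is of no use when comparing against zero,
# so pass an absolute tolerance explicitly in that case.
tiny = 1e-12
near_zero_default = math.isclose(tiny, 0.0)                # False
near_zero_abs_tol = math.isclose(tiny, 0.0, abs_tol=1e-9)  # True
```

Which tolerance is appropriate depends on the scale and precision of your data, so treat the values here as placeholders rather than recommendations.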
Do not use mutable types as default arguments. Python creates default arguments when functions are defined, not when they are called, so changes to your default object inside your function will be permanent.
def append(element, list_=[]):
    list_.append(element)
    return list_
>>> print(append(0))
[0]
>>> print(append(1))
[0, 1]
Instead, prefer
def append(element, list_=None):
    if list_ is None:
        list_ = []
    list_.append(element)
    return list_
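To see why this happens, you can inspect the function's `__defaults__` attribute, where Python stores the default objects created at definition time. The sketch below renames the two variants `append_shared` and `append_fresh` (names chosen only for this example) so that they can be compared side by side.

```python
def append_shared(element, list_=[]):
    """Buggy variant: every call without list_ reuses the same list."""
    list_.append(element)
    return list_


def append_fresh(element, list_=None):
    """Safe variant: a new list is created on each call without list_."""
    if list_ is None:
        list_ = []
    list_.append(element)
    return list_


append_shared(0)
append_shared(1)
# The single default list created at definition time has accumulated
# the elements from both calls.
shared_default = append_shared.__defaults__[0]   # [0, 1]

first = append_fresh(0)    # [0]
second = append_fresh(1)   # [1]
# The stored default of the safe variant is still the immutable None.
fresh_defaults = append_fresh.__defaults__       # (None,)
```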
Avoid generic type coercion when possible. It is easy to catch unwanted types in the mix and mask a bug upstream.
squared_lookup_table = [x*x for x in range(100)]

def square_integers(integer_list):
    """Element-wise square of the given list of integers."""
    integer_list = [int(x) for x in integer_list]  # let's convert floats to int
    return [squared_lookup_table[i] for i in integer_list]
>>> square_integers([0, 4, 10])
[0, 16, 100]
>>> square_integers('92')  # you want this to fail to catch a bug!
[81, 4]
If you absolutely have to coerce a type, tighten it up to be sure you are coercing only those you want.
def square_integers(integer_list):
    integer_list = [int(round(x)) for x in integer_list]  # This always fails with strings.
    return [squared_lookup_table[i] for i in integer_list]
It is generally safer to leave any type of coercion to the user of the function, since they know what kind of data they intend to pass.
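One way to leave coercion to the caller is to validate the input and raise an informative error instead of converting it. The following is only a sketch of that idea: the explicit isinstance check and the error message are assumptions of this example, not part of the code above, and the lookup table is renamed in upper case because it is a constant.

```python
import numbers

SQUARED_LOOKUP_TABLE = [x * x for x in range(100)]


def square_integers(integer_list):
    """Element-wise square of the given list of integers.

    No coercion is attempted: anything that is not an integer is rejected.
    """
    for x in integer_list:
        # bool is a subclass of int, so it has to be excluded explicitly.
        if isinstance(x, bool) or not isinstance(x, numbers.Integral):
            raise TypeError("expected an integer, got {!r}".format(x))
    return [SQUARED_LOOKUP_TABLE[x] for x in integer_list]
```

With this version, both square_integers('92') and square_integers([2.0]) raise a TypeError, so the caller has to decide explicitly how strings or floats should be converted.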
Avoid abusing **kwargs. Using **kwargs is a general way to forward keyword arguments to another (possibly unknown) function, but when used instead of normal keyword arguments, it can mask bugs in your program.
def decorate_str(decorated, **kwargs):
    """Prepend and append a decoration string."""
    separator = kwargs.get('separator', '_')  # default is '_'
    decoration = kwargs.get('decoration_string', '')  # default is ''
    return decoration + separator + decorated + separator + decoration
>>> decorate_str('decorated', decoration_string='foo')
'foo_decorated_foo'
>>> decorate_str('decorated', decoration_str='foo')  # Whoops! No error because kwargs disrupt introspection!
'_decorated_'
If you want to use **kwargs instead of normal keyword arguments, manually add some control logic.
def decorate_str(decorated, **kwargs):
    """Prepend and append a decoration string."""
    separator = kwargs.pop('separator', '_')  # default is '_'
    decoration = kwargs.pop('decoration_string', '')  # default is empty string
    if kwargs:  # anything left over was not recognized
        raise TypeError('got unexpected keyword arguments {}'.format(kwargs))
    return decoration + separator + decorated + separator + decoration
But it is usually more readable to just use keyword arguments (plus it does not break autocompletion).
def decorate_str(decorated, separator='_', decoration_string=''):
    return decoration_string + separator + decorated + separator + decoration_string
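If part of the appeal of **kwargs is forcing callers to spell out argument names, Python 3 offers keyword-only arguments as a middle ground: parameters listed after a bare * must be passed by name, and misspelled names still raise an error. This variant is a suggestion of this example rather than something the code above requires.

```python
def decorate_str(decorated, *, separator='_', decoration_string=''):
    """Prepend and append a decoration string.

    Everything after the bare * is keyword-only.
    """
    return decoration_string + separator + decorated + separator + decoration_string


decorated_ok = decorate_str('decorated', decoration_string='foo')

try:
    # The misspelled keyword from the earlier example is now caught.
    decorate_str('decorated', decoration_str='foo')
except TypeError as error:
    misspelling_message = str(error)
```

The signature stays fully visible to help() and to autocompletion, unlike the **kwargs version.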
The main goal of object-oriented programming (OOP) design principles is to provide guidelines for writing code that is fast to change, and changeable without breaking backward compatibility. In science, breaking backward compatibility often means impeding easy reproducibility. While this is especially important if you are writing a Python tool that other people will use, consider that you are the first user of your code, and learning about these principles (and when not to apply them) can speed you up considerably. We are working on a separate article with practical advice on Python library/tool-making (that you should help us improve!).
A large variety of automated tools can be integrated with GitHub to check your code for departures from the above guidelines, and even teach you better coding practices. Our favorites right now are landscape.io, intended to help keep technical debt under control by spotting problems early, and Quantified Code, which can actually teach you better coding practices when it spots problematic code. Enabling these tools to watch your repository and monitor your pull requests is easy, and can save you a lot of hassle in the long run.
Want to contribute to this repository? Have a suggestion for an improvement? Spot a typo? We're always looking to improve this document for the betterment of all.
- Please feel free to open a new issue with your feedback and suggestions!
- Or make a pull request from your branch or fork!