This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This project will take you through turning a script with hardcoded parameters into a reusable package that is easier to use, more flexible, and can be used to automate your research.
You'll need Python 3.8+ and pip installed, and a moderate amount of Python knowledge. While this tutorial uses Python, many of the techniques we'll go through apply to most other programming languages.
You'll also need to be able to use git for version control.
Start by forking this project on GitHub, and cloning it locally. Then work through the exercises in each of the four steps:
- Reusable module
- Packaging
- Input
- Testing
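As a rough illustration of the first step, here is a minimal sketch of moving hardcoded values out of a script and into function arguments. It assumes the calculation is the Miller flux-surface shape introduced further down; the function name `flux_surface` and its default values are hypothetical, not the tutorial's actual code.

```python
import math

# Before: a script with hardcoded parameters might look like
#
#     R0 = 3.0
#     r = 1.0
#     kappa = 1.5
#     ... compute and print results ...
#
# After: the same calculation wrapped in a function, so the parameters
# become arguments that callers (and tests) can vary.


def flux_surface(theta, R0=3.0, r=1.0, kappa=1.0, delta=0.0):
    """Return (R, Z) on a Miller flux surface at poloidal angle theta.

    Hypothetical example function; names and defaults are illustrative.
    delta (triangularity) must satisfy -1 <= delta <= 1 for math.asin.
    """
    R = R0 + r * math.cos(theta + math.asin(delta) * math.sin(theta))
    Z = kappa * r * math.sin(theta)
    return R, Z


if __name__ == "__main__":
    # Only runs when the file is executed as a script, not when imported.
    for i in range(4):
        theta = i * math.pi / 2
        R, Z = flux_surface(theta, R0=3.0, r=1.0, kappa=1.5, delta=0.3)
        print(f"theta={theta:.3f}  R={R:.3f}  Z={Z:.3f}")
```

The `if __name__ == "__main__":` guard is what lets one file serve both as a script and as an importable module: importing it defines `flux_surface` without printing anything.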
You might find it easiest to view the exercises by reading the `README.md` file in each directory on GitHub, rather than locally.
Each step comes with a set of tests that you should run after each exercise. All of the tests will fail at first, and each exercise you complete successfully will make more of them pass, so you can use them to track your progress through the whole tutorial.
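Python test suites are commonly run with pytest, which collects functions whose names start with `test_` and reports any failing `assert` as a failed test. The sketch below shows that shape for a hypothetical `aspect_ratio` function; it is not one of the tutorial's own tests, and it avoids importing pytest so it stays dependency-free.

```python
import math


def aspect_ratio(major_radius, minor_radius):
    """Hypothetical function under test: R0 / r."""
    if minor_radius <= 0:
        raise ValueError("minor_radius must be positive")
    return major_radius / minor_radius


# pytest discovers functions named test_* and treats a failing assert as a failure.
def test_aspect_ratio_simple():
    assert math.isclose(aspect_ratio(3.0, 1.0), 3.0)


def test_aspect_ratio_rejects_zero_minor_radius():
    # With pytest available, `with pytest.raises(ValueError):` is the idiomatic form.
    try:
        aspect_ratio(3.0, 0.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
```

Running `python -m pytest` in a directory containing this file would collect and run both tests.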
You should regularly commit your work, at least after completing each exercise, possibly more frequently.
There are some bonus and advanced exercises throughout this tutorial. Bonus exercises are good to go through if you find you have some extra time during the session, and are about techniques that are generally useful to most people. Advanced exercises, on the other hand, are usually a bit more specialist, or require a bit more time and/or research to implement. They are good next steps for the interested learner to look into after the session.
We can't cover everything in this tutorial, but there is always something more to learn. After applying the techniques you've learnt here to your own projects, you might like to investigate the following tools and resources:
- automate running tests with GitHub Actions
- This more generally falls under the names "Continuous Integration" and "Continuous Delivery" (or "Continuous Deployment"), often shortened to "CI/CD"
- You can use CI to automate all sorts of things, such as running formatters and linters, publishing packages, building containers, and so on
- self-describing output files using netCDF or HDF5
- These file formats are portable across systems, and can help both structure and describe your data through labels with things like units or plain language descriptions
- It's useful to store things like the exact input parameters, the version of the code used, when the code was run, and other metadata
- better analysis using Pandas or xarray
- Pandas works very well with tabular data
- Xarray is designed for labelled, multi-dimensional data
- documentation using Sphinx and ReadTheDocs
- Sphinx uses reStructuredText (a kind of text markup, like LaTeX or HTML) to make documentation websites from source code
- Sphinx can also automatically pull out docstrings from Python packages to make API documentation (there are plugins for other languages too)
- ReadTheDocs hosts and automatically generates websites using Sphinx (the xarray docs, for instance, are written in Sphinx and built with ReadTheDocs)
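To make the point about storing input parameters, code version, and run time a little more concrete, here is a stdlib-only sketch that gathers that metadata into a flat dictionary. The function name `provenance_attributes` and the attribute names are assumptions for illustration. Values are kept as strings or numbers because netCDF attributes are limited to simple types; you could then copy the dictionary into, for example, an xarray `Dataset.attrs` before writing with `to_netcdf`, or into an h5py file's `attrs`.

```python
import datetime
import json
import platform
from importlib import metadata


def provenance_attributes(parameters, package_name):
    """Collect run metadata as flat string/number attributes (hypothetical helper)."""
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # e.g. running from a source checkout that was never pip-installed
        version = "unknown"

    attrs = {
        "software": package_name,
        "software_version": version,
        "python_version": platform.python_version(),
        # Timezone-aware UTC timestamp in ISO 8601 format
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    # One attribute per input parameter, prefixed to avoid name clashes
    for key, value in parameters.items():
        attrs[f"input_{key}"] = value
    # The full parameter set as a single JSON string, easy to read back later
    attrs["input_parameters_json"] = json.dumps(parameters, sort_keys=True)
    return attrs
```

Recording the version through `importlib.metadata` only works once the code is installed as a package, which is one practical payoff of the packaging step.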
For a longer and more in-depth course on packaging Python, please see this Carpentries Incubator course, which includes more details on project metadata, publishing packages on PyPI, and the sometimes confusing history behind Python packages. This course was written by Liam Pattinson, another member of PlasmaFAIR.
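A local equilibrium of the magnetic field of a tokamak can be represented with the so-called Miller parameterisation, defined by Miller et al. (Phys. Plasmas, Vol. 5, No. 4, April 1998):
$$
\begin{aligned}
R_s(r, \theta) &= R_0(r) + r \cos\left[\theta + \left(\sin^{-1}\delta\right)\sin\theta\right] \\
Z_s(r, \theta) &= \kappa r \sin\theta
\end{aligned}
$$
where $R_0$ is the major radius of the flux surface, $r$ its minor radius, $\theta$ the poloidal angle, $\kappa$ the elongation, and $\delta$ the triangularity.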
The three parameters,