Even though computers are often considered deterministic, the software that runs on them is a rapidly changing landscape. Libraries constantly add features and fix bugs, and even libraries with the strictest backwards-compatibility policies can change in significant ways. As computer hardware evolves, software is forced to adapt accordingly.
A reproducible computational environment is one that is sufficiently consistent for the computational task at hand. For example, it may consist of a similar CPU instruction set, libraries and executables at specific versions and with specific configuration options, a specific operating system version, and so on.
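As a rough illustration of the ingredients listed above, the following Python sketch records the interpreter version, operating system, CPU architecture, and the versions of a few named packages. The function name `describe_environment` and the package names passed to it are hypothetical choices made for this example; only the standard library modules `platform`, `json`, and `importlib.metadata` (Python 3.8 or later) are assumed.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Return a dictionary summarising the current computational environment.

    `packages` is an iterable of distribution names to look up; a package
    that is not installed is recorded with the value None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed here
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),  # CPU architecture, e.g. x86_64
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names below are only examples; substitute your own.
    report = describe_environment(["numpy", "pip"])
    print(json.dumps(report, indent=2, sort_keys=True))
```

Saving such a report next to your results gives a later reader a first hint about whether their environment is "sufficiently consistent" with yours. It does not capture everything (compiler flags and system libraries, for instance, are not recorded).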
Docker is an open-source engine that automates the deployment of any application as a lightweight, portable, self-sufficient container that will run virtually anywhere.
Docker works with images that consume minimal disk space and are versioned, archivable, and shareable. Executing applications in these images does not require dedicated resources and incurs little performance overhead. For more information, see the Docker documentation.
As the name suggests, a Virtual Machine (VM) emulates a physical computer. In the last few years, VMs have become very popular because of their scalability, ease of maintenance, and reproducibility.
Vagrant is a tool that defines and controls virtual machine environments. It can define several machines that work together and are aware of each other. Some use cases for such multi-machine environments are:
- Accurately modeling a multi-server production topology, such as separating a web and database server.
- Modeling a distributed system and how its components interact with each other.
- Testing an interface, such as an API to a service component.
- Disaster-case testing: machines dying, network partitions, slow networks, inconsistent world views, etc.
Many software packages, such as VMware, VirtualBox, and Parallels, provide virtualization on modern computer architectures.
Part of the problem in creating a computational environment is the procurement of necessary libraries and other dependencies. Linux distributions have long been a source of extensive scientific development resources. Package managers such as Homebrew and Chocolatey are available for macOS and Windows, respectively. On Linux, yum (http://yum.baseurl.org) is the command-line package manager for RPM-based systems, and dpkg (https://wiki.debian.org/Teams/Dpkg) is the low-level tool underlying the Debian package management system. Scientific Python distributions such as Anaconda, Canopy, and Python(x,y) are also available.
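Whichever package manager supplies the dependencies, it helps to write down exactly which versions ended up installed. The sketch below lists every Python distribution visible to the current interpreter as `name==version` lines, similar in spirit to the output of `pip freeze`. The function name `pinned_requirements` is a hypothetical choice for this example, and the sketch covers Python packages only, not system libraries installed through yum, dpkg, Homebrew, or Chocolatey.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={version}")
    # Sort case-insensitively so the listing is stable between runs.
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```

Redirecting the output to a file and keeping that file under version control records one part of the environment that a collaborator would need to recreate.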
- Create your own reproducible computational environment.
- Upload your Docker image to Docker Hub.