This is one of the 100+ free recipes of the IPython Cookbook, Second Edition, by Cyrille Rossant, a guide to numerical computing and data science in the Jupyter Notebook. The ebook and printed book are available for purchase at Packt Publishing.
▶ Text on GitHub with a CC-BY-NC-ND license
▶ Code on GitHub with a MIT license
Chapter 2: Best practices in Interactive Computing
Using a version control system is an absolute requirement in programming and research. This is the tool that makes it nearly impossible to lose one's work. In this recipe, we will cover the basics of Git.
Notable distributed version control systems include Git, Mercurial, and Bazaar, among others. In this chapter, we will use the popular Git system. You can download the Git program and Git GUI clients from http://git-scm.com.
Distributed systems tend to be more popular than centralized systems such as SVN or CVS: they allow local (offline) changes and support more flexible collaboration workflows.
An online provider allows you to host your code in the cloud. You can use it as a backup of your work and as a platform to share your code with your colleagues. These services include GitHub (https://github.com), GitLab (https://gitlab.com), and Bitbucket (https://bitbucket.org). All of these websites offer free and paid plans with unlimited public and/or private repositories.
GitHub offers desktop applications for Windows and macOS at https://desktop.github.com/.
This book's code is stored on GitHub. Most Python libraries are also developed on GitHub.
- The very first thing to do when starting a new project or computing experiment is to create a new folder locally:
mkdir myproject
cd myproject
- We initialize a Git repository:
git init
Initialized empty Git repository in ~/git/cookbook-2nd/chapter02/myproject/.git/
pwd
~/git/cookbook-2nd/chapter02/myproject
ls -a
. .. .git
Git created a .git subdirectory that contains all the parameters and history of the repository.
- Let's set our name and e-mail address globally:
git config --global user.name "My Name"
git config --global user.email "me@example.com"
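The --global option writes these values to your user-level configuration file, so they apply to every repository on your computer. Omitting it stores them in the current repository's .git/config instead. The sketch below assumes a throwaway repository created with mktemp and the placeholder identity used above; git config followed by a key name reads a value back.

```shell
# Throwaway repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Without --global, the values are written to .git/config
# and only apply to this repository.
git config user.name "My Name"
git config user.email "me@example.com"

# Read the values back.
name=$(git config user.name)
email=$(git config user.email)
echo "$name <$email>"
```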
- We create a new file, and we tell Git to track it:
echo "Hello world" > file.txt
git add file.txt
- Let's create our first commit:
git commit -m "Initial commit"
[master (root-commit) 02971c0] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 file.txt
- We can check the list of commits:
git log
commit 02971c0e1176cd26ec33900e359b192a27df2821
Author: My Name <me@example.com>
Date: Tue Dec 12 10:50:37 2017 +0100
Initial commit
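git log also has a compact format. The following sketch, run in a hypothetical throwaway repository that replays the first commit, prints one line per commit with git log --oneline and counts the commits reachable from HEAD with git rev-list --count. The abbreviated hash will differ on your machine.

```shell
# Throwaway repository replaying the first commit of this recipe.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# One line per commit: abbreviated hash followed by the commit title.
git log --oneline

# Number of commits reachable from HEAD, and the title of the latest one.
count=$(git rev-list --count HEAD)
title=$(git log -1 --format=%s)
```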
- Next, we edit the file by appending an exclamation mark:
echo "Hello world!" > file.txt
cat file.txt
Hello world!
- We can see the differences between the current state of our files and their state in the last commit:
git diff
diff --git a/file.txt b/file.txt
index 802992c..cd08755 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1 @@
-Hello world
+Hello world!
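The output of git diff shows that the content of file.txt changed from Hello world to Hello world! (the lines starting with - and +). Git compares the working copy of each tracked file with the staging area, which matches the last commit when nothing has been staged, and computes the line-by-line differences.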
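Strictly speaking, git diff compares your files with the staging area, which matches the last commit here because nothing has been staged yet. A minimal sketch in a throwaway repository (an assumption, so as not to touch myproject) shows the distinction: once the change is staged with git add, plain git diff becomes empty and git diff --cached shows the change instead.

```shell
# Throwaway repository replaying the first commit of this recipe.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Modify the tracked file: the change is not staged yet.
echo 'Hello world!' > file.txt
unstaged=$(git diff --name-only)

# Stage it: git diff (working copy vs. staging area) is now empty,
# while git diff --cached (staging area vs. last commit) shows it.
git add file.txt
after_add=$(git diff --name-only)
staged=$(git diff --cached --name-only)
git diff --cached
```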
- We can also get a summary of the changes as follows:
git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
modified: file.txt
no changes added to commit (use "git add")
git diff --stat
file.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
The git status command gives a summary of all changes since the last commit. The git diff --stat command shows, for each modified text file, the number of changed lines (here, one insertion and one deletion).
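For scripts, or for a quicker overview, both commands have compact variants. In this sketch (again in a hypothetical throwaway repository), git status --short prints one line per changed file, where a space followed by M means "modified but not staged", and git diff --numstat prints the numbers of added and deleted lines and the file name, separated by tabs.

```shell
# Throwaway repository: first commit, then the same edit as above.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo 'Hello world!' > file.txt

# Two-column status: first column = staging area, second = working copy.
short=$(git status --short)
echo "$short"

# Machine-readable counterpart of --stat: added, deleted, path.
numstat=$(git diff --numstat)
echo "$numstat"
```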
- Finally, we commit our change with a shortcut that automatically adds all changes in the tracked files (-a option):
git commit -am "Add exclamation mark to file.txt"
[master 045df6a] Add exclamation mark to file.txt
1 file changed, 1 insertion(+), 1 deletion(-)
git log
commit 045df6a6f8a62b19f45025d15168d6d7382a8429
Author: My Name <me@example.com>
Date: Tue Dec 12 10:59:39 2017 +0100
Add exclamation mark to file.txt
commit 02971c0e1176cd26ec33900e359b192a27df2821
Author: My Name <me@example.com>
Date: Tue Dec 12 10:50:37 2017 +0100
Initial commit
When you start a new project or a new computing experiment, create a new folder on your computer. You will eventually add code, text files, datasets, and other resources in this folder. The distributed version control system keeps track of the changes you make to your files as your project evolves. It is more than a simple backup, as every change you make on any file can be saved along with the corresponding timestamp. You can even revert to a previous state at any time; never be afraid of breaking your code anymore!
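As an illustration of going back to a previous state, git checkout <commit> -- <file> restores the version of a file recorded in an earlier commit; recent Git versions also provide git restore --source=<commit> for the same purpose. The sketch below assumes a throwaway repository that replays the two commits of this recipe, restores file.txt from the commit before HEAD (HEAD~1), and records the result as a new commit, so no history is lost.

```shell
# Throwaway repository replaying the two commits of this recipe.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"
echo 'Hello world!' > file.txt
git commit -q -am "Add exclamation mark to file.txt"

# HEAD~1 is the commit just before the latest one. This updates
# both the working copy and the staging area.
git checkout HEAD~1 -- file.txt
restored=$(cat file.txt)

# Record the restored content as a new commit.
git commit -q -m "Remove exclamation mark"
count=$(git rev-list --count HEAD)
```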
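Git works best with text files. It can handle binary files, but with limitations: it cannot show meaningful line-by-line differences for them, and large binary files quickly bloat the repository history. For large binary files, it is better to use a separate system such as Git Large File Storage, or Git LFS (see https://git-lfs.github.com/).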
Specifically, you can take a snapshot of your project at any time by making a commit. The snapshot contains every tracked file in the state in which it was last staged. You are in total control of which files and changes are recorded: with Git, you stage a file for your next commit with git add, before committing your changes with git commit. The git commit -a command stages and commits, in one step, all changes in the files that are already being tracked.
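The difference between git add and git commit -a matters for new files. In this sketch, which assumes a throwaway repository and a hypothetical untracked file named notes.txt, git commit -a picks up the modification of the tracked file.txt but leaves notes.txt untracked until it is added explicitly.

```shell
# Throwaway repository replaying the first commit of this recipe.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"

# Modify the tracked file and create a new, untracked one.
echo 'Hello world!' > file.txt
echo "draft" > notes.txt

# -a only stages changes in files that are already tracked.
git commit -q -am "Add exclamation mark to file.txt"
leftover=$(git status --porcelain)
echo "$leftover"

# The new file must be added explicitly before it can be committed.
git add notes.txt
git commit -q -m "Add notes.txt"
```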
When committing, you should provide a clear and short message describing the changes you made. This makes the repository's history far more informative than a series of "work in progress" messages. If the commit message is long, write a short title (fewer than 50 characters), insert a blank line, and write a longer description.
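One way to write such a title and description from the command line is to pass -m twice: git commit joins the values as separate paragraphs, so the first becomes the title and the second the description, with a blank line between them. The sketch assumes a throwaway repository; the message text is made up for illustration.

```shell
# Throwaway repository with one file ready to commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt

# First -m: short title. Second -m: longer description.
git commit -q -m "Add greeting file" \
  -m "The file is used to try out the basic Git commands of this recipe."

# %s is the title (subject), %b the description (body).
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
git log -1
```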
How often should you commit? The answer is very often. Git only takes responsibility for your work when you commit changes: what happens between two commits may be lost, so you should commit regularly. Besides, commits are quick and cheap because they are local; that is, they do not involve any communication with a remote server.
Git is a distributed version control system; your local repository does not need to synchronize with an external server. However, you should synchronize if you need to work on several computers, or if you want a remote backup. Once you have set up remotes, synchronization with a remote repository is done with git push (send your local commits to the remote server), git fetch (download remote branches and objects), and git pull (fetch the remote changes and integrate them into your current local branch).
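To see these commands in action without a GitHub account, we can sketch the round trip on a single machine, using a local bare repository as a stand-in for the remote server. This setup is an assumption made for illustration: the directories come from mktemp, and the second clone plays the role of another computer. The branch name is read from the repository, because the default (master or main) depends on your Git configuration.

```shell
base=$(mktemp -d)

# A bare repository (no working copy) stands in for the remote server.
git init -q --bare "$base/remote.git"

# First working copy: commit, then push to the "remote".
git init -q "$base/work"
cd "$base/work"
git config user.name "My Name"
git config user.email "me@example.com"
echo "Hello world" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git remote add origin "$base/remote.git"
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

# Second working copy, e.g. on another computer: change and push.
git clone -q --branch "$branch" "$base/remote.git" "$base/other"
cd "$base/other"
git config user.name "My Name"
git config user.email "me@example.com"
echo 'Hello world!' > file.txt
git commit -q -am "Add exclamation mark to file.txt"
git push -q

# Back in the first copy: fetch downloads the new commit,
# pull (fetch + fast-forward merge here) updates the local branch.
cd "$base/work"
git fetch -q
git pull -q --ff-only
content=$(cat file.txt)
```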
We can also create a new repository on an online Git provider such as GitHub.
On the main webpage of the newly created project, click on the Clone or download button to get the repository URL and type in a terminal:
git clone https://github.com/mylogin/myproject.git
If the local repository already exists, do not tick the Initialize this repository with a README box on the GitHub page, and add the remote with git remote add origin https://github.com/mylogin/myproject.git. See https://help.github.com/articles/adding-a-remote/ for more details.
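A minimal sketch of this step, assuming a fresh throwaway repository and the placeholder URL above (nothing is sent over the network until you push or fetch): git remote -v lists the configured remotes, and git remote get-url prints the URL of a given remote.

```shell
# Throwaway repository standing in for an existing local project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the GitHub repository (placeholder URL) under the usual name "origin".
git remote add origin https://github.com/mylogin/myproject.git

# List remotes with their fetch and push URLs; no network access here.
git remote -v
url=$(git remote get-url origin)
```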
The simplistic workflow shown in this recipe is linear. In practice though, workflows with Git are typically nonlinear; this is the concept of branching. We will describe this idea in the next recipe, A typical workflow with Git branching.
Here are some references on Git:
- Hands-on tutorial, available at https://try.github.io
- Git, a simple guide by Roger Dudler, available at http://rogerdudler.github.io/git-guide/
- Git Immersion, a guided tour, at http://gitimmersion.com
- Atlassian Git tutorial, available at http://www.atlassian.com/git
- Online Git course, available at http://www.codeschool.com/courses/try-git
- Git tutorial by Lars Vogel, available at http://www.vogella.com/tutorials/Git/article.html
- GitHub and Git tutorial, available at http://git-lectures.github.io
- Intro to Git for scientists, available at http://karthik.github.io/git_intro/
- GitHub help, available at https://help.github.com
See also the recipe A typical workflow with Git branching.