Our top priority is doing reproducible science. This means establishing an efficient workflow that makes collaboration convenient. The following guidelines and suggestions serve this objective.
There are a few things that each new member should do when joining the research team:
- Create a GitHub account.
- Check that your ownCloud account is working.
- We use Slack for communication and Wrike for project management. Make sure that you have been invited to both.
- Download and begin to maintain a reference manager. We recommend Zotero, as the free software interfaces well with both Microsoft Word and Google Docs.
- Read the common scripting practices described here.
- We generally use a folder called "data" within each repository.
- We keep our raw data in the GitHub repository related to the project, unless the data files are too large.
- We store our raw data with metadata describing what’s in the file and what the columns mean. We consider these data as read-only.
- If we clean the data, we often use a folder called something like "raw" to differentiate data in its original form from data that has been manipulated.
- If we are using data downloaded from another data source, we include the data source in a README.
- If our data are too large to store on GitHub (file size > 100 MB), we store them in ownCloud and include the link in the README file for reproducibility.
- Some useful suggestions and ideas can be found in the Cambridge Data Management Website.
- We do our data analysis in GitHub repositories to facilitate collaboration and sharing.
- We use scripts to process data, make models, do analyses, etc., and avoid spreadsheets, GIS applications, or other point-and-click software. R is used by most team members, in particular the data.table and ggplot2 packages. Of course, other software/packages are welcome.
- We try to comment a lot in our code.
- We aim for fully reproducible papers, as in this example.
- We publish git repositories through Zenodo upon publication of a manuscript.
- Collaborative manuscripts are written either in Overleaf or Google Drive.
- R for data science
- R/RStudio for pure beginners
- RStudio with GitHub
- R with git
- Zotero
- data.table vignette
- writing good software
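
The data conventions above (raw data kept read-only, manipulated data stored separately, and files over 100 MB kept out of GitHub) could be sketched in R roughly as follows. This is only an illustration, not a prescribed lab script: the helper names `find_large_files` and `clean_raw_file`, the `data/clean` folder name, and the choice to count 1 MB as 1e6 bytes are all assumptions made for the example.

```r
library(data.table)

# Hypothetical helper: list files under `path` that are larger than `limit_mb`.
# Assumption: 1 MB is counted as 1e6 bytes.
find_large_files <- function(path = "data", limit_mb = 100) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  size_mb <- file.size(files) / 1e6
  keep <- !is.na(size_mb) & size_mb > limit_mb
  data.table(file = files[keep], size_mb = size_mb[keep])
}

# Hypothetical helper: read a raw file, drop rows with missing values, and
# write the result to a separate folder so the raw file is never modified.
# Assumption: cleaned files live in "data/clean".
clean_raw_file <- function(raw_file, clean_dir = file.path("data", "clean")) {
  raw <- fread(raw_file)
  clean <- na.omit(raw)
  dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)
  out_file <- file.path(clean_dir, basename(raw_file))
  fwrite(clean, out_file)
  invisible(out_file)
}
```

Running `find_large_files("data")` before committing would list any files that belong in ownCloud rather than in the repository. What counts as "cleaning" is project-specific; `na.omit` is used here only as a stand-in.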
Much of the inspiration for maintaining an efficient, reproducible workflow came from Openscapes, while some of the material used here was copied from the Pinsky Lab.