Reproducible science is our top priority. It is therefore imperative to establish an efficient workflow that makes collaboration convenient. To achieve this, please follow the guidelines and suggestions below as much as possible.
Every new member who joins the research team should:
- Apply for VPN access through the university helpdesk service.
- Create a GitHub account.
- Verify that your ownCloud account is active and working.
- We use Teams for communication and project management. Make sure you have been invited to both.
- Install a reference manager and keep its library up to date. We recommend Zotero, which is free and compatible with both Microsoft Word and Google Docs.
- Read the standard scripting practices described here.
- In general, we use a folder called "data" within each repository.
- If data comes from an external source, we cite that source in a README.
- Raw data is stored in the corresponding GitHub repository unless the file size is too large (> 100 MB).
- When a data file is too large, we store it in ownCloud and include the link in the README file for reproducibility.
- When storing raw data, always include metadata describing what is in the file and what the columns mean. Remember, we treat raw data as read-only.
- If we clean the data, we keep the original files in a folder called "raw" or similar so that original data is easy to tell apart from manipulated data.
- Some useful suggestions and ideas can be found in the Cambridge Data Management Website.
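
As a rough illustration of the 100 MB rule above, the following R sketch lists files in a data folder that exceed a size limit, so they can be moved to ownCloud and linked from the README instead of being committed. The function name `find_large_files` and its arguments are hypothetical and not part of any existing team tooling; the folder name "data" follows the convention described above.

```r
# Hypothetical helper (not an existing team function): list files under
# `data_dir` whose size exceeds `limit_mb` megabytes.
find_large_files <- function(data_dir = "data", limit_mb = 100) {
  # Collect every file in the folder, including those in subfolders
  files <- list.files(data_dir, recursive = TRUE, full.names = TRUE)

  # file.size() returns bytes; convert to megabytes (10^6 bytes)
  sizes_mb <- file.size(files) / 1e6

  # Keep only the files above the limit
  too_large <- sizes_mb > limit_mb
  data.frame(
    file = files[too_large],
    size_mb = round(sizes_mb[too_large], 1),
    stringsAsFactors = FALSE
  )
}

# Example call: returns a data frame with zero rows if nothing is too large
print(find_large_files("data"))
```

Running a check like this before committing is one possible way to catch oversized files early; the limit can be lowered if a repository should stay small.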
- We do our data analysis in GitHub repositories to facilitate collaboration and sharing.
- We use scripts to process data, build models, run analyses, and so on. We avoid spreadsheets, GIS applications, and other non-scripted software.
- R is used by most team members, in particular the data.table and ggplot2 packages. Of course, other software/packages are welcome.
- We try to comment a lot in our code.
- We aim for fully reproducible papers, like this example.
- We publish git repositories through Zenodo upon publication of a manuscript.
- Collaborative manuscripts are written either in Overleaf or Google Drive.
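
To make the scripting practices above more concrete, here is a minimal sketch of a commented cleaning step written with data.table. It reads a raw file without modifying it and writes the cleaned result to a separate location. The function name `clean_measurements`, the column name `value`, and the file paths in the usage note are all hypothetical and only serve as an illustration.

```r
library(data.table)

# Hypothetical cleaning step: read a raw CSV, clean it, and save the result
# to a different path so the raw file stays untouched (read-only).
clean_measurements <- function(raw_path, clean_path) {
  # Read the raw data; this script never writes back to raw_path
  dt <- fread(raw_path)

  # Drop rows where the (assumed) measurement column `value` is missing
  dt <- dt[!is.na(value)]

  # Add a derived column by reference: value relative to the maximum
  dt[, value_scaled := value / max(value)]

  # Create the output folder if needed and write the cleaned data
  dir.create(dirname(clean_path), recursive = TRUE, showWarnings = FALSE)
  fwrite(dt, clean_path)

  invisible(dt)
}
```

A call might look like `clean_measurements("data/raw/measurements.csv", "data/clean/measurements.csv")`, where both paths are placeholders. Keeping each step in a small, commented function is one way to let collaborators rerun the full analysis from the raw data.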
- R for data science
- R/RStudio for pure beginners
- RStudio with GitHub
- R with git
- Zotero
- data.table vignette
- writing good software
Much of the inspiration for maintaining an efficient, reproducible workflow came from Openscapes, while some of the material used here was adapted from the Pinsky Lab.