title | author | date | output
---|---|---|---
R for Data Sciences Syllabus | Brian S Yandell | July 2017 |

Description: This material aims to give data science teams an understanding of, and experience with, the professional skills of the field. Researchers today must organize data projects so that they can repeat tasks and share data, ideas, reports and code with others in diverse teams. They need to do this quickly in real time and also over the longer term, reproducing tasks (their own or those of others) months or years later. This involves building documents as a project evolves to capture workflow, and sharing data, methods and results with team collaborators. To do this well, researchers acting as data scientists need skill with internet tools, sophisticated use of statistical languages (such as R), and other emerging topics.

Learning Objectives: After completing this material, an individual will be able to:
- use R and RStudio as a platform for statistical computing
- curate data in R, including
    - read, manipulate and display data summaries in concise tables
    - work with data frames using tidyverse tools
    - create functions to collapse repeated steps into one-line "verbs"
    - write cleaned-up data tables out in CSV format
- visualize data with plots
- organize data methods and documentation
    - document ongoing work with R Markdown
    - use git and GitHub to track code and document changes with version control
    - organize functions, documentation and data into packages (R libraries) to share
    - create and manage external databases from R objects
- analyze data with statistical models
- profile code for efficiency and error checking
- connect with other data science tools beyond R
    - use the Unix/Linux shell to search and modify a project
    - build a basic pipeline or workflow in the shell
    - use high-throughput computing