Skip to content

Latest commit

 

History

History
111 lines (70 loc) · 4.47 KB

README.md

File metadata and controls

111 lines (70 loc) · 4.47 KB

Introduction to programming in R

Overview

This workshop is beginner-level introduction to programming in R. The course is designed to be taught in one session of 4 hours and is focused on the application of R to the analysis of tabular data from medical datasets.

Prior knowledge

  • No previous programming experience is required.
  • Please review the following topics if you're not comfortable with them:
    • basic linear algebra (operations with vectors and metrices)
    • mathematical operations (exponentials, logarithms)
    • basic statistics (mean, variance, median, standard deviation)
    • logical statements (AND, OR, NOT)

Sofware requirements

Once you have setup R and RStudio copy the code below to install the packages required for the workshop.

install.packages(c("data.table","datasets","devtools","dplyr","ggplot2","plyr","medicaldata","gapminder","RColorBrewer","rmarkdown","stringr","tidyr","tidyverse","viridis"))

Workshop Outline

1. R basics

In the first module of the workshop, the goals are to (1) familiarize with the language and the logic behind it; (2) Get started with R studio and create your first project; (3) Configure the working directory with a common standard structure; (4) Create your first .R file to write down the live code; (5) Compute arithmetic operations; (6) Use logical operators; (7) Get fluent in R's console; (8) Learn how to ask for help within R ; and (9) get comfortabble with installing packages.

Module content:

  • Syntax
  • Aritmetic Operations
  • Creating variables
  • Logical operators
  • Seeking help
  • Installing packages
  • Hands on: basics

2. Data types: attributes and built-in functions

In this section participants will (1) understand the differences between classes, objects and data types in R; (2) create objects of different types, learn about their attributes and apply some built-in functions in R; (3) Subset and index objects; and (4) get comfortable with vectorized operations.

Module content:

  • Vectors
  • Lists
  • Factors
  • Data frames
  • Arrays
  • Coercion
  • Hands-on: data types

3. Basic data manipulation

In this module participants will (1) learn how to read/write data to/from files with different formats (.tsv, .csv); (2) become familiar with basic data-frame operations; (3) index and subset data frames using base R; (4) manipulate individual data frame columns; and (5) learn how to join columns and rows of different data frames using base R.

Module content:

  • Reading/writing data
  • Exploring data frames
  • Hands-on: basic data manipulation

4. Advanced data manipulation

The fourth module participants will (1) familiarize with the dplyr syntax; (2) create pipes with the operator %>%; (3) perform operations on data frames using dplyr and tidyr functions; and (4) implement functions from external packages by reading their documentation in R

Module content:

  • Handling data frames with dplyr
  • Other useful packages
  • Hands-on: advanced data manipulation

5. Generating visual outputs

This section will show participants how to (1) Create basic plots using base R functions; (2) Understand how to connect data frames with ggplot2; (3) create basic graphs with ggplot2; (4) use factors to customize graphics in ggplot2; (5) use RMarkdown to generate customized reports.

Module content:

  • Figures with base R
  • Graphics with ggplot2

6. Real life application: COVID testing dataset

This hands-on activity will familiarize participants with a real-life use of R in the pharma industry environment and encorage them to apply the knowledge from previous modules to create an analysis pipeline.

7. Software development concepts

To conclude the workshop, participants will (1) acquire good coding practices; (2) be familiar with code documentation standatds; (3) know what to avoid when programming in R; (4) learn how to debug and troubbleshoot their own code.

Module content:

  • Good coding practices
  • Style guide
  • Debugging and troubleshooting

References

The materials for this workshop were based on the following sources:

Workshop created as part of the McGill Initiative in Computational Medicine