This site contains the materials for an R course run by Peter Mac.
This workshop is designed to give beginners a foundational understanding of the R programming language. Through a combination of theoretical explanations, hands-on coding exercises, and practical applications, participants will learn how to use R for the analysis, manipulation and visualization of cancer biology datasets.
The workshop will cover essential programming concepts and gradually introduce more advanced topics, with a focus on using the tidyverse package suite for efficient data handling, analysis and visualization. The aim of this workshop is to improve the reproducibility and efficiency of scientific research by teaching powerful tools for analysing data and creating informative plots.
Sandun Rajapaksa
Participants will gain the following skills:
- Proficiency in using R and RStudio for data analysis.
- Basic R programming skills.
- Reading, tidying, and joining datasets using the `readr` and `tidyr` packages.
- Data manipulation and transformation using the `dplyr` package.
- Creating various types of plots using the `ggplot2` package.
The Metabric study characterized the genomic mutations and gene expression profiles for 2509 primary breast tumours. In addition to the gene expression data generated using microarrays, genome-wide copy number profiles were obtained using SNP microarrays. Targeted sequencing was performed for 2509 primary breast tumours, along with 548 matched normals, using a panel of 173 of the most frequently mutated breast cancer genes as part of the Metabric study.
**References:**
Both the clinical data and the gene expression values were downloaded from cBioPortal.
We excluded observations for patient tumour samples lacking expression data, resulting in a dataset with fewer rows than the original download.
The core tidyverse includes the packages that you're likely to use in everyday data analyses. Therefore, this workshop offers an introduction to these core packages. As of tidyverse 1.3.0, the following packages are included in the core tidyverse:
Hex logos for the eight core tidyverse packages and their primary purposes. Image source: https://education.rstudio.com/blog/2020/07/teaching-the-tidyverse-in-2020-part-1-getting-started/
- ggplot2: Grammar of Graphics. Enables the creation of graphics in a declarative manner.
- dplyr: Grammar for data manipulation. Presents a set of verbs to address common challenges in data manipulation.
- tidyr: Provides a collection of functions for achieving tidy data.
- readr: Facilitates the rapid and user-friendly reading of rectangular data (e.g., csv, tsv, and fwf).
- purrr: Functional programming toolkit. Offers a set of tools for efficient work with functions and vectors.
- tibble: Provides tibbles, a modern re-imagining of the data frame that offers a more user-friendly and efficient approach to handling tabular data.
- stringr: Provides a set of functions designed to simplify and enhance string manipulations.
- forcats: Provides a suite of useful tools for handling and manipulating categorical variables (factors).
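
To give a feel for how these packages fit together, here is a minimal sketch that touches each of the eight core packages once. It assumes the tidyverse is already installed. The `tumours` table, its column names (`patient_id`, `er_status`, `gene_a`, `gene_b`) and all values are invented for illustration and are not taken from the Metabric data.

```r
# Attach the core tidyverse packages in one call (assumes tidyverse is installed).
library(tidyverse)

# tibble: build a small table by hand.
# Hypothetical toy data: names and values are invented for illustration only.
tumours <- tibble(
  patient_id = c("P1", "P2", "P3", "P4"),
  er_status  = c("positive", "negative", "positive", "negative"),
  gene_a     = c(2.1, 0.4, 1.8, 0.6),
  gene_b     = c(0.9, 1.5, 1.1, 1.7)
)

# readr: write the table to a temporary CSV file and read it back.
csv_path <- tempfile(fileext = ".csv")
write_csv(tumours, csv_path)
tumours <- read_csv(csv_path, show_col_types = FALSE)

# purrr: apply a function to each expression column, returning a numeric vector.
gene_means <- map_dbl(select(tumours, gene_a, gene_b), mean)

# tidyr: reshape from wide to long, giving one row per patient-gene pair.
tumours_long <- tumours %>%
  pivot_longer(
    cols      = c(gene_a, gene_b),
    names_to  = "gene",
    values_to = "expression"
  )

# stringr and forcats: clean up the gene labels and turn the status
# column into a factor with "negative" as the first level.
tumours_long <- tumours_long %>%
  mutate(
    gene      = str_to_upper(str_remove(gene, "gene_")),
    er_status = fct_relevel(as_factor(er_status), "negative")
  )

# dplyr: summarise mean expression for each gene and status group.
summary_tbl <- tumours_long %>%
  group_by(gene, er_status) %>%
  summarise(mean_expression = mean(expression), .groups = "drop")

# ggplot2: build a plot object, one panel per gene.
p <- ggplot(tumours_long, aes(x = er_status, y = expression)) +
  geom_point() +
  facet_wrap(~ gene)
```

Running `print(summary_tbl)` shows the summary table, and `print(p)` draws the plot. The magrittr pipe `%>%` is used here because it is made available by `library(tidyverse)`; the native pipe `|>` would work equally well in R 4.1 or later.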
This content was adapted from the following course materials:
- OHI Data Science Training
- Data Carpentry
- WEHI tidyr coursebook by Brendan R. E. Ansell
- Content developed by Maria Doyle