diff --git a/data/data_description.pdf b/data/data_description.pdf new file mode 100644 index 0000000..cdcfa3a Binary files /dev/null and b/data/data_description.pdf differ diff --git a/exercise-solutions.pdf b/exercise-solutions.pdf index 20ede6a..0518744 100644 Binary files a/exercise-solutions.pdf and b/exercise-solutions.pdf differ diff --git a/images/github_qr.png b/images/github_qr.png new file mode 100644 index 0000000..cf97511 Binary files /dev/null and b/images/github_qr.png differ diff --git a/session1_notes.Rmd b/session1_notes.Rmd index 5499e8c..595ce98 100644 --- a/session1_notes.Rmd +++ b/session1_notes.Rmd @@ -5,6 +5,7 @@ author: Sophie Lee fontfamily: lmodern output: pdf_document: + highlight: tango toc: TRUE latex_engine: xelatex fig_height: 4 @@ -21,7 +22,7 @@ url_color: blue --- ```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) +knitr::opts_chunk$set(echo = TRUE, collapse = TRUE) ``` \newpage @@ -42,7 +43,7 @@ The screenshot below shows the RStudio interface which comprises of four windows ![RStudio interface](images/rstudio_ide.png) \newpage -#### Window A: R script files +**Window A: R script files** All analysis and actions in R are carried out using the R syntax language. R script files allow you to write and edit code before running it in the console window. @@ -55,31 +56,30 @@ The main advantage of using the script file rather than entering the code direct Past script files can be opened using *File -> Open File…* from the drop-down menu or by clicking the ![open icon](images/open_shortcut.png) icon and selecting a `.R` file. The keyboard shortcut to open an existing script file is `Ctrl + o` on Windows, and `Command + o` on Macs. -#### Window B: The R console +**Window B: The R console** The R console window is where all commands run from the script file, results (other than plots), and messages, such as errors, are displayed. 
Commands can be written directly into the R console after the `>` symbol and executed using `Enter` on the keyboard. It is not recommended to write code directly into the console as it cannot be saved or replicated. Every time a new R session is opened, details about version and citations of R will be given by default. To clear text from the console window, use the keyboard shortcut `control + l` (this is the same for both Windows and Mac users). Be aware that this clears all text from the console, including any results. Before running this command, check that any results can be replicated within the script file. +\newpage -#### Window C: Environment and history +**Window C: Environment and history** This window lists all data and objects currently loaded into R. More details on the types of objects and how to use the Environment window are given in later sections. -#### Window D: Files, plots, packages and help +**Window D: Files, plots, packages and help** This window has many potential uses: graphics are displayed and can be saved from here, and R help files will appear here. This window is only available in the RStudio interface and not in the basic R package. ## Exercise 1 - 1. Open a new script file if you have not already done so. 2. Save this script file into an appropriate location. \newpage # Chapter 2: R syntax - All analyses within R are carried out using **syntax**, the R programming language. It is important to note that R is case-sensitive, so always ensure that you use the correct combination of upper and lower case letters when running functions or calling objects. Any text written in the R console or script file can be treated the same as text from other documents or programmes: text can be highlighted, copied and pasted to make coding more efficient. @@ -109,14 +109,13 @@ The choice of brackets in R coding is particularly important as they all have di All standard notation for mathematical calculations (`+`, `-`, `*`, `/`, `^`, etc.) 
are compatible with R. At its simplest level, R is just a very powerful calculator! ## Exercise 2 - 1. Add your name and the date to the top of your script file (hint: comment this out so R does not try to run it) 2. Use R to calculate the following calculations. Add the result to the same line of the script file in a way that ensures there are no errors in the code. a. $64^2$ b. $3432 \div 8$ c. $96 \times 72$ -When you have finished this exercise, select the entire script file (using `Ctrl + a` on windows or `Command + a` on Mac) and run it to ensure there are no errors in the code. +When you have finished this exercise, select the entire script file (using `ctrl + a` on windows or `Command + a` on Mac) and run it to ensure there are no errors in the code. \newpage @@ -125,17 +124,17 @@ When you have finished this exercise, select the entire script file (using `Ctrl ## 3.1 Objects One of the main advantages to using R over other software packages such as SPSS is that more than one dataset can be accessed at the same time. A collection of data stored in any format within the R session is known as an **object**. Objects can include single numbers, single variables, entire datasets, lists of datasets, or even tables and graphs. -Objects are defined in R using the `<-` symbol or `=`. For example, +Objects are defined in R using the `<-` symbol. For example, ```{r assign qone} object_1 <- 81 ``` -Creates an object in the environment named `object_1`, which takes the value `81`. This will appear in the environment window of the console (window C from the interface shown in the first chapter). +Creates an object in the environment named `object_1`, which takes the value `81`. This will appear in the environment window of the console (window C from the interface shown earlier). To retrieve an object, type its name into the script or console and run it. 
This object can then be included in functions or operations in place of the value assigned to it: -```{r qone} +```{r qone, collapse = TRUE} object_1 object_1 * 2 diff --git a/session1_notes.pdf b/session1_notes.pdf index 78319ee..ebd5bdc 100644 Binary files a/session1_notes.pdf and b/session1_notes.pdf differ diff --git a/session2_notes.Rmd b/session2_notes.Rmd index 30ccfdb..6c6a644 100644 --- a/session2_notes.Rmd +++ b/session2_notes.Rmd @@ -5,6 +5,7 @@ author: "Sophie Lee" fontfamily: lmodern output: pdf_document: + highlight: tango toc: TRUE latex_engine: xelatex fig_caption: FALSE @@ -19,7 +20,7 @@ url_color: blue --- ```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) +knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, message = FALSE) library(tidyverse) ``` @@ -32,7 +33,9 @@ Up to this point, we have not thought about the style of R coding we will be usi The choice of R 'dialect' depends on personal preference. Some prefer to use the 'base R' approach that does not rely on any packages that may need updating, making it a more stable approach. However, base R can be difficult to read for those not comfortable with coding. -![boyfriend tidyverse meme](images/r_meme.png) +```{r boyfriend meme, echo=FALSE, out.width="75%"} +knitr::include_graphics("images/r_meme.png") +``` The alternative approach that we will be adopting in this course is the 'tidyverse' approach. Tidyverse is a set of packages that have been designed to make R coding more readable and efficient. They have been designed with reproducibility in mind, which means there is a wealth of online (mostly free), well-written resources available to help use these packages. 
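The selection helpers that this change reformats as a bulleted list in `session2_notes.Rmd` can be tried on a small data frame. The sketch below is illustrative only: it assumes the dplyr package is installed, and `toy` and its column names are hypothetical stand-ins, not the course's `csp_2020` data.

```r
library(dplyr)

# Hypothetical toy data; the column names only imitate the course data
toy <- data.frame(
  ons_code = c("E1", "E2"),
  sfa_2020 = c(10, 20),
  ct_total_2020 = c(30, 40),
  region = c("North", "South")
)

# Variables whose names begin with "sfa"
sfa_cols <- names(select(toy, starts_with("sfa")))

# Variables whose names end with "2020"
year_cols <- names(select(toy, ends_with("2020")))

# Variables with "total" anywhere in their name
total_cols <- names(select(toy, contains("total")))

# Variables that are classed as numeric
numeric_cols <- names(select(toy, where(is.numeric)))

print(sfa_cols)
print(year_cols)
print(total_cols)
print(numeric_cols)
```

In this toy example, `ends_with("2020")` and `where(is.numeric)` happen to return the same two columns; on real data the name-based and type-based helpers will usually differ.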
@@ -145,13 +148,13 @@ select(csp_2020, ons_code:region) The `select` function can also be combined with a number of 'selection helper' functions that help us select variables based on naming conventions: -`starts_with("xyz")` returns all variables with names beginning `xyz` -`ends_with("xyz")` returns all variables with names ending `xyz` -`contains("xyz")` returns all variables that have `xyz` within their name +- `starts_with("xyz")` returns all variables with names beginning `xyz` +- `ends_with("xyz")` returns all variables with names ending `xyz` +- `contains("xyz")` returns all variables that have `xyz` within their name Or based on whether they match a condition: -`where(is.numeric)` returns all variables that are classed as numeric +- `where(is.numeric)` returns all variables that are classed as numeric For a full list of these selection helpers, access the helpfile using `?tidyr_tidy_select`. diff --git a/session2_notes.pdf b/session2_notes.pdf index e626886..1996686 100644 Binary files a/session2_notes.pdf and b/session2_notes.pdf differ diff --git a/session3_notes.Rmd b/session3_notes.Rmd index eeb2f18..947d180 100644 --- a/session3_notes.Rmd +++ b/session3_notes.Rmd @@ -5,6 +5,7 @@ author: "Sophie Lee" fontfamily: lmodern output: pdf_document: + highlight: tango toc: TRUE latex_engine: xelatex fig_height: 4 @@ -21,7 +22,7 @@ url_color: blue --- ```{r setup, include=FALSE} -knitr::opts_chunk$set(echo = TRUE) +knitr::opts_chunk$set(echo = TRUE, collapse = FALSE, message = FALSE) library(tidyverse) ``` \newpage @@ -96,7 +97,7 @@ csp_201520 <- list.files(path = "data", pattern = "CSP_20") %>% The dataset containing core spending power in England between 2015 and 2020 is currently in what is known as **wide format**. This means there is a variable per measure per year, making the object very wide. -Some analyses and visualisations, particularly those used for temporal data, require a time variable in the dataset (for example, year). 
This requires the data to be in a different format, known as**long format**. Long format is where each row contains an observation per year (making the data much longer and narrower). +Some analyses and visualisations, particularly those used for temporal data, require a time variable in the dataset (for example, year). This requires the data to be in a different format, known as **long format**. Long format is where each row contains an observation per year (making the data much longer and narrower). To convert data between wide and long formats, we can use the tidyverse functions `pivot_longer` and `pivot_wider`. @@ -113,9 +114,13 @@ Using a combination of the helpfile (`?pivot_longer`) and vignette, the argument csp_long <- pivot_longer(csp_201520, # Pivot columns sfa_2015 up to and including rsdg_2020 cols = sfa_2015:rsdg_2020, - # Separate the old variable names in two, keep the prefix as it was, and put the suffix into a new variable, year + # Separate the old variable names in two, + # keep the prefix as it was, and put the suffix + # into a new variable, year names_to = c(".value", "year"), - # The name prefix and suffix were separated by an _, the prefix can take different lengths, the suffix is always the final 4 characters + # The name prefix and suffix were separated by an _, + # the prefix can take different lengths, the suffix + # is always the final 4 characters names_pattern = "(.*)_(....)") # Check the new, long dataset's structure diff --git a/session3_notes.pdf b/session3_notes.pdf index 45ce9b1..1f39135 100644 Binary files a/session3_notes.pdf and b/session3_notes.pdf differ diff --git a/session4_notes.Rmd b/session4_notes.Rmd index e444cf3..c8558fd 100644 --- a/session4_notes.Rmd +++ b/session4_notes.Rmd @@ -5,6 +5,7 @@ author: "Sophie Lee" fontfamily: lmodern output: pdf_document: + highlight: tango toc: TRUE latex_engine: xelatex fig_height: 4 @@ -24,7 +25,7 @@ url_color: blue ```{r setup, include=FALSE} pacman::p_load(flextable, tidyverse) 
-knitr::opts_chunk$set(echo = TRUE) +knitr::opts_chunk$set(echo = TRUE, collapse = TRUE, message = FALSE) csp_2020 <- read_csv("data/CSP_2020.csv") csp_long2 <- read_csv("data/CSP_long_201520.csv") @@ -40,7 +41,7 @@ Data visualisation is a powerful tool with multiple important uses. First, visua The most appropriate choice of visualisation will depend on the type of variable(s) we wish to display, the number of variables and the message we are trying to disseminate. Common plots used to display combinations of different types of data are given in following table: -```{r Visualisation table, echo = FALSE, message = FALSE} +```{r Visualisation table, include = FALSE} vis_tab <- data.frame(n_vars = c(rep("One variable", 5), rep("Two variables", 5), rep("> 2 variables", 2)), type_vars = c(rep("Categorical", 2), "Numerical", "Spatial", @@ -121,7 +122,8 @@ This outlier is the Greater London Authority which is a combination of local aut ```{r Scatter without London} # Take the csp_2020 data, and then csp_2020 %>% - # Return all rows where authority is not equal to Greater London Authority, and then + # Return all rows where authority is not equal to Greater London Authority, + # and then filter(authority != "Greater London Authority") %>% # Generate a plot ggplot( ) + @@ -190,7 +192,8 @@ Aesthetic properties of the geom object may also be set manually, outside of the ```{r Manually setting aesthetics} ggplot(csp_nolon_2020_new) + geom_point(aes(x = sfa_2020, y = ct_total_2020), - # Adding the colour outside of the aes wrapper as it is not from the data + # Adding the colour outside of the aes wrapper as it is not + # from the data colour = "blue") ``` diff --git a/session4_notes.pdf b/session4_notes.pdf index 33fda98..ebe009f 100644 Binary files a/session4_notes.pdf and b/session4_notes.pdf differ diff --git a/session5_notes.pdf b/session5_notes.pdf index 5f05f80..51ad3ef 100644 Binary files a/session5_notes.pdf and b/session5_notes.pdf differ
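The `names_pattern` in the `pivot_longer` call that the `session3_notes.Rmd` hunk re-wraps can be checked on a toy data frame. This is a minimal sketch, assuming the tidyr package is installed; `csp_toy` and its values are made up for illustration, with column names that only imitate the course data.

```r
library(tidyr)

# Hypothetical wide data: one column per measure per year
csp_toy <- data.frame(
  authority = c("A", "B"),
  sfa_2019 = c(1, 2),
  sfa_2020 = c(3, 4),
  ct_total_2019 = c(5, 6),
  ct_total_2020 = c(7, 8)
)

csp_toy_long <- pivot_longer(csp_toy,
  # Pivot every measure/year column
  cols = sfa_2019:ct_total_2020,
  # The prefix stays as a variable name, the suffix goes into `year`
  names_to = c(".value", "year"),
  # Greedy prefix, an underscore, then exactly four characters,
  # so "ct_total_2019" splits into "ct_total" and "2019"
  names_pattern = "(.*)_(....)")

# One row per authority per year, with columns for sfa and ct_total
print(csp_toy_long)
```

The new `year` column is returned as character; wrap it in `as.numeric()` if it is needed as a number.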