diff --git a/posts/2022-11-29-purrr-fun-programming/index.qmd b/posts/2022-11-29-purrr-fun-programming/index.qmd index af3fc9c..972991e 100644 --- a/posts/2022-11-29-purrr-fun-programming/index.qmd +++ b/posts/2022-11-29-purrr-fun-programming/index.qmd @@ -22,15 +22,34 @@ categories: [module 3, week 6, functions, functional, programming, purrr] **Before class, you can prepare by reading the following materials:** -1. Add here. -2. Add here. +1. +2. ::: + +### Prerequisites +Before starting you must install the additional package: + +* `purrr` - this provides a consistent functional programming interface to work with functions and vectors + +You can do this by calling + +```{r} +#| eval: false +install.packages("purrr") +``` + +or use the “Install Packages…” option from the “Tools” menu in RStudio. + + + ### Acknowledgements Material for this lecture was borrowed and adopted from -- Add here. +- +- +- # Learning objectives @@ -39,25 +58,295 @@ Material for this lecture was borrowed and adopted from **At the end of this lesson you will:** -- Add here. +- Be familiar with the concept of _functional programming_ +- Get comfortable with the major functions R package `purrr`, e.g. `map`, `reduce`, predicate functions +- Write your loops with `map` functions instead of `for` loops ::: -# Add lecture here +# Functional Programming +## The characteristics +At it is core, functional programming treats functions equally as other data structures, namely **first class functions**. + +> In R, this means that you can do many of the things with a function that you can do with a vector: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function. + +## What do you mean? + +* Assign names to functions +```{r} +foo <- function(){ + return("This is foo.") +} +``` + +* Store functions in a list +```{r} +foo_list <- list( + fun_1 = function() return("foo_1"), + fun_2 = function() return("foo_2") +) + +str(foo_list) +``` + +* Pass functions as arguments to other functions +```{r} +shell <- function(f) f() +shell(foo_list$fun_1) +shell(foo_list$fun_2) +``` + +* Create functions inside of functions & return them as the result of a function +```{r} +foo_wrap <- function(){ + foo_2 <- function(){ + return("This is foo_2.") + } + return(foo_2) +} + +foo_wrap() +(foo_wrap())() +``` + +The bottom line, you can manipulate functions as the same way as you can to a vector or a matrix. + +## Why functional programming important? +Functional programming introduces a new style of programming, namely **functional style**. Broadly speaking, this programming style encourage one to write a big function as many smaller isolated functions, where each function address one specific task. + +As a by-product, **funcitonal style** motivates more humanly readable code, and recyclable code. +```{r} +#| eval: false + +"data_set.csv" |> + import_data_from_file() |> + run_regression() |> + model_diagnostic() |> + model_visualization() + +"data_set2.csv" |> + import_data_from_file() |> + run_different_regression() |> + model_diagnostic() |> + model_visualization() +``` + +::: callout-tip +### Pipe operators +R provides some pipe operators to make code readable, e.g. `|>` from the base R, `%>%` from the package `magrittr`. These pipe operators operate like a pipe, piping the output from the previous function (left hand side of the pipe operator) to the following function (right hand side of the pipe operator). The pipe operator `|>` was introduced in R 4.1.0 and requires no loading of additional packages, unlike `%>%`. + +A keyboard short cut to type a pipe operator in RStudio is `shift+cmd+m` for Mac or `shift+ctrl+m` in Windows. +::: + +# `purrr`: the functional programming toolkit + +The R package `purrr`, as one important component of the [`tidyverse`](https://www.tidyverse.org/), provides a interface to manipulate vectors in the _functional style_. + +> `purrr` enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. + +::: callout-tip +### `purrr` cheatsheet +It is very difficulty, if not impossible, to remember all functions that a package offers as well as their use cases. Hence, `purrr` developers offer a nice compact cheatsheet with visualizations at . Similar cheatsheets are available for other `tidyverse` packages. +::: + +The most popular function in `purrr` is `map()` which iterates over the supplied data structure and apply a function during this iteration. `purrr` also offers a series of useful functions to manipulate `list`. + + +## The `map` family +The `map` family of functions provides a convenient way to iterate through vectors or lists and apply functions during this iteration. Depending on the dimension of the input and the format of the output, there are many different variants of the basic `map` function. + +::: callout-tip +### How does `map` relate to functional programming +Because their arguments include functions (`.f`) besides data (`.x`), `map` functions are considered as a convinient interface to implement functional programming. +::: -# Post-lecture materials -### Final Questions +### `map` as a foor loop +```{r} +library(purrr) -Here are some post-lecture questions to help you think about the material discussed. +triple <- function(x) x * 3 + +# for loop +loop_ret <- list() +for(i in 1:3){ + loop_ret[i] <- triple(i) +} + +# map implementation +map_eg1 <- map(.x = 1:3, .f = triple) +map_eg2 <- map(.x = 1:3, .f = ~triple(.x)) +map_eg3 <- map(.x = 1:3, .f = function(x) triple(x)) + +identical(loop_ret,map_eg1) +identical(loop_ret,map_eg2) +identical(loop_ret,map_eg3) +``` + +### `map` with a data frame +```{r} +tmp_dat <- data.frame( + x = 1:5, + y = 6:10 +) + +tmp_dat |> + map(.f = mean) +``` + +::: callout-tip +### `data.frame` vs `list` +`data.frame` is a special case of `list`, where each column as one item of the list. Don't confuse with each row as an item. +```{r} +class(data.frame()) +typeof(data.frame()) +``` +::: + +### Extra arguments for functions +```{r} +tmp_dat2 <- as.list(tmp_dat) +tmp_dat2$y[6] <- NA +str(tmp_dat2) + +tmp_dat2 |> map(.f = mean) # No extra arguments +tmp_dat2 |> + map(.f = mean, na.rm = TRUE) # With extra arguments +tmp_dat2 |> + map(.f = function(x, remove_na) mean(x, na.rm = remove_na), + remove_na = TRUE) +``` + +### Stratified model with `map` +We use the `mtcar` from the package `datasets` to demonstrate +```{r} +library(datasets) +str(mtcars) + +unique(mtcars$cyl) # different numbers of cylinders +``` + +We are interested in the averaged miles per gallon for vehicles with different numbers of cylinders +```{r} +# Create a dataset for cylinders level +str_dat <- mtcars |> split(mtcars$cyl) +length(str_dat) +str(str_dat) + +str_dat |> + map(.f = ~mean(.x$mpg)) +``` +### Matrix as the output +The `map` family include functions that organize the output in a different data structure, whose names follow the pattern `map_*`. As we've seen, the `map` function return a list. The following functions will return a vector of a specific kind, e.g. `map_lgl` returns a vector of logical variables, `map_chr` returns a vector of strings. It is also possible to return the the results as data frames by row binding (`map_dfr`) or column binding (`map_dfc`). + +```{r} +str_dat |> + map_dbl(.f = ~mean(.x$mpg)) # returns a vector of doubles + +str_dat |> + map_dfr(.f = ~colMeans(.x)) # return a data frame by row binding + +str_dat |> + map_dfc(.f = ~colMeans(.x)) # return a data frame by col binding +``` + + +### Multiple Input +It is possible that an operation requires a pair of variables as input. While it is still managable in `map` to achieve this. There are better options provided in `purrr`, specifically `map2` and `pmap`. + +```{r} +map_avg <- map_dbl(.x = mtcars, .f = mean) + +map2_avg <- map2_dbl(.x = mtcars, + .y = list(weight = 1/nrow(mtcars)), + .f = ~sum(.x*.y)) +identical(map_avg, map2_avg) + +pmap_avg <- pmap_dbl(list(x = mtcars, + y = list(weight = 1/(2*nrow(mtcars))), + z = list(weight2 = 2)), + .f = ~sum(..1*..2*..3)) +identical(map_avg, pmap_avg) + +# Use element names in pmap +mtcars$weight <- 1/2 +mtcars$weight2 <- 2 +pmap_eg2 <- pmap_dbl(mtcars, + .f = function(mpg, weight, weight2, ...){ + mpg * weight * weight2 + }) + +identical(pmap_eg2, mtcars$mpg) +``` + +### No output +It is possible that some operations don't need any output during the iteration, e.g. saving the dataset. In this case, `map` will force an output, e.g. `NULL`. One can consider using `walk` instead. The function `walk` behaves exactly the same as `map` but does not output anything. +```{R} +tmp_fldr <- tempdir() + + +map2(.x = str_dat, + .y = 1:length(str_dat), + .f = ~saveRDS(.x, + file = paste0(tmp_fldr, "/",.y, ".rds")) +) + +# No output +walk2(.x = str_dat, + .y = (1:length(str_dat)), + .f = ~saveRDS(.x, + file = paste0(tmp_fldr, "/",.y, ".rds")) +) +``` + +## Other functions in `purrr` +### `reduce` and `accumulate` +`purrr` also provides functions to summarize a list by a preferred operator, namesly `reduce`. Its variant `accumulate` provides the history of this reduction process. + + +```{r} +mtcars$weight <- 1/(2*nrow(mtcars)) +mtcars$weight2 <- 2 +reduce_eg <- + pmap_dbl(mtcars, + .f = function(mpg, weight, weight2, ...){ + mpg * weight * weight2 + }) |> + reduce(`+`) + +pmap_dbl(mtcars, + .f = function(mpg, weight, weight2, ...){ + mpg * weight * weight2 + })|> + head() |> # Only show the first 7 operations + accumulate(`+`) +``` + +### Working with list + +Let's move to the `purrr` cheatsheet at . + +# Summary +* Introduction to functional programming. +* The R package `purrr` provides a nice interface to functional programming and list manipulation. +* The function `map` and its aternative `map_*` provide a neat way to iterate over a list or vector with the output in different data structures. +* The function `map2` and `pmap` allow having more than one list as input. +* The function `walk` and its alternatives `walk2`, `walk_*` do not provide any output. +* The functions `reduce` and `accumulate` help to summarize a list with a preferred operator or function. + +# Post-lecture materials ::: callout-note ### Questions +1. What does `imap` and `iwalk` do? In this lecture note, can you find the one example possible to substitute with `imap` and `iwalk`? Hint: see the sub-section named _No output_ + +1. Is there any function in the R base package provide nice interface for functional programming? Hint: `?with`, `?within` -1. Add here. +2. Can you write a section of code to demonstrate the central limited theorem primarily using the `purrr` package and/or using the R base package? ::: ### Additional Resources ::: callout-tip -- Add here. +- +- :::