Lecture notes for 11/29 on fp and purrrr (#2)
boyiguo1 committed Nov 28, 2022
1 parent b3ee93b commit b754181
Showing 1 changed file with 299 additions and 10 deletions.
309 changes: 299 additions & 10 deletions posts/2022-11-29-purrr-fun-programming/index.qmd
@@ -22,15 +22,34 @@ categories: [module 3, week 6, functions, functional, programming, purrr]

**Before class, you can prepare by reading the following materials:**

1. <https://adv-r.hadley.nz/functionals.html>
2. <https://raw.githubusercontent.com/rstudio/cheatsheets/main/purrr.pdf>
:::


### Prerequisites
Before starting you must install the additional package:

* `purrr` - this provides a consistent functional programming interface to work with functions and vectors

You can do this by calling

```{r}
#| eval: false
install.packages("purrr")
```

or use the “Install Packages…” option from the “Tools” menu in RStudio.



### Acknowledgements

Material for this lecture was borrowed and adapted from

- <https://adv-r.hadley.nz/fp.html>
- <https://adv-r.hadley.nz/functionals.html>
- <https://raw.githubusercontent.com/rstudio/cheatsheets/main/purrr.pdf>

# Learning objectives

@@ -39,25 +58,295 @@ Material for this lecture was borrowed and adopted from

**At the end of this lesson you will:**

- Be familiar with the concept of _functional programming_
- Get comfortable with the major functions of the R package `purrr`, e.g. `map`, `reduce`, predicate functions
- Write your loops with `map` functions instead of `for` loops
:::

# Functional Programming
## The characteristics
At its core, functional programming treats functions like any other data structure; functions with this property are called **first-class functions**.

> In R, this means that you can do many of the things with a function that you can do with a vector: you can assign them to variables, store them in lists, pass them as arguments to other functions, create them inside functions, and even return them as the result of a function.

## What do you mean?

* Assign names to functions
```{r}
foo <- function() {
  return("This is foo.")
}
```

* Store functions in a list
```{r}
foo_list <- list(
  fun_1 = function() return("foo_1"),
  fun_2 = function() return("foo_2")
)
str(foo_list)
```

* Pass functions as arguments to other functions
```{r}
shell <- function(f) f()
shell(foo_list$fun_1)
shell(foo_list$fun_2)
```

* Create functions inside of functions & return them as the result of a function
```{r}
foo_wrap <- function() {
  foo_2 <- function() {
    return("This is foo_2.")
  }
  return(foo_2)
}
foo_wrap()      # returns the inner function itself
(foo_wrap())()  # calls the returned function
```

The bottom line: you can manipulate functions in the same way as you can a vector or a matrix.
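As a further illustration of creating and returning functions, here is a minimal sketch of a function factory in base R. The names `make_power`, `square`, `cube`, and `power_list` are hypothetical and chosen only for this example.

```{r}
# make_power() is a hypothetical function factory:
# it builds and returns a new function
make_power <- function(exponent) {
  force(exponent)  # evaluate the argument now so the returned function keeps its value
  function(x) x^exponent
}

square <- make_power(2)
cube <- make_power(3)

square(4)
cube(2)

# Functions stored in a list can be called like any other list element
power_list <- list(square = square, cube = cube)
power_list$cube(3)
```

Each call to `make_power()` returns a separate function that remembers its own `exponent`, which is only possible because functions are ordinary values in R.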

## Why is functional programming important?
Functional programming introduces a new style of programming, namely the **functional style**. Broadly speaking, this programming style encourages one to write a big function as many smaller, isolated functions, where each function addresses one specific task.

As a by-product, the **functional style** leads to more human-readable and reusable code.
```{r}
#| eval: false
"data_set.csv" |>
  import_data_from_file() |>
  run_regression() |>
  model_diagnostic() |>
  model_visualization()

"data_set2.csv" |>
  import_data_from_file() |>
  run_different_regression() |>
  model_diagnostic() |>
  model_visualization()
```

::: callout-tip
### Pipe operators
R provides pipe operators to make code more readable, e.g. `|>` from base R and `%>%` from the package `magrittr`. A pipe operator passes the output of the expression on its left-hand side as the first argument of the function on its right-hand side. The pipe operator `|>` was introduced in R 4.1.0 and requires no loading of additional packages, unlike `%>%`.

A keyboard shortcut to type a pipe operator in RStudio is `shift+cmd+m` on Mac or `shift+ctrl+m` on Windows.
:::
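To make the pipe concrete, the sketch below writes the same computation once as nested calls and once with the native pipe. It uses only base R and assumes R 4.1.0 or later; the variable names are illustrative.

```{r}
# Requires R >= 4.1.0 for the native pipe
x <- c(1, 4, 9, 16)

# Nested calls are read inside-out
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read left to right
piped <- x |> sqrt() |> mean() |> round(1)

identical(nested, piped)
```

Both versions run exactly the same functions in the same order; the pipe only changes how the code reads.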

# `purrr`: the functional programming toolkit

The R package `purrr`, an important component of the [`tidyverse`](https://www.tidyverse.org/), provides an interface to manipulate vectors in the _functional style_.

> `purrr` enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.

::: callout-tip
### `purrr` cheatsheet
It is very difficult, if not impossible, to remember all the functions that a package offers as well as their use cases. Hence, the `purrr` developers offer a compact cheatsheet with visualizations at <https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf>. Similar cheatsheets are available for other `tidyverse` packages.
:::

The most popular function in `purrr` is `map()`, which iterates over the supplied data structure and applies a function to each element. `purrr` also offers a series of useful functions to manipulate lists.


## The `map` family
The `map` family of functions provides a convenient way to iterate through vectors or lists and apply functions during this iteration. Depending on the dimension of the input and the format of the output, there are many different variants of the basic `map` function.

::: callout-tip
### How does `map` relate to functional programming
Because their arguments include functions (`.f`) besides data (`.x`), `map` functions are considered a convenient interface for functional programming.
:::

### `map` as a for loop
```{r}
library(purrr)

triple <- function(x) x * 3

# for loop
loop_ret <- list()
for (i in 1:3) {
  loop_ret[[i]] <- triple(i)
}

# map implementation: three equivalent ways to supply the function
map_eg1 <- map(.x = 1:3, .f = triple)                 # named function
map_eg2 <- map(.x = 1:3, .f = ~triple(.x))            # formula shorthand
map_eg3 <- map(.x = 1:3, .f = function(x) triple(x))  # anonymous function

identical(loop_ret, map_eg1)
identical(loop_ret, map_eg2)
identical(loop_ret, map_eg3)
```

### `map` with a data frame
```{r}
tmp_dat <- data.frame(
  x = 1:5,
  y = 6:10
)

tmp_dat |>
  map(.f = mean)
```

::: callout-tip
### `data.frame` vs `list`
A `data.frame` is a special case of a `list` in which each column is one item of the list. Do not mistake each row for an item.
```{r}
class(data.frame())
typeof(data.frame())
```
:::
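The list nature of a data frame can be checked directly. The sketch below uses only base R; `df` and `col_classes` are hypothetical names, and `vapply()` stands in here as the base R counterpart of a typed `map` variant.

```{r}
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)   # a data frame is built on top of a list
length(df)    # the number of columns, not the number of rows
names(df)     # the list names are the column names

# Iterating over a data frame therefore visits one column at a time
col_classes <- vapply(df, class, character(1))
col_classes
```

This is why `map()` applied to a data frame calls the function once per column.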

### Extra arguments for functions
```{r}
tmp_dat2 <- as.list(tmp_dat)
tmp_dat2$y[6] <- NA
str(tmp_dat2)

tmp_dat2 |> map(.f = mean) # No extra arguments

tmp_dat2 |>
  map(.f = mean, na.rm = TRUE) # With extra arguments

tmp_dat2 |>
  map(.f = function(x, remove_na) mean(x, na.rm = remove_na),
      remove_na = TRUE)
```

### Stratified model with `map`
We use the `mtcars` dataset from the package `datasets` to demonstrate.
```{r}
library(datasets)
str(mtcars)
unique(mtcars$cyl) # different numbers of cylinders
```

We are interested in the average miles per gallon for vehicles with different numbers of cylinders.
```{r}
# Split the data into one data frame per number of cylinders
str_dat <- mtcars |> split(mtcars$cyl)
length(str_dat)
str(str_dat)

str_dat |>
  map(.f = ~mean(.x$mpg))
```
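The same split-then-map pattern extends from a stratified mean to a stratified regression model. The following is a sketch only: the choice of `wt` as the predictor and the names `fits` and `wt_slopes` are assumptions made for illustration, not part of the lecture.

```{r}
library(purrr)

# Split mtcars by number of cylinders, then fit one model per stratum
fits <- split(mtcars, mtcars$cyl) |>
  map(.f = ~lm(mpg ~ wt, data = .x))

# Extract the estimated slope for wt from every fitted model
wt_slopes <- fits |>
  map_dbl(.f = ~coef(.x)[["wt"]])
wt_slopes
```

Because `map()` returns a list, the fitted models can be passed straight to another `map` call to pull out whichever summary is needed.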
### Vectors and data frames as the output
The `map` family includes functions that organize the output in different data structures; their names follow the pattern `map_*`. As we've seen, the `map` function returns a list. The following functions return a vector of a specific type, e.g. `map_lgl` returns a logical vector and `map_chr` returns a character vector. It is also possible to return the results as a data frame by row binding (`map_dfr`) or column binding (`map_dfc`).

```{r}
str_dat |>
  map_dbl(.f = ~mean(.x$mpg)) # returns a vector of doubles

str_dat |>
  map_dfr(.f = ~colMeans(.x)) # returns a data frame by row binding

str_dat |>
  map_dfc(.f = ~colMeans(.x)) # returns a data frame by col binding
```
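The typed variants mentioned above but not shown, such as `map_lgl` and `map_chr`, can be sketched on a small toy list. The list `words` and the result names are made up for this example.

```{r}
library(purrr)

words <- list("map", "reduce", "accumulate")

n_chars <- map_int(words, nchar)            # integer vector
is_long <- map_lgl(words, ~nchar(.x) > 3)   # logical vector
upper <- map_chr(words, toupper)            # character vector

n_chars
is_long
upper
```

Each typed variant expects `.f` to return a single value of the matching type for every element and raises an error otherwise, which makes the output type predictable.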


### Multiple Input
An operation may require a pair of variables as input. While this is still manageable with `map`, `purrr` provides better options, specifically `map2` and `pmap`.

```{r}
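map_avg <- map_dbl(.x = mtcars, .f = mean)

# map2: a length-one .y is recycled across every column in .x
map2_avg <- map2_dbl(.x = mtcars,
                     .y = list(weight = 1/nrow(mtcars)),
                     .f = ~sum(.x * .y))
# Compare with all.equal(): floating-point results can differ in the last bits
all.equal(map_avg, map2_avg)

# pmap: any number of inputs, referred to by position as ..1, ..2, ..3
pmap_avg <- pmap_dbl(list(x = mtcars,
                          y = list(weight = 1/(2*nrow(mtcars))),
                          z = list(weight2 = 2)),
                     .f = ~sum(..1 * ..2 * ..3))
all.equal(map_avg, pmap_avg)

# Use element names in pmap: a data frame is iterated row by row
mtcars$weight <- 1/2
mtcars$weight2 <- 2
pmap_eg2 <- pmap_dbl(mtcars,
                     .f = function(mpg, weight, weight2, ...) {
                       mpg * weight * weight2
                     })
identical(pmap_eg2, mtcars$mpg)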
```
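A smaller toy example may make the parallel iteration easier to see. This is a sketch with made-up vectors; `bases`, `exponents`, and `params` are hypothetical names.

```{r}
library(purrr)

bases <- c(1, 2, 3)
exponents <- c(2, 3, 4)

# map2: iterate over two vectors in parallel
powers <- map2_dbl(bases, exponents, ~.x^.y)
powers

# pmap: iterate over any number of vectors, matched to arguments by name
params <- list(x = bases, y = exponents, z = c(10, 20, 30))
shifted <- pmap_dbl(params, function(x, y, z) x^y + z)
shifted
```

With `pmap`, naming the list elements after the function arguments keeps the call readable as the number of inputs grows.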

### No output
Some operations don't need to return any output during the iteration, e.g. saving a dataset. In this case, `map` still forces an output, e.g. a list of `NULL`s. One can consider using `walk` instead. The function `walk` calls the function in the same way as `map` but returns its input invisibly, so nothing is printed.
```{r}
tmp_fldr <- tempdir()

# map2 returns a list of NULLs, one per saved file
map2(.x = str_dat,
     .y = seq_along(str_dat),
     .f = ~saveRDS(.x,
                   file = file.path(tmp_fldr, paste0(.y, ".rds")))
)

# walk2 saves the same files but prints no output
walk2(.x = str_dat,
      .y = seq_along(str_dat),
      .f = ~saveRDS(.x,
                    file = file.path(tmp_fldr, paste0(.y, ".rds")))
)
```
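The return value of `walk` can be inspected by assigning it. The short sketch below uses a toy list `x`; both names are illustrative.

```{r}
library(purrr)

x <- list(a = 1, b = 2)

# walk() calls .f for its side effect (here, printing) ...
res <- walk(x, print)

# ... and returns its input unchanged, so it can sit in the middle of a pipe
identical(res, x)
```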

## Other functions in `purrr`
### `reduce` and `accumulate`
`purrr` also provides a function to summarize a list with a preferred operator, namely `reduce`. Its variant `accumulate` keeps the history of this reduction process.


```{r}
mtcars$weight <- 1/(2*nrow(mtcars))
mtcars$weight2 <- 2

# reduce collapses the weighted mpg values into a single sum
reduce_eg <-
  pmap_dbl(mtcars,
           .f = function(mpg, weight, weight2, ...) {
             mpg * weight * weight2
           }) |>
  reduce(`+`)

# accumulate keeps every intermediate sum
pmap_dbl(mtcars,
         .f = function(mpg, weight, weight2, ...) {
           mpg * weight * weight2
         }) |>
  head() |> # Only show the first 6 values
  accumulate(`+`)
```
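On a small integer vector the two functions are easier to follow. The sketch below also shows base R's `Reduce()`, which offers the same idea without `purrr`; the variable names are illustrative.

```{r}
library(purrr)

x <- 1:5

total <- reduce(x, `+`)        # ((((1 + 2) + 3) + 4) + 5)
running <- accumulate(x, `+`)  # running totals

total
running

# Base R offers the same idea through Reduce()
base_total <- Reduce(`+`, x)
base_running <- Reduce(`+`, x, accumulate = TRUE)
```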

### Working with list

Let's move to the `purrr` cheatsheet at <https://github.com/rstudio/cheatsheets/blob/main/purrr.pdf>.

# Summary
* Introduction to functional programming.
* The R package `purrr` provides a nice interface to functional programming and list manipulation.
* The function `map` and its alternatives `map_*` provide a neat way to iterate over a list or vector with the output in different data structures.
* The functions `map2` and `pmap` allow having more than one list as input.
* The function `walk` and its alternatives, e.g. `walk2` and `pwalk`, are called for their side effects and do not print any output.
* The functions `reduce` and `accumulate` help to summarize a list with a preferred operator or function.

# Post-lecture materials

::: callout-note
### Questions
1. What do `imap` and `iwalk` do? Can you find the one example in this lecture note that could be rewritten with `imap` or `iwalk`? Hint: see the sub-section named _No output_

1. Are there any functions in base R that provide a nice interface for functional programming? Hint: `?with`, `?within`

1. Can you write a section of code to demonstrate the central limit theorem, primarily using the `purrr` package and/or base R?
:::

### Additional Resources

::: callout-tip
- <https://r4ds.had.co.nz/iteration.html>
- <https://www.stat.umn.edu/geyer/8054/notes/functional.html>
:::
