dynamic.Rmd

# Dynamic branching {#dynamic}

```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE,
  warnPartialMatchArgs = FALSE,
  crayon.enabled = FALSE,
  readr.show_progress = FALSE,
  tidyverse.quiet = TRUE
)
```

```{r, echo = FALSE, message = FALSE}
library(broom)
library(drake)
library(gapminder)
library(tidyverse)
```

## A note about versions

The first release of dynamic branching was in `drake` version 7.8.0. In subsequent versions, dynamic branching behaves differently. This manual describes how dynamic branching works in development `drake` (to become version 7.9.0 in early January 2020). If you are using version 7.8.0, please refer to [this version of the chapter](https://github.com/ropensci-books/drake/blob/c4dfa6dd71b5ffa4c6027633ae048d2ab0513c6d/dynamic.Rmd) instead.

## Motivation

In large workflows, you may need more targets than you can easily type in a plan, and you may not be able to fully specify all targets in advance. Dynamic branching is an interface to declare new targets while `make()` is running. It lets you create more compact plans and graphs, it is easier to use than [static branching](#static), and it improves the startup speed of `make()` and friends.

## Which kind of branching should I use?

With dynamic branching, `make()` is faster to initialize, and you have far more flexibility. With [static branching](#static), you have meaningful target names, and it is easier to predict what the plan is going to do in advance. There is a ton of room for overlap and personal judgement, and you can even use both kinds of branching together.

## Dynamic targets

A dynamic target is a [vector](https://vctrs.r-lib.org/) of *sub-targets*. We let `make()` figure out which sub-targets to create and how to aggregate them.

As an example, let's fit a regression model to each continent in [Gapminder data](https://github.com/jennybc/gapminder). To activate dynamic branching, use the `dynamic` argument of `target()`.

```{r}
library(broom)
library(drake)
library(gapminder)
library(tidyverse)

# Split the Gapminder data by continent.
gapminder_continents <- function() {
  gapminder %>%
    mutate(gdpPercap = scale(gdpPercap)) %>%
    split(f = .$continent)
}

# Fit a model to a continent.
fit_model <- function(continent_data) {
  data <- continent_data[[1]]
  data %>%
    lm(formula = gdpPercap ~ year) %>%
    tidy() %>%
    mutate(continent = data$continent[1]) %>%
    select(continent, term, statistic, p.value)
}

plan <- drake_plan(
  continents = gapminder_continents(),
  model = target(fit_model(continents), dynamic = map(continents))
)

make(plan)
```

The data type of every sub-target is the same as the dynamic target it belongs to. In other words, `model` and `model_23022788` are both data frames, and `readd(model)` and friends automatically concatenate all the `model_*` sub-targets.

```{r}
readd(model)
```

This behavior is powered by the [`vctrs`](https://vctrs.r-lib.org/). A dynamic target like `model` above is really a "`vctr`" of sub-targets. Under the hood, the aggregated value of `model` is what you get from calling `vec_c()` on all the `model_*` sub-targets. When you dynamically `map()` over a non-dynamic object, you are taking slices with `vec_slice()`. (When you `map()` over a dynamic target, each element is a sub-target and `vec_slice()` is not necessary.)

```{r}
library(vctrs)

# same as readd(model)
s <- subtargets(model)
vec_c(
  readd(s[1], character_only = TRUE),
  readd(s[2], character_only = TRUE),
  readd(s[3], character_only = TRUE),
  readd(s[4], character_only = TRUE),
  readd(s[5], character_only = TRUE)
)

loadd(model)

# Second slice if you were to map() over mtcars.
vec_slice(mtcars, 2)

# Fifth slice if you were to map() over letters.
vec_slice(letters, 5)
```

You can use `vec_c()` and `vec_slice()` to anticipate edge cases in dynamic branching.

```{r}
# If you map() over a list, each sub-target is a single-element list.
vec_slice(list(1, 2), 1)
```

```{r}
# If each sub-target has multiple elements,
# the aggregated target (e.g. from readd())
# will have more elements than sub-targets.
subtarget1 <- c(1, 2)
subtarget2 <- c(3, 4)
vec_c(subtarget1, subtarget2)
```

Back in our plan, `target(fit_model(continents), dynamic = map(continents))` is equivalent to commands `fit_model(continents[1])` through `fit_model(continents[5])`. Since `continents` is really a list of data frames, `continents[1]` through `continents[5]` are also lists of data frames, which is why we need the line `data <- continent_data[[1]]` in `fit_model()`.

To post-process our models, we can work with either the individual sub-targets or the whole vector of all the models. Below, `year` uses the former and `intercept` uses the latter.

```{r}
plan <- drake_plan(
  continents = gapminder_continents(),
  model = target(fit_model(continents), dynamic = map(continents)),
  # Filter each model individually:
  year = target(filter(model, term == "year"), dynamic = map(model)),
  # Aggregate all the models, then filter the whole vector:
  intercept = filter(model, term != "year")
)

make(plan)
```

```{r}
readd(year)
```

```{r}
readd(intercept)
```


If automatic concatenation of sub-targets is confusing (e.g. if some sub-targets are `NULL`, as in <https://github.com/ropensci-books/drake/issues/142>) you can read the dynamic target as a named list (only in `drake` version 7.10.0 and above).

```{r}
readd(model, subtarget_list = TRUE) # Requires drake >= 7.10.0.
```

Alternatively, you can identify an individual sub-target by its index.

```{r}
subtargets(model)

readd(model, subtargets = 2) # equivalent to readd() on a single model_* sub-target
```

If you don't know the index offhand, you can find out using the sub-target's name.

```{r, echo = FALSE}
subtarget <- subtargets(model)[2]
```

```{r}
print(subtarget)

which(subtarget == subtargets(model))
```

If the sub-target errored out and `subtargets()` fails, the individual sub-target metadata will have a `subtarget_index` field.

```{r, eval = FALSE}
diagnose(subtarget, character_only = TRUE)$subtarget_index
#> [1] 2
```

Either way, once you have the sub-target's index, you can retrieve the section of data that the sub-target took as input. Below, we load the part of `contenents` that the second sub-target of `model` used during `make()`.

```{r}
vctrs::vec_slice(readd(continents), 2)
```

If `continents` were dynamic, we could have just used `readd(continents, subtargets = 2)`. But `continents` was a static target, so we needed to replicate `drake`'s dynamic branching behavior using `vctrs`.

## Dynamic transformations

Dynamic branching supports transformations `map()`, `cross()`, and `group()`. These transformations tell `drake` how to create sub-targets.

### `map()`

`map()` iterates over the [vector slices](https://vctrs.r-lib.org/reference/vec_slice.html) of the targets you supply as arguments. We saw above how `map()` iterates over lists. If you give it a data frame, it will map over the rows.

```{r}
plan <- drake_plan(
  subset = head(gapminder),
  row = target(subset, dynamic = map(subset))
)

make(plan)
```

```{r}
readd(row_9939cae3)
```

If you supply multiple targets, `map()` iterates over the slices of each.

```{r}
plan <- drake_plan(
  numbers = seq_len(2),
  letters = c("a", "b"),
  zipped = target(paste0(numbers, letters), dynamic = map(numbers, letters))
)

make(plan)
```

```{r}
readd(zipped)
```

### `cross()`

`cross()` creates a new sub-target for each combination of targets you supply as arguments.

```{r}
plan <- drake_plan(
  numbers = seq_len(2),
  letters = c("a", "b"),
  combo = target(paste0(numbers, letters), dynamic = cross(numbers, letters))
)

make(plan)
```

```{r}
readd(combo)
```

### `group()`

With `group()`, you can create multiple aggregates of a given target. Use the `.by` argument to set a grouping variable.

```{r}
plan <- drake_plan(
  data = gapminder,
  by = data$continent,
  gdp = target(
    tibble(median = median(data$gdpPercap), continent = by[1]),
    dynamic = group(data, .by = by)
  )
)

make(plan)
```

```{r}
readd(gdp)
```

## Trace

All dynamic transforms have a `.trace` argument to record optional metadata for each sub-target. In the example from `group()`, the trace is another way to keep track of the continent of each median GDP value.

```{r}
plan <- drake_plan(
  data = gapminder,
  by = data$continent,
  gdp = target(
    median(data$gdpPercap),
    dynamic = group(data, .by = by, .trace = by)
  )
)

make(plan)
```

The `gdp` target no longer contains any explicit reference to continent.

```{r}
readd(gdp)
```

However, we can look up the continents in the trace.

```{r}
read_trace("by", gdp)
```


## `max_expand`

Suppose we want a model for each *country*.

```{r}
gapminder_countries <- function() {
  gapminder %>%
    mutate(gdpPercap = scale(gdpPercap)) %>%
    split(f = .$country)
}

plan <- drake_plan(
  countries = gapminder_countries(),
  model = target(fit_model(countries), dynamic = map(countries))
)
```

The Gapminder dataset has 142 countries, which can get overwhelming. In the early stages of the workflow when we are still debugging and testing, we can limit the number of sub-targets using the `max_expand` argument of `make()`.

```{r}
make(plan, max_expand = 2)
```

```{r}
readd(model)
```

Then, when we are confident and ready, we can scale up to the full number of models.

```{r, eval = FALSE}
make(plan)
```