# Storage {#storage}
```{r, message = FALSE, warning = FALSE, echo = FALSE}
knitr::opts_knit$set(root.dir = fs::dir_create(tempfile()))
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(
  drake_make_menu = FALSE,
  drake_clean_menu = FALSE,
  warnPartialMatchArgs = FALSE,
  crayon.enabled = FALSE,
  readr.show_progress = FALSE,
  tidyverse.quiet = TRUE
)
```
```{r, message = FALSE, warning = FALSE, echo = FALSE}
library(drake)
library(tidyverse)
```
## `drake`'s cache
When you run `make()`, `drake` stores your targets in a hidden storage cache.
```{r}
library(drake)
load_mtcars_example() # from https://github.com/wlandau/drake-examples/tree/main/mtcars
make(my_plan, verbose = 0L)
```
The default cache is a hidden `.drake` folder.
```{r, eval = FALSE}
find_cache()
### [1] "/home/you/project/.drake"
```
`drake`'s `loadd()` and `readd()` functions load targets into memory.
```{r}
loadd(large)
head(large)
head(readd(small))
```
## Efficient target storage
`drake` supports custom formats for large and specialized targets. For example, the `"fst"` format uses the [`fst`](https://github.com/fstpackage/fst) package to save data frames faster. Simply enclose the command and the format together with the `target()` function.
```{r, eval = FALSE}
library(drake)
n <- 1e8 # Each target is 1.6 GB in memory.
plan <- drake_plan(
  data_fst = target(
    data.frame(x = runif(n), y = runif(n)),
    format = "fst"
  ),
  data_old = data.frame(x = runif(n), y = runif(n))
)
make(plan)
#> target data_fst
#> target data_old
build_times(type = "build")
#> # A tibble: 2 x 4
#> target elapsed user system
#> <chr> <Duration> <Duration> <Duration>
#> 1 data_fst 13.93s 37.562s 7.954s
#> 2 data_old 184s (~3.07 minutes) 177s (~2.95 minutes) 4.157s
```
For more details and a complete list of formats, see <https://books.ropensci.org/drake/plans.html#special-data-formats-for-targets>.
## Why is my cache so big?
### Old targets
By default, `drake` holds on to all your targets from all your runs of `make()`. Even if you run `clean()`, the data stays in the cache in case you need to recover it.
```{r}
clean()
make(my_plan, recover = TRUE)
```
If you really want to remove old historical values of targets, run `drake_gc()` or `drake_cache()$gc()`.
```{r}
drake_gc()
```
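To see how much disk space the cache occupies, one option is to sum file sizes with base R. The helper `dir_size_mb()` below is a hypothetical name and not part of `drake`; it is a minimal sketch that assumes the cache is a local folder such as `.drake`.

```{r, eval = FALSE}
# Hypothetical helper (not part of drake): total size of a folder in megabytes.
dir_size_mb <- function(path) {
  files <- list.files(
    path,
    recursive = TRUE,  # descend into subfolders
    all.files = TRUE,  # include hidden files
    full.names = TRUE
  )
  sum(file.size(files), na.rm = TRUE) / 1024^2
}

cache_mb <- dir_size_mb(".drake") # 0 if the folder does not exist
cache_mb
```

Calling the helper before and after `drake_gc()` gives a rough idea of how much space garbage collection reclaimed.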
`clean()` also has a `garbage_collection` argument for this purpose. Here is a slick way to remove historical targets and targets no longer in your plan.
```{r}
clean(list = cached_unplanned(my_plan), garbage_collection = TRUE)
```
### Garbage from interrupted builds
If `make()` crashes or gets interrupted, old files can accumulate in `.drake/scratch/` and `.drake/drake/tmp/`. As long as `make()` is no longer running, you can safely remove the files in those folders (but keep the folders themselves).
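The cleanup described above could be sketched in base R as follows. The helper `empty_dir()` is a hypothetical name and not part of `drake`; the sketch assumes the default `.drake` cache in the working directory and that no `make()` is in progress.

```{r, eval = FALSE}
# Hypothetical helper (not part of drake): delete everything inside a folder
# but keep the folder itself.
empty_dir <- function(folder) {
  contents <- list.files(
    folder,
    all.files = TRUE,  # include hidden files
    full.names = TRUE,
    no.. = TRUE        # skip "." and ".."
  )
  unlink(contents, recursive = TRUE)
  invisible(contents)
}

# Only run this when no make() is in progress.
for (folder in c(".drake/scratch", ".drake/drake/tmp")) {
  if (dir.exists(folder)) empty_dir(folder)
}
```

Because only the contents are passed to `unlink()`, the two folders themselves stay in place.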
## Interfaces to the cache
`drake` uses the [storr](https://github.com/richfitz/storr) package to create and modify caches.
```{r}
library(storr)
cache <- storr_rds(".drake")
head(cache$list())
head(cache$get("small"))
```
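Since the chunk above opens `.drake` with `storr_rds()`, the usual `storr` methods are available on the result. The sketch below exercises a few of them on a throwaway cache in a temporary folder, so the project cache is left untouched; the key `"x"` and the object name `scratch` are illustrative only.

```{r, eval = FALSE}
library(storr)
scratch <- storr_rds(tempfile()) # throwaway cache, separate from .drake
scratch$set("x", 1:3)            # store a value under the key "x"
has_x <- scratch$exists("x")     # check whether the key is present
value <- scratch$get("x")        # retrieve the stored value
keys <- scratch$list()           # list keys in the default namespace
scratch$del("x")                 # delete the key
scratch$destroy()                # remove the throwaway cache from disk
```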
`drake` has its own interface on top of [storr](https://github.com/richfitz/storr) to make it easier to work with the default `.drake/` cache. The `loadd()`, `readd()`, and `cached()` functions explore saved targets.
```{r}
head(cached())
head(readd(small))
loadd(large)
head(large)
rm(large) # Does not remove `large` from the cache.
```
`new_cache()` creates caches and `drake_cache()` recovers existing ones. (`drake_cache()` is only supported in `drake` version 7.4.0 and above.)
```{r}
cache <- drake_cache()
cache$driver$path
cache <- drake_cache(path = ".drake") # File path to drake's cache.
cache$driver$path
```
You can supply your own cache to `make()` and friends (including specialized `storr` caches like [`storr_dbi()`](http://richfitz.github.io/storr/reference/storr_dbi.html)).
```{r}
plan <- drake_plan(x = 1, y = sqrt(x))
make(plan, cache = cache)
vis_drake_graph(plan, cache = cache)
```
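The chunk above reuses the default cache. As a sketch of the same workflow with a cache in a non-default location, the example below creates one with `new_cache()` in a temporary folder and passes it to `make()`, `cached()`, and `readd()`. The names `custom_cache`, `custom_plan`, and `y_value` are illustrative, and a temporary path is assumed only to keep the example separate from the project.

```{r, eval = FALSE}
library(drake)
# Illustrative location: a cache outside the default .drake folder.
custom_cache <- new_cache(path = tempfile())
custom_plan <- drake_plan(x = 1, y = sqrt(x))
make(custom_plan, cache = custom_cache)
stored <- cached(cache = custom_cache)    # targets stored in the custom cache
y_value <- readd(y, cache = custom_cache) # read one target back from it
```

Functions that read from the cache need the same `cache` argument; otherwise they look for the default `.drake` folder.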
Destroy caches to remove them from your file system.
```{r}
cache$destroy()
file.exists(".drake")
```