forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Environments.Rmd
915 lines (662 loc) · 34.7 KB
/
Environments.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
# Environments {#environments}
\index{environments}
```{r, include = FALSE}
source("common.R")
```
## Introduction
The environment is the data structure that powers scoping. This chapter dives deep into environments, describing their structure in depth, and using them to improve your understanding of the four scoping rules described in Section \@ref(lexical-scoping).
Understanding environments is not necessary for day-to-day use of R. But they are important to understand because they power many important R features like lexical scoping, namespaces, and R6 classes, and interact with evaluation to give you powerful tools for making domain specific languages, like dplyr and ggplot2.
### Quiz {-}
If you can answer the following questions correctly, you already know the most important topics in this chapter. You can find the answers at the end of the chapter in Section \@ref(env-answers).
1. List at least three ways that an environment differs from a list.
1. What is the parent of the global environment? What is the only
environment that doesn't have a parent?
1. What is the enclosing environment of a function? Why is it
important?
1. How do you determine the environment from which a function was called?
1. How are `<-` and `<<-` different?
### Outline {-}
* Section \@ref(env-basics) introduces you to the basic properties
of an environment and shows you how to create your own.
* Section \@ref(env-recursion) provides a function template
for computing with environments, illustrating the idea with a useful
function.
* Section \@ref(special-environments) describes environments used for special
purposes: for packages, within functions, for namespaces, and for
function execution.
* Section \@ref(call-stack) explains the last important environment: the
caller environment. This requires you to learn about the call stack,
that describes how a function was called. You'll have seen the call stack
if you've ever called `traceback()` to aid debugging.
* Section \@ref(explicit-envs) briefly discusses three places where
environments are useful data structures for solving other problems.
### Prerequisites {-}
This chapter will use [rlang](https://rlang.r-lib.org) functions for working with environments, because it allows us to focus on the essence of environments, rather than the incidental details.
```{r setup, message = FALSE}
library(rlang)
```
The `env_` functions in rlang are designed to work with the pipe: all take an environment as the first argument, and many also return an environment. I won't use the pipe in this chapter in the interest of keeping the code as simple as possible, but you should consider it for your own code.
## Environment basics {#env-basics}
Generally, an environment is similar to a named list, with four important exceptions:
* Every name must be unique.
* The names in an environment are not ordered.
* An environment has a parent.
* Environments are not copied when modified.
Let's explore these ideas with code and pictures.
### Basics
\index{environments!creating}
\indexc{env()}
\indexc{new.env()}
\index{assignment}
To create an environment, use `rlang::env()`. It works like `list()`, taking a set of name-value pairs:
```{r}
e1 <- env(
a = FALSE,
b = "a",
c = 2.3,
d = 1:3,
)
```
::: base
Use `new.env()` to create a new environment. Ignore the `hash` and `size` parameters; they are not needed. You cannot simultaneously create and define values; use `$<-`, as shown below.
:::
The job of an environment is to associate, or __bind__, a set of names to a set of values. You can think of an environment as a bag of names, with no implied order (i.e. it doesn't make sense to ask which is the first element in an environment). For that reason, we'll draw the environment as so:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/bindings.png")
```
As discussed in Section \@ref(env-modify), environments have reference semantics: unlike most R objects, when you modify them, you modify them in place, and don't create a copy. One important implication is that environments can contain themselves.
```{r}
e1$d <- e1
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/loop.png")
```
Printing an environment just displays its memory address, which is not terribly useful:
```{r}
e1
```
Instead, we'll use `env_print()` which gives us a little more information:
```{r}
env_print(e1)
```
You can use `env_names()` to get a character vector giving the current bindings
```{r}
env_names(e1)
```
::: base
In R 3.2.0 and greater, use `names()` to list the bindings in an environment. If your code needs to work with R 3.1.0 or earlier, use `ls()`, but note that you'll need to set `all.names = TRUE` to show all bindings.
:::
### Important environments
\index{environments!current}
\index{environments!global}
We'll talk in detail about special environments in \@ref(special-environments), but for now we need to mention two. The current environment, or `current_env()` is the environment in which code is currently executing. When you're experimenting interactively, that's usually the global environment, or `global_env()`. The global environment is sometimes called your "workspace", as it's where all interactive (i.e. outside of a function) computation takes place.
To compare environments, you need to use `identical()` and not `==`. This is because `==` is a vectorised operator, and environments are not vectors.
```{r, error = TRUE}
identical(global_env(), current_env())
global_env() == current_env()
```
:::base
Access the global environment with `globalenv()` and the current environment with `environment()`. The global environment is printed as `R_GlobalEnv` and `.GlobalEnv`.
:::
### Parents
\index{environments!parent}
\indexc{env\_parent()}
Every environment has a __parent__, another environment. In diagrams, the parent is shown as a small pale blue circle and arrow that points to another environment. The parent is what's used to implement lexical scoping: if a name is not found in an environment, then R will look in its parent (and so on). You can set the parent environment by supplying an unnamed argument to `env()`. If you don't supply it, it defaults to the current environment. In the code below, `e2a` is the parent of `e2b`.
```{r}
e2a <- env(d = 4, e = 5)
e2b <- env(e2a, a = 1, b = 2, c = 3)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/parents.png")
```
To save space, I typically won't draw all the ancestors; just remember whenever you see a pale blue circle, there's a parent environment somewhere.
You can find the parent of an environment with `env_parent()`:
```{r}
env_parent(e2b)
env_parent(e2a)
```
Only one environment doesn't have a parent: the __empty__ environment. I draw the empty environment with a hollow parent environment, and where space allows I'll label it with `R_EmptyEnv`, the name R uses.
```{r}
e2c <- env(empty_env(), d = 4, e = 5)
e2d <- env(e2c, a = 1, b = 2, c = 3)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/parents-empty.png")
```
The ancestors of every environment eventually terminate with the empty environment. You can see all ancestors with `env_parents()`:
```{r}
env_parents(e2b)
env_parents(e2d)
```
By default, `env_parents()` stops when it gets to the global environment. This is useful because the ancestors of the global environment include every attached package, which you can see if you override the default behaviour as below. We'll come back to these environments in Section \@ref(search-path).
```{r}
env_parents(e2b, last = empty_env())
```
::: base
Use `parent.env()` to find the parent of an environment. No base function returns all ancestors.
:::
### Super assignment, `<<-`
\indexc{<<-}
\index{assignment}
\index{super assignment}
The ancestors of an environment have an important relationship to `<<-`. Regular assignment, `<-`, always creates a variable in the current environment. Super assignment, `<<-`, never creates a variable in the current environment, but instead modifies an existing variable found in a parent environment.
```{r}
x <- 0
f <- function() {
x <<- 1
}
f()
x
```
If `<<-` doesn't find an existing variable, it will create one in the global environment. This is usually undesirable, because global variables introduce non-obvious dependencies between functions. `<<-` is most often used in conjunction with a function factory, as described in Section \@ref(stateful-funs).
### Getting and setting
\index{environments!bindings}
\indexc{env\_bind\_*}
You can get and set elements of an environment with `$` and `[[` in the same way as a list:
```{r}
e3 <- env(x = 1, y = 2)
e3$x
e3$z <- 3
e3[["z"]]
```
But you can't use `[[` with numeric indices, and you can't use `[`:
```{r, error = TRUE}
e3[[1]]
e3[c("x", "y")]
```
`$` and `[[` will return `NULL` if the binding doesn't exist. Use `env_get()` if you want an error:
```{r, error = TRUE}
e3$xyz
env_get(e3, "xyz")
```
If you want to use a default value if the binding doesn't exist, you can use the `default` argument.
```{r}
env_get(e3, "xyz", default = NA)
```
There are two other ways to add bindings to an environment:
* `env_poke()`[^poke] takes a name (as string) and a value:
```{r}
env_poke(e3, "a", 100)
e3$a
```
[^poke]: You might wonder why rlang has `env_poke()` instead of `env_set()`.
This is for consistency: `_set()` functions return a modified copy;
`_poke()` functions modify in place.
* `env_bind()` allows you to bind multiple values:
```{r}
env_bind(e3, a = 10, b = 20)
env_names(e3)
```
You can determine if an environment has a binding with `env_has()`:
```{r}
env_has(e3, "a")
```
Unlike lists, setting an element to `NULL` does not remove it, because sometimes you want a name that refers to `NULL`. Instead, use `env_unbind()`:
```{r}
e3$a <- NULL
env_has(e3, "a")
env_unbind(e3, "a")
env_has(e3, "a")
```
Unbinding a name doesn't delete the object. That's the job of the garbage collector, which automatically removes objects with no names binding to them. This process is described in more detail in Section \@ref(gc).
::: base
\indexc{rm()}\index{assignment!assign()@\texttt{assign()}}\indexc{get()}\indexc{exists()}
See `get()`, `assign()`, `exists()`, and `rm()`. These are designed interactively for use with the current environment, so working with other environments is a little clunky. Also beware the `inherits` argument: it defaults to `TRUE` meaning that the base equivalents will inspect the supplied environment and all its ancestors.
:::
### Advanced bindings
\index{bindings!delayed}
\index{promises}
\index{bindings!active}
\index{active bindings}
\indexc{env\_bind\_*}
There are two more exotic variants of `env_bind()`:
* `env_bind_lazy()` creates __delayed bindings__, which are evaluated the
first time they are accessed. Behind the scenes, delayed bindings create
promises, so behave in the same way as function arguments.
```{r, cache = TRUE}
env_bind_lazy(current_env(), b = {Sys.sleep(1); 1})
system.time(print(b))
system.time(print(b))
```
The primary use of delayed bindings is in `autoload()`, which
allows R packages to provide datasets that behave like they are loaded in
memory, even though they're only loaded from disk when needed.
* `env_bind_active()` creates __active bindings__ which are re-computed every
time they're accessed:
```{r}
env_bind_active(current_env(), z1 = function(val) runif(1))
z1
z1
```
Active bindings are used to implement R6's active fields, which you'll learn
about in Section \@ref(active-fields).
::: base
See `?delayedAssign()` and `?makeActiveBinding()`.
:::
### Exercises
1. List three ways in which an environment differs from a list.
1. Create an environment as illustrated by this picture.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/recursive-1.png")
```
1. Create a pair of environments as illustrated by this picture.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/recursive-2.png")
```
1. Explain why `e[[1]]` and `e[c("a", "b")]` don't make sense when `e` is
an environment.
1. Create a version of `env_poke()` that will only bind new names, never
re-bind old names. Some programming languages only do this, and are known
as [single assignment languages][single assignment].
1. What does this function do? How does it differ from `<<-` and why
might you prefer it?
```{r, error = TRUE}
rebind <- function(name, value, env = caller_env()) {
if (identical(env, empty_env())) {
stop("Can't find `", name, "`", call. = FALSE)
} else if (env_has(env, name)) {
env_poke(env, name, value)
} else {
rebind(name, value, env_parent(env))
}
}
rebind("a", 10)
a <- 5
rebind("a", 10)
a
```
## Recursing over environments {#env-recursion}
\index{recursion!over environments}
If you want to operate on every ancestor of an environment, it's often convenient to write a recursive function. This section shows you how, applying your new knowledge of environments to write a function that given a name, finds the environment `where()` that name is defined, using R's regular scoping rules.
The definition of `where()` is straightforward. It has two arguments: the name to look for (as a string), and the environment in which to start the search. (We'll learn why `caller_env()` is a good default in Section \@ref(call-stack).)
```{r}
where <- function(name, env = caller_env()) {
if (identical(env, empty_env())) {
# Base case
stop("Can't find ", name, call. = FALSE)
} else if (env_has(env, name)) {
# Success case
env
} else {
# Recursive case
where(name, env_parent(env))
}
}
```
There are three cases:
* The base case: we've reached the empty environment and haven't found the
binding. We can't go any further, so we throw an error.
* The successful case: the name exists in this environment, so we return the
environment.
* The recursive case: the name was not found in this environment, so try the
parent.
These three cases are illustrated with these three examples:
```{r, error = TRUE}
where("yyy")
x <- 5
where("x")
where("mean")
```
It might help to see a picture. Imagine you have two environments, as in the following code and diagram:
```{r}
e4a <- env(empty_env(), a = 1, b = 2)
e4b <- env(e4a, x = 10, a = 11)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/where-ex.png")
```
* `where("a", e4b)` will find `a` in `e4b`.
* `where("b", e4b)` doesn't find `b` in `e4b`, so it looks in its parent, `e4a`,
and finds it there.
* `where("c", e4b)` looks in `e4b`, then `e4a`, then hits the empty environment
and throws an error.
It's natural to work with environments recursively, so `where()` provides a useful template. Removing the specifics of `where()` shows the structure more clearly:
```{r}
f <- function(..., env = caller_env()) {
if (identical(env, empty_env())) {
# base case
} else if (success) {
# success case
} else {
# recursive case
f(..., env = env_parent(env))
}
}
```
::: sidebar
### Iteration versus recursion {-}
It's possible to use a loop instead of recursion. I think it's harder to understand than the recursive version, but I include it because you might find it easier to see what's happening if you haven't written many recursive functions.
```{r}
f2 <- function(..., env = caller_env()) {
while (!identical(env, empty_env())) {
if (success) {
# success case
return()
}
# inspect parent
env <- env_parent(env)
}
# base case
}
```
:::
### Exercises
1. Modify `where()` to return _all_ environments that contain a binding for
`name`. Carefully think through what type of object the function will
need to return.
1. Write a function called `fget()` that finds only function objects. It
should have two arguments, `name` and `env`, and should obey the regular
scoping rules for functions: if there's an object with a matching name
that's not a function, look in the parent. For an added challenge, also
add an `inherits` argument which controls whether the function recurses up
the parents or only looks in one environment.
## Special environments {#special-environments}
Most environments are not created by you (e.g. with `env()`) but are instead created by R. In this section, you'll learn about the most important environments, starting with the package environments. You'll then learn about the function environment bound to the function when it is created, and the (usually) ephemeral execution environment created every time the function is called. Finally, you'll see how the package and function environments interact to support namespaces, which ensure that a package always behaves the same way, regardless of what other packages the user has loaded.
### Package environments and the search path {#search-path}
\indexc{search()}
\index{search path}
\indexc{Autoloads}
\index{environments!base}
Each package attached by `library()` or `require()` becomes one of the parents of the global environment. The immediate parent of the global environment is the last package you attached[^attach], the parent of that package is the second to last package you attached, ...
[^attach]: Note the difference between attached and loaded. A package is loaded automatically if you access one of its functions using `::`; it is only __attached__ to the search path by `library()` or `require()`.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/search-path.png")
```
If you follow all the parents back, you see the order in which every package has been attached. This is known as the __search path__ because all objects in these environments can be found from the top-level interactive workspace. You can see the names of these environments with `base::search()`, or the environments themselves with `rlang::search_envs()`:
```{r}
search()
search_envs()
```
The last two environments on the search path are always the same:
* The `Autoloads` environment uses delayed bindings to save memory by only
loading package objects (like big datasets) when needed.
* The base environment, `package:base` or sometimes just `base`, is the
environment of the base package. It is special because it has to be able
to bootstrap the loading of all other packages. You can access it directly
with `base_env()`.
Note that when you attach another package with `library()`, the parent environment of the global environment changes:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/search-path-2.png")
```
### The function environment {#function-environments}
\index{environments!function}
\indexc{fn\_env()}
A function binds the current environment when it is created. This is called the __function environment__, and is used for lexical scoping. Across computer languages, functions that capture (or enclose) their environments are called __closures__, which is why this term is often used interchangeably with _function_ in R's documentation.
You can get the function environment with `fn_env()`:
```{r}
y <- 1
f <- function(x) x + y
fn_env(f)
```
::: base
Use `environment(f)` to access the environment of function `f`.
:::
In diagrams, I'll draw a function as a rectangle with a rounded end that binds an environment.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/binding.png")
```
In this case, `f()` binds the environment that binds the name `f` to the function. But that's not always the case: in the following example `g` is bound in a new environment `e`, but `g()` binds the global environment. The distinction between binding and being bound by is subtle but important; the difference is how we find `g` versus how `g` finds its variables.
```{r}
e <- env()
e$g <- function() 1
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/binding-2.png")
```
### Namespaces
\index{namespaces}
In the diagram above, you saw that the parent environment of a package varies based on what other packages have been loaded. This seems worrying: doesn't that mean that the package will find different functions if packages are loaded in a different order? The goal of __namespaces__ is to make sure that this does not happen, and that every package works the same way regardless of what packages are attached by the user.
For example, take `sd()`:
```{r}
sd
```
`sd()` is defined in terms of `var()`, so you might worry that the result of `sd()` would be affected by any function called `var()` either in the global environment, or in one of the other attached packages. R avoids this problem by taking advantage of the function versus binding environment described above. Every function in a package is associated with a pair of environments: the package environment, which you learned about earlier, and the __namespace__ environment.
* The package environment is the external interface to the package. It's how
you, the R user, find a function in an attached package or with `::`. Its
parent is determined by search path, i.e. the order in which packages have
been attached.
* The namespace environment is the internal interface to the package. The
package environment controls how we find the function; the namespace
controls how the function finds its variables.
Every binding in the package environment is also found in the namespace environment; this ensures every function can use every other function in the package. But some bindings only occur in the namespace environment. These are known as internal or non-exported objects, which make it possible to hide internal implementation details from the user.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/namespace-bind.png")
```
Every namespace environment has the same set of ancestors:
* Each namespace has an __imports__ environment that contains bindings to all
the functions used by the package. The imports environment is controlled by
the package developer with the `NAMESPACE` file.
* Explicitly importing every base function would be tiresome, so the parent
of the imports environment is the base __namespace__. The base namespace
contains the same bindings as the base environment, but it has a different
parent.
* The parent of the base namespace is the global environment. This means that
if a binding isn't defined in the imports environment the package will look
for it in the usual way. This is usually a bad idea (because it makes code
depend on other loaded packages), so `R CMD check` automatically warns about
such code. It is needed primarily for historical reasons, particularly due
to how S3 method dispatch works.
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/namespace-env.png")
```
Putting all these diagrams together we get:
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/namespace.png")
```
So when `sd()` looks for the value of `var` it always finds it in a sequence of environments determined by the package developer, but not by the package user. This ensures that package code always works the same way regardless of what packages have been attached by the user.
There's no direct link between the package and namespace environments; the link is defined by the function environments.
### Execution environments
\index{environments!execution}
\index{functions!environment}
The last important topic we need to cover is the __execution__ environment. What will the following function return the first time it's run? What about the second?
```{r}
g <- function(x) {
if (!env_has(current_env(), "a")) {
message("Defining a")
a <- 1
} else {
a <- a + 1
}
a
}
```
Think about it for a moment before you read on.
```{r}
g(10)
g(10)
```
This function returns the same value every time because of the fresh start principle, described in Section \@ref(fresh-start). Each time a function is called, a new environment is created to host execution. This is called the execution environment, and its parent is the function environment. Let's illustrate that process with a simpler function. Figure \@ref(fig:execution-env) illustrates the graphical conventions: I draw execution environments with an indirect parent; the parent environment is found via the function environment.
```{r}
h <- function(x) {
# 1.
a <- 2 # 2.
x + a
}
y <- h(1) # 3.
```
```{r execution-env, echo = FALSE, out.width = NULL, fig.cap = "The execution environment of a simple function call. Note that the parent of the execution environment is the function environment."}
knitr::include_graphics("diagrams/environments/execution.png")
```
An execution environment is usually ephemeral; once the function has completed, the environment will be garbage collected. There are several ways to make it stay around for longer. The first is to explicitly return it:
```{r}
h2 <- function(x) {
a <- x * 2
current_env()
}
e <- h2(x = 10)
env_print(e)
fn_env(h2)
```
Another way to capture it is to return an object with a binding to that environment, like a function. The following example illustrates that idea with a function factory, `plus()`. We use that factory to create a function called `plus_one()`.
There's a lot going on in the diagram because the enclosing environment of `plus_one()` is the execution environment of `plus()`.
```{r}
plus <- function(x) {
function(y) x + y
}
plus_one <- plus(1)
plus_one
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/closure.png")
```
What happens when we call `plus_one()`? Its execution environment will have the captured execution environment of `plus()` as its parent:
```{r}
plus_one(2)
```
```{r, echo = FALSE, out.width = NULL}
knitr::include_graphics("diagrams/environments/closure-call.png")
```
You'll learn more about function factories in Section \@ref(factory-fundamentals).
### Exercises
1. How is `search_envs()` different from `env_parents(global_env())`?
1. Draw a diagram that shows the enclosing environments of this function:
```{r, eval = FALSE}
f1 <- function(x1) {
f2 <- function(x2) {
f3 <- function(x3) {
x1 + x2 + x3
}
f3(3)
}
f2(2)
}
f1(1)
```
1. Write an enhanced version of `str()` that provides more information
about functions. Show where the function was found and what environment
it was defined in.
## Call stacks {#call-stack}
\index{environments!calling}
\indexc{parent.frame()}
\index{call stacks}
There is one last environment we need to explain, the __caller__ environment, accessed with `rlang::caller_env()`. This provides the environment from which the function was called, and hence varies based on how the function is called, not how the function was created. As we saw above this is a useful default whenever you write a function that takes an environment as an argument.
::: base
`parent.frame()` is equivalent to `caller_env()`; just note that it returns an environment, not a frame.
:::
To fully understand the caller environment we need to discuss two related concepts: the __call stack__, which is made up of __frames__. Executing a function creates two types of context. You've learned about one already: the execution environment is a child of the function environment, which is determined by where the function was created. There's another type of context created by where the function was called: this is called the call stack.
<!-- HW: mention that this is actually a tree! -->
### Simple call stacks {#simple-stack}
\indexc{cst()}
\indexc{traceback()}
Let's illustrate this with a simple sequence of calls: `f()` calls `g()` calls `h()`.
```{r}
f <- function(x) {
g(x = 2)
}
g <- function(x) {
h(x = 3)
}
h <- function(x) {
stop()
}
```
The way you most commonly see a call stack in R is by looking at the `traceback()` after an error has occurred:
```{r, eval = FALSE}
f(x = 1)
#> Error:
traceback()
#> 4: stop()
#> 3: h(x = 3)
#> 2: g(x = 2)
#> 1: f(x = 1)
```
Instead of `stop()` + `traceback()` to understand the call stack, we're going to use `lobstr::cst()` to print out the **c**all **s**tack **t**ree:
```{r, eval = FALSE}
h <- function(x) {
lobstr::cst()
}
f(x = 1)
#> █
#> └─f(x = 1)
#> └─g(x = 2)
#> └─h(x = 3)
#> └─lobstr::cst()
```
This shows us that `cst()` was called from `h()`, which was called from `g()`, which was called from `f()`. Note that the order is the opposite from `traceback()`. As the call stacks get more complicated, I think it's easier to understand the sequence of calls if you start from the beginning, rather than the end (i.e. `f()` calls `g()`; rather than `g()` was called by `f()`).
### Lazy evaluation {#lazy-call-stack}
\index{lazy evaluation}
The call stack above is simple: while you get a hint that there's some tree-like structure involved, everything happens on a single branch. This is typical of a call stack when all arguments are eagerly evaluated.
Let's create a more complicated example that involves some lazy evaluation. We'll create a sequence of functions, `a()`, `b()`, `c()`, that pass along an argument `x`.
```{r, eval = FALSE}
a <- function(x) b(x)
b <- function(x) c(x)
c <- function(x) x
a(f())
#> █
#> ├─a(f())
#> │ └─b(x)
#> │ └─c(x)
#> └─f()
#> └─g(x = 2)
#> └─h(x = 3)
#> └─lobstr::cst()
```
`x` is lazily evaluated so this tree gets two branches. In the first branch `a()` calls `b()`, then `b()` calls `c()`. The second branch starts when `c()` evaluates its argument `x`. This argument is evaluated in a new branch because the environment in which it is evaluated is the global environment, not the environment of `c()`.
### Frames
\index{frame}
\indexc{parent.frame()}
Each element of the call stack is a __frame__[^frame], also known as an evaluation context. The frame is an extremely important internal data structure, and R code can only access a small part of the data structure because tampering with it will break R. A frame has three key components:
* An expression (labelled with `expr`) giving the function call. This is
what `traceback()` prints out.
* An environment (labelled with `env`), which is typically the execution
environment of a function. There are two main exceptions: the environment of
the global frame is the global environment, and calling `eval()` also
generates frames, where the environment can be anything.
* A parent, the previous call in the call stack (shown by a grey arrow).
Figure \@ref(fig:calling) illustrates the stack for the call to `f(x = 1)` shown in Section \@ref(simple-stack).
[^frame]: NB: `?environment` uses frame in a different sense: "Environments consist of a _frame_, or collection of named objects, and a pointer to an enclosing environment." We avoid this sense of frame, which comes from S, because it's very specific and not widely used in base R. For example, the frame in `parent.frame()` is an execution context, not a collection of named objects.
```{r calling, echo = FALSE, out.width = NULL, fig.cap = "The graphical depiction of a simple call stack"}
knitr::include_graphics("diagrams/environments/calling.png")
```
(To focus on the calling environments, I have omitted the bindings in the global environment from `f`, `g`, and `h` to the respective function objects.)
The frame also holds exit handlers created with `on.exit()`, restarts and handlers for the condition system, and which context to `return()` to when a function completes. These are important internal details that are not accessible with R code.
### Dynamic scope
\index{scoping!dynamic}
Looking up variables in the calling stack rather than in the enclosing environment is called __dynamic scoping__. Few languages implement dynamic scoping (Emacs Lisp is a [notable exception](http://www.gnu.org/software/emacs/emacs-paper.html#SEC15).) This is because dynamic scoping makes it much harder to reason about how a function operates: not only do you need to know how it was defined, you also need to know the context in which it was called. Dynamic scoping is primarily useful for developing functions that aid interactive data analysis, and one of the topics discussed in Chapter \@ref(evaluation).
### Exercises
1. Write a function that lists all the variables defined in the environment
in which it was called. It should return the same results as `ls()`.
## As data structures {#explicit-envs}
\index{hashmaps}
\index{dictionaries|see {hashmaps}}
As well as powering scoping, environments are also useful data structures in their own right because they have reference semantics. There are three common problems that they can help solve:
* __Avoiding copies of large data__. Since environments have reference
semantics, you'll never accidentally create a copy. But bare environments
are painful to work with, so instead I recommend using R6 objects, which
are built on top of environments. Learn more in Chapter \@ref(r6).
* __Managing state within a package__. Explicit environments are useful in
packages because they allow you to maintain state across function calls.
Normally, objects in a package are locked, so you can't modify them
directly. Instead, you can do something like this:
```{r}
my_env <- new.env(parent = emptyenv())
my_env$a <- 1
get_a <- function() {
my_env$a
}
set_a <- function(value) {
old <- my_env$a
my_env$a <- value
invisible(old)
}
```
Returning the old value from setter functions is a good pattern because
it makes it easier to reset the previous value in conjunction with
`on.exit()` (Section \@ref(on-exit)).
* __As a hashmap__. A hashmap is a data structure that takes constant, O(1),
time to find an object based on its name. Environments provide this
behaviour by default, so can be used to simulate a hashmap. See the
hash package [@hash] for a complete development of this idea.
## Quiz answers {#env-answers}
1. There are four ways: every object in an environment must have a name;
order doesn't matter; environments have parents; environments have
reference semantics.
1. The parent of the global environment is the last package that you
loaded. The only environment that doesn't have a parent is the empty
environment.
1. The enclosing environment of a function is the environment where it
was created. It determines where a function looks for variables.
1. Use `caller_env()` or `parent.frame()`.
1. `<-` always creates a binding in the current environment; `<<-`
rebinds an existing name in a parent of the current environment.
[single assignment]:http://en.wikipedia.org/wiki/Assignment_(computer_science)#Single_assignment