<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>R's Evaluation Rules and Environments</title><link rel="stylesheet" type="text/css" href="OmegaTech.css"></link><meta name="generator" content="DocBook XSL Stylesheets V1.79.1"></meta></head><body class="yui-skin-sam"><div class="article"><div class="titlepage"><div><div><h2 class="title"><a id="idm61105528256"></a><b xmlns="" class="proglang">R</b>'s Evaluation Rules and Environments</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Duncan</span> <span class="surname">Temple Lang</span></h3><div class="affiliation"><span class="orgname">University of California at Davis<br></br></span> <span class="orgdiv">Graduate Studies and Department of Statistics<br></br></span></div></div></div></div><hr></hr></div><p>
The purpose of this document is to coalesce some of the material we discussed in the first two days (Feb/March 2020),
focusing primarily on how <b xmlns="" class="proglang">R</b> finds variables when evaluating functions.
Environments are key to this.
An environment is just a container for variables,
that is, names bound to values/<b xmlns="" class="proglang">R</b> objects.
Each environment (other than the empty environment) has a parent environment, so an environment
is part of a list/chain of environments.
When we look for a variable, we look along this chain of environments.
The nature of environments is the same throughout <b xmlns="" class="proglang">R</b>,
but which chain of environments we search differs slightly
depending on whether we are evaluating code in the body of a function in a package,
at the top-level prompt, or in relation to a (model) formula.
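</p><p>
Here is a minimal sketch of these ideas. It builds a small chain of environments by hand with
<code xmlns="" class="Sexpression">new.env()</code>. The names <code xmlns="" class="Sexpression">parentEnv</code>,
<code xmlns="" class="Sexpression">childEnv</code>, <code xmlns="" class="Sexpression">a</code> and
<code xmlns="" class="Sexpression">b</code> are purely illustrative.
Looking up <code xmlns="" class="Sexpression">a</code> in <code xmlns="" class="Sexpression">childEnv</code>
succeeds only because the search continues into its parent.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Illustrative names only. childEnv's parent is parentEnv,
# and parentEnv's parent is the empty environment.
parentEnv = new.env(parent = emptyenv())
assign("a", 1, envir = parentEnv)      # bind the name a to the value 1
childEnv = new.env(parent = parentEnv)
assign("b", 2, envir = childEnv)

identical(parent.env(childEnv), parentEnv)        # TRUE
exists("a", envir = childEnv)                     # TRUE: found in the parent
exists("a", envir = childEnv, inherits = FALSE)   # FALSE: not in childEnv itself
get("a", envir = childEnv)                        # 1
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>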
</p><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="idm61105522496"></a>Calling a Function in a Package</h2></div></div></div><p>
Consider the call
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105521600"><div><pre class="rcode" title="R code">
scatter.smooth(mtcars[, c("mpg", "wt")], , .75, xlab = "Weight", yla = "Miles per gallon",
main = "Motor Trend Cars Data")
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
issued at the <b xmlns="" class="proglang">R</b> prompt.
<b xmlns="" class="proglang">R</b> parses this and we have the language object that is a
<i xmlns=""><a href="Help/call-class.html">call</a></i> to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
So <b xmlns="" class="proglang">R</b> has to find that function.
Since this is being evaluated at the prompt, <b xmlns="" class="proglang">R</b> looks in the global
environment and its chain of parent environments.
We can use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">find()</i> to see where this is located:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105518240"><div><pre class="rcode" title="R code">
find("scatter.smooth")
<pre class="routput">
[1] "package:stats"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This is on the search path as we can see with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105517472"><div><pre class="rcode" title="R code">
search()
<pre class="routput">
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:datasets" "package:utils"
[7] "package:methods" "Autoloads" "package:base"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
So <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> is in the second element of the search path,
package:stats.
</p><p>
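While we can use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">find()</i> and <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">search()</i> to locate the function,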
let's introduce a function <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">getEnvChain()</i> in
<a class="ulink" href="showEnv.R" target="_top"><code class="filename">showEnv.R</code></a>
that shows the chain of environments generally, not just the search path.
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105513568"><div><pre class="rcode" title="R code">
source("showEnv.R")
names(getEnvChain(globalenv()))
<pre class="routput">
[1] "globalenv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:datasets" "package:utils"
[7] "package:methods" "Autoloads" "<base>"
[10] "emptyenv"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
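Note that <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">getEnvChain()</i> includes the empty environment at the end of the chain.
We'll use this function for more interesting cases than the global environment.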
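</p><p>
We don't reproduce <code class="filename">showEnv.R</code> here. The following is a rough sketch of the idea only.
It is not the actual code of getEnvChain(). A hypothetical helper, <code xmlns="" class="Sexpression">envChainNames()</code>,
follows <code xmlns="" class="Sexpression">parent.env()</code> from a starting environment until it reaches the
empty environment, recording a label for each environment along the way.
Its labels come from <code xmlns="" class="Sexpression">environmentName()</code>
(e.g., "R_GlobalEnv" rather than "globalenv"), so they differ slightly from those shown above.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical helper, NOT the getEnvChain() defined in showEnv.R.
# It follows parent.env() from env up to the empty environment,
# recording a label for each environment it passes through.
envChainNames = function(env = globalenv()) {
  ans = character()
  repeat {
    nm = environmentName(env)             # "" for most unnamed environments
    if (nm == "") {
      nm = attr(env, "name")              # e.g., "Autoloads" or "imports:stats"
      if (is.null(nm)) nm = "anonymous"   # e.g., a call frame
    }
    ans = c(ans, nm)
    if (identical(env, emptyenv())) break
    env = parent.env(env)
  }
  ans
}

envChainNames()          # from "R_GlobalEnv" through "base" to "R_EmptyEnv"

chainFromFrame = function() envChainNames(environment())
chainFromFrame()[1]      # "anonymous": the call frame of chainFromFrame()
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>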
</p><p>
Having found the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> function,
<b xmlns="" class="proglang">R</b> prepares to call it.
</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p> <b xmlns="" class="proglang">R</b> creates a call frame which is an environment.</p></li><li class="listitem"><p> creates a variable in the call frame for each of the parameters/formal arguments in the
definition of the function, i.e., <code xmlns="" class="Sexpression">names(formals(scatter.smooth))</code></p></li><li class="listitem"><p> matches the arguments in the call to the parameters in the function definition
</p><div class="orderedlist"><ol class="orderedlist" type="a"><li class="listitem"><p> matching named arguments whose names exactly match the names of parameters</p></li><li class="listitem"><p> matching remaining named arguments by partial name matching, except for
parameters that come after <b xmlns="">...</b> in the function definition</p></li><li class="listitem"><p> matching the remaining arguments by position to the remaining parameters.</p></li></ol></div><p>
</p></li></ol></div><p>
To see how <b xmlns="" class="proglang">R</b> matches arguments in a particular call, we can use
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">match.call()</i>.
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105504592"><div><pre class="rcode" title="R code">
match.call(scatter.smooth,
quote(scatter.smooth(mtcars[, c("mpg", "wt")], , .75,
xlab = "Weight", yla = "Miles per gallon",
main = "Motor Trend Cars Data")))
<pre class="routput">
scatter.smooth(x = mtcars[, c("mpg", "wt")], span = 0.75, xlab = "Weight",
ylab = "Miles per gallon", main = "Motor Trend Cars Data")
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
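<p>
The three matching rules can be seen in isolation with a small hypothetical function, <code xmlns="" class="Sexpression">toy()</code>.
It is not part of any package, and its <code xmlns="" class="Sexpression">label</code> parameter deliberately
comes after <code xmlns="" class="Sexpression">...</code>. This is a sketch for illustration only.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function, used only to illustrate argument matching.
toy = function(value, verbose = FALSE, ..., label = "none") NULL

# 'verb' partially matches 'verbose'; 42 is then matched by position to 'value'
match.call(toy, quote(toy(verb = TRUE, 42)))
# toy(value = 42, verbose = TRUE)

# 'lab' is NOT partially matched to 'label', since label comes after ...,
# so it is absorbed by ... instead
match.call(toy, quote(toy(1, lab = "x")))
# toy(value = 1, lab = "x")
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>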
<p>
</p><p>
<b xmlns="" class="proglang">R</b> doesn't evaluate any of the arguments in the call at this point due to
<i xmlns="">lazy evaluation</i>. The value of each variable in the call
frame at this point is a <i xmlns="">promise</i>. This is the code corresponding
to that argument in the call, along with the environment in which that code needs
to be evaluated. When the function first asks for the value of the variable in the call
frame, that will trigger <b xmlns="" class="proglang">R</b> to evaluate the promise and that is when the argument will be
actually evaluated. And it will be evaluated in the environment in which the function
(<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>)
was called. In our case, this is the global environment since we call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>
from the <b xmlns="" class="proglang">R</b> prompt.
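</p><p>
A sketch with a few hypothetical functions (<code xmlns="" class="Sexpression">g()</code>,
<code xmlns="" class="Sexpression">h()</code> and <code xmlns="" class="Sexpression">k()</code>) makes both points concrete.
First, an argument that is never used is never evaluated.
Second, an argument's code is evaluated in the environment of the caller.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function: 'a' is never used and 'b' is used once.
g = function(a, b) {
  b            # forcing the promise for b evaluates its code now
  "done"
}
g(stop("a was evaluated"), 1)   # no error: the promise for a is never forced
# [1] "done"

# The promise's code is evaluated in the caller's environment.
h = function(arg) arg
k = function() {
  msg = "inside k"
  h(msg)       # 'msg' is looked up in k's call frame, not in h's
}
k()
# [1] "inside k"
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>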
</p><p>
Now with the call frame constructed and the arguments matched to parameters,
<b xmlns="" class="proglang">R</b> is ready to evaluate the body of the function being called, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
We can see the expressions in the body of the function with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105498240"><div><pre class="rcode" title="R code">
body(scatter.smooth)
<pre class="routput">
{
xlabel <- if (!missing(x))
deparse(substitute(x))
ylabel <- if (!missing(y))
deparse(substitute(y))
xy <- xy.coords(x, y, xlabel, ylabel)
x <- xy$x
y <- xy$y
xlab <- if (is.null(xlab))
xy$xlab
else xlab
ylab <- if (is.null(ylab))
xy$ylab
else ylab
pred <- loess.smooth(x, y, span, degree, family, evaluation)
plot(x, y, ylim = ylim, xlab = xlab, ylab = ylab, ...)
do.call(lines, c(list(pred), lpars))
invisible()
}
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
<b xmlns="" class="proglang">R</b> now loops over these expressions and uses
the call frame as the environment in which to evaluate each expression.
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">missing()</i> and <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">substitute()</i>
are two functions that use non-standard evaluation. So let's ignore these
for now. Just know that <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">missing()</i> is used
to determine whether that parameter (e.g., <i xmlns="" class="rarg">x</i>) was explicitly
given a value in this call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>;
and, in this use, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">substitute()</i> returns the <b xmlns="" class="proglang">R</b> code corresponding to the
specified parameter (e.g., by looking inside the promise).
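</p><p>
The first two lines of the body are a common idiom. The following sketch uses a hypothetical function,
<code xmlns="" class="Sexpression">labelOf()</code>, to show that <code xmlns="" class="Sexpression">missing()</code>
and <code xmlns="" class="Sexpression">substitute()</code> work on the promise without evaluating its code.
Even a call to <code xmlns="" class="Sexpression">stop()</code> passed as the argument is never run.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function that copies the idiom at the top of scatter.smooth()
labelOf = function(x, y) {
  xlabel = if (!missing(x)) deparse(substitute(x))
  ylabel = if (!missing(y)) deparse(substitute(y))
  list(xlabel = xlabel, ylabel = ylabel)
}
str(labelOf(mtcars$wt))
# xlabel is "mtcars$wt"; ylabel is NULL because y was not supplied

labelOf(stop("never evaluated"))$xlabel
# [1] "stop(\"never evaluated\")"
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>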
</p><p>
So let's consider the third expression
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105493328"><div><pre class="rcode" title="R code">
xy <- xy.coords(x, y, xlabel, ylabel)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This is an assignment. Since it is evaluated in the call frame,
the assignment will be in this call frame (an environment).
There is no parameter named <b xmlns="" class="S" title="">xy</b>, so this will create
a new variable in the call frame named <b xmlns="" class="S" title="">xy</b>,
if evaluating the right-hand side succeeds and does not throw an error.
<b xmlns="" class="proglang">R</b> evaluates the right-hand side of the assignment before trying to make the
assignment. So it evaluates
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105491216"><div><pre class="rcode" title="R code">
xy.coords(x, y, xlabel, ylabel)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
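<p>
Both points about this assignment can be checked with two hypothetical functions,
<code xmlns="" class="Sexpression">makeXY()</code> and <code xmlns="" class="Sexpression">failXY()</code>.
First, an assignment in a function body creates the variable in that call's frame only.
Second, when the right-hand side throws an error, no variable is created.
The check on the global environment assumes no global <b xmlns="" class="S" title="">xy</b> existed beforehand.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function: the assignment creates 'xy' in this call frame
makeXY = function() {
  xy = list(x = 1, y = 2)
  ls(environment())          # names bound in this call frame
}
makeXY()
# [1] "xy"
exists("xy", envir = globalenv(), inherits = FALSE)
# [1] FALSE  (assuming there was no global xy to begin with)

# Hypothetical function: the right-hand side fails, so xy is never created
failXY = function() {
  try({ xy = stop("boom") }, silent = TRUE)
  exists("xy", inherits = FALSE)    # look only in this call frame
}
failXY()
# [1] FALSE
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>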
<p>
This is a function call, so <b xmlns="" class="proglang">R</b> first looks for <b xmlns="" class="S" title="">xy.coords</b>.
It looks first in the current environment, the call frame for our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
It is not there. So it looks in the parent environment, just as it did when looking for
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>. But what is the parent environment of the call frame for
our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>?
While I could tell you, let's actually find it.
We'll debug the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>,
then get the call frame and ask for its <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">parent.env()</i>:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105471040"><div><pre class="rcode" title="R code">
debug(scatter.smooth)
scatter.smooth(mtcars[, c(1, 2)])
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
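<p>
One detail of this lookup is worth knowing.
When <b xmlns="" class="proglang">R</b> looks up a name in the function position of a call,
such as <b xmlns="" class="S" title="">xy.coords</b> in <code xmlns="" class="Sexpression">xy.coords(x, y, xlabel, ylabel)</code>,
it skips any non-function object with that name and keeps searching up the chain.
A hypothetical function, <code xmlns="" class="Sexpression">shadowC()</code>, shows this with base's
<code xmlns="" class="Sexpression">c()</code>:
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function with a local numeric variable named 'c'
shadowC = function() {
  c = 10        # a number, not a function, bound in the call frame
  c(c, 2)       # in the call position R skips it and finds base's c()
}
shadowC()
# [1] 10  2
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>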
<p>
At the Browse prompt, we can get the call frame with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105470528"><div><pre class="rcode" title="R code">
environment()
<pre class="routput">
<environment: 0x7fd23971cda0>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
The display is a hexadecimal number. It represents the location in memory of the call frame.
It will be different on your machine. The important thing to know is that it is not a named
environment such as R_GlobalEnv or package:stats, or package:base. It is an <i xmlns="">ad hoc</i>,
short-lived environment.
</p><p>
We can list the names of the variables in this environment/call frame with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105468496"><div><pre class="rcode" title="R code">
ls(environment(), all = TRUE)
<pre class="routput">
[1] "..." "degree" "evaluation" "family" "lpars"
[6] "span" "x" "xlab" "y" "ylab"
[11] "ylim"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We can manually see that these are the same names as those of the parameters
for <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>. We can also verify this programmatically
with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105466976"><div><pre class="rcode" title="R code">
setdiff(names(formals()), ls(environment(), all = TRUE))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
and this returns the empty character vector.
We could also get the call frame with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105466432"><div><pre class="rcode" title="R code">
sys.frames()[[1]]
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
in this specific case, but identifying the right frame this way is more complicated when calls are nested.
Note that <code xmlns="" class="Sexpression">environment()</code> and <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">formals()</i>
know which call frame, and which function being called, they are currently working with.
</p><p>
So now that we have the call frame, let's look along its chain of environments
to find <b xmlns="" class="S" title="">xy.coords</b>.
We can use our <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">getEnvChain()</i> function to show the chain of environments:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105463648"><div><pre class="rcode" title="R code">
names(getEnvChain(environment()))
<pre class="routput">
[1] "<0x7fd23971d3c0>" "<namespace:stats>" "imports:stats"
[4] "<namespace:base>" "globalenv" "package:stats"
[7] "package:graphics" "package:grDevices" "package:datasets"
[10] "package:utils" "package:methods" "Autoloads"
[13] "<base>" "emptyenv"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
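<p>
The first element is the call frame itself. The second is the namespace of
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns=""><a href="http://cran.r-project.org/web/packages/stats/index.html">stats</a></i>,
which is where <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> was defined.
A hypothetical function, <code xmlns="" class="Sexpression">inspect()</code>, defined at the prompt lets us check the same relationship in a sketch.
The call frame holds the parameters, and its parent is the function's own environment.
For a package function, that environment is the package's namespace.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical function that inspects its own call frame
inspect = function(x, y = 2, ...) {
  frame = environment()                 # the frame for this call
  list(vars = ls(frame, all.names = TRUE),
       parentIsDefEnv = identical(parent.env(frame), environment(inspect)))
}
str(inspect(1))
# vars holds "...", "frame", "x" and "y"; parentIsDefEnv is TRUE

# For a function in a package, that environment is the package namespace
identical(environment(scatter.smooth), asNamespace("stats"))
# [1] TRUE
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>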
<p>
We can use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">efind()</i>, a variant of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">find()</i> we wrote for this exposition,
to find which of these environments in the chain contain a variable with a given name
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105461136"><div><pre class="rcode" title="R code">
z = efind("xy.coords", environment(), FALSE)
<pre class="routput">
[1] "imports:stats" "package:grDevices"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
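<p>
We don't show <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">efind()</i> itself.
As a rough stand-in, here is a hypothetical <code xmlns="" class="Sexpression">whereDefined()</code>.
It is not the code used in this document. It walks the chain from a starting environment
and keeps each environment that itself contains a binding for the name.
Starting from the environment of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>
approximates the search from its call frame.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical sketch, NOT the efind() used in this document.
# Walk the chain starting at env and keep each environment that
# itself (inherits = FALSE) contains a binding for name.
whereDefined = function(name, env = parent.frame()) {
  hits = list()
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE))
      hits[[length(hits) + 1]] = env
    env = parent.env(env)
  }
  hits
}

hits = whereDefined("xy.coords", environment(scatter.smooth))
length(hits)

# Each hit holds the very same function object
fns = lapply(hits, function(e) get("xy.coords", envir = e, inherits = FALSE))
all(vapply(fns, identical, logical(1), grDevices::xy.coords))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>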
<p>
If we look in each of these environments for <b xmlns="" class="S" title="">xy.coords</b>,
we'll find exactly the same function (as the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns=""><a href="http://cran.r-project.org/web/packages/stats/index.html">stats</a></i>
package imports from the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns=""><a href="http://cran.r-project.org/web/packages/grDevices/index.html">grDevices</a></i> package):
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105458896"><div><pre class="rcode" title="R code">
a = lapply(z, function(e) e$xy.coords)
identical(a[[1]], a[[2]])
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
</p><p>
So now we have found the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> function.
With this, <b xmlns="" class="proglang">R</b> starts the same preparation for the call to this function
by creating a call frame, creating variables within it for each of the
parameters in the definition of the function, and then matching
the arguments in the call.
Again, we can use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">match.call()</i> to see how the call
matches the arguments to the parameters:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105456496"><div><pre class="rcode" title="R code">
match.call(xy.coords, quote(xy.coords(x, y, xlabel, ylabel)))
<pre class="routput">
xy.coords(x = x, y = y, xlab = xlabel, ylab = ylabel)
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
So the parameter <i xmlns="" class="rarg">x</i> in the call frame for
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> is assigned (bound to) a promise
to compute the expression <code xmlns="" class="Sexpression">x</code>,
which will be evaluated in the environment
from which we called <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>, namely the call frame of our call to
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
</p><p>
Note that because of the non-standard evaluation of
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105452944"><div><pre class="rcode" title="R code">
xlabel <- if (!missing(x))
deparse(substitute(x))
ylabel <- if (!missing(y))
deparse(substitute(y))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
we have yet to actually use the values of <b xmlns="" class="S" title="">x</b> and <b xmlns="" class="S" title="">y</b>.
So, because of lazy evaluation, <b xmlns="" class="proglang">R</b> has not yet evaluated these parameters.
Specifically, we have not evaluated, for example, the command <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code>.
This will happen soon, when the value is actually needed, and we will see it happen deeper in the call
stack, i.e., the sequence of calls from one function to another, to another.
</p><p>
We can arrange to stop in the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>
via <code xmlns="" class="Sexpression">debug(xy.coords)</code>.
(BTW, to stop debugging each call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>,
we use <code xmlns="" class="Sexpression">undebug(xy.coords)</code>. That is we use
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">undebug()</i> as the opposite of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">debug()</i>.)
Now we continue evaluating the next expression in the
body of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> via stepping in the debugger.
This calls <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> and we stop at the start of
that function call.
Note that if we call <code xmlns="" class="Sexpression">substitute(x)</code> here
it won't give us the code <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code>
from the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
Instead, it gives us simply <code xmlns="" class="Sexpression">x</code>.
This is because the parameter <i xmlns="" class="rarg">x</i> in the call
to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> (from within <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>)
was specified in the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> as simply
<code xmlns="" class="Sexpression">x</code>, i.e., <code xmlns="" class="Sexpression">xy.coords(x, y, xlabel, ylabel)</code>.
So the code for the argument is <code xmlns="" class="Sexpression">x</code>.
When we evaluate that promise, <b xmlns="" class="proglang">R</b> looks up the value of <b xmlns="" class="S" title="">x</b>
in the call frame of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
That value is itself the promise with the code <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code>,
which will be evaluated in the global environment
since that is where we made the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
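</p><p>
A pair of hypothetical functions, <code xmlns="" class="Sexpression">caller()</code> and
<code xmlns="" class="Sexpression">callee()</code>, mirrors this chain of promises in a few lines.
Inside <code xmlns="" class="Sexpression">callee()</code>, <code xmlns="" class="Sexpression">substitute(x)</code>
sees only the code written in the call to <code xmlns="" class="Sexpression">callee()</code>.
Forcing the promise, in contrast, follows it back to the original expression.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical pair mirroring scatter.smooth() calling xy.coords()
callee = function(x) list(code = substitute(x), value = x)
caller = function(x) callee(x)

res = caller(1 + 2)
res$code     # the symbol x: only the code in the call callee(x)
res$value    # 3: forcing x in callee() forces x in caller(), i.e., 1 + 2
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>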
</p><p>
Since we are interested in when
the call to <code xmlns="" class="sfunction">[</code> on <b xmlns="" class="S" title="">mtcars</b>
happens, we'll debug the <code xmlns="" class="sfunction">[</code> function.
We can't use
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105438096"><div><pre class="rcode" title="R code">
debug([)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
as that is a syntax error: <b xmlns="" class="proglang">R</b> is looking for a closing ].
So we refer to the symbol [ by enclosing it in backticks, `[`, i.e.,
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105437200"><div><pre class="rcode" title="R code">
debug(`[`)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
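<p>
Backticks let us refer to <code xmlns="" class="sfunction">[</code> like any other function name.
As a quick check, we can call it directly, with an empty argument for the rows.
This gives the same result as the usual syntax.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
is.primitive(`[`)
# [1] TRUE
x = c(10, 20, 30)
`[`(x, 2)                                # the same call as x[2]
# [1] 20
sel = `[`(mtcars, , c("mpg", "wt"))      # empty second argument, as in mtcars[, ...]
identical(sel, mtcars[, c("mpg", "wt")])
# [1] TRUE
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>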
<p>
</p><p>
Since <b xmlns="" class="S" title="">y</b> is <i xmlns=""><code>NULL</code></i>
in <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>,
we evaluate the body of the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">if()</i>
statement <code xmlns="" class="Sexpression">if(is.null(y))</code>.
The second expression in this is
<code xmlns="" class="Sexpression">if(is.language(x))...</code>
This uses the value of <i xmlns="" class="rarg">x</i>
so <b xmlns="" class="proglang">R</b> needs to evaluate the promise
for that parameter value.
This is when <code xmlns="" class="sfunction">[</code> gets called
in the <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code>.
So we step through this code in the debugger
and stop in the function <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">[.data.frame()</i>.
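We can inspect the call stack from within the debugger
using the debugger command <b xmlns="" class="S" title="">where</b>.
(In the output below, the argument to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> was a block that first prints a message with <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">cat()</i>, which makes it easy to see when the argument is evaluated.)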
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105431248"><div><pre class="rcode" title="R code">
where
<pre class="routput">
where 1 at #1: `[.data.frame`(mtcars, , c("mpg", "wt"))
where 2 at #1: mtcars[, c("mpg", "wt")]
where 3: xy.coords(x, y, xlabel, ylabel)
where 4: scatter.smooth({
cat("evaluating now\n")
mtcars[, c("mpg", "wt")]
})
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We can also use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">sys.calls()</i>
to see the call stack, and this also works when we are not in the debugger:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105429712"><div><pre class="rcode" title="R code">
sys.calls()
<pre class="routput">
[[1]]
scatter.smooth({
cat("evaluating now\n")
mtcars[, c("mpg", "wt")]
})
[[2]]
xy.coords(x, y, xlabel, ylabel)
[[3]]
mtcars[, c("mpg","wt")]
[[4]]
mtcars[, c("mpg","wt")]
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
Note that a) this returns a list of language objects, and b)
these are in the reverse order from that shown by the debugger's <b xmlns="" class="S" title="">where</b>.
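</p><p>
The ordering can also be seen outside the debugger with two hypothetical functions,
<code xmlns="" class="Sexpression">level1()</code> and <code xmlns="" class="Sexpression">level2()</code>.
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">sys.calls()</i>
lists the oldest call first, so the innermost call comes last.
The comment describes the result when this is run from the top-level prompt.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical nested functions
level2 = function() sys.calls()
level1 = function() level2()

calls = level1()
calls    # called from the prompt: level1() first, then level2()
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>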
</p><p>
Why did we get to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">[.data.frame()</i>
and not <code xmlns="" class="sfunction">[</code>?
And why does <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">sys.calls()</i> have
two identical elements <code xmlns="" class="Sexpression">mtcars[, c("mpg","wt")]</code>?
The reason is the same for both questions
and relates to <b xmlns="" class="proglang">R</b>'s object oriented S3 method dispatch.
We called <code xmlns="" class="sfunction">[</code> and that looked for a method based on the class
of its first argument, <b xmlns="" class="S" title="">mtcars</b>, which is a <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>. So <b xmlns="" class="proglang">R</b> passed the call
to <code xmlns="" class="sfunction">[</code> on to the more specific version <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">[.data.frame()</i>.
We'll talk about this later.
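</p><p>
A hypothetical class, "noisy", gives a quick preview of this dispatch.
Because <code xmlns="" class="Sexpression">v</code> below has that class,
the call <code xmlns="" class="Sexpression">v[2:3]</code> is handed to our method
<code xmlns="" class="Sexpression">[.noisy()</code>. The method prints a message and then uses
<code xmlns="" class="Sexpression">NextMethod()</code> to continue with the default subsetting.
This is only a sketch of the mechanism that sends <b xmlns="" class="S" title="">mtcars</b> to
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">[.data.frame()</i>.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical S3 class "noisy" with its own method for `[`
`[.noisy` = function(x, ...) {
  cat("dispatched to [.noisy\n")
  NextMethod()       # continue with the default (internal) `[`
}
v = structure(1:5, class = "noisy")
res = v[2:3]         # prints the message: `[` dispatched on class(v)
unclass(res)
# [1] 2 3
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>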
</p><p>
The expression <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code>
is interesting. Firstly, it is actually a call to <code xmlns="" class="sfunction">[</code>.
We can see this via
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105421424"><div><pre class="rcode" title="R code">
sys.call()
<pre class="routput">
mtcars[, c("mpg","wt")]
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
when we are debugging in the call frame for <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">[.data.frame()</i>.
Specifically,
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105420128"><div><pre class="rcode" title="R code">
sys.call()[[1]]
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
returns <b xmlns="" class="S" title="">[.data.frame</b>.
The first argument is <b xmlns="" class="S" title="">mtcars</b>.
The second, corresponding to the rows to subset, is missing/not specified.
The third is the function call <code xmlns="" class="Sexpression">c("mpg", "wt")</code>.
We can ask whether the second argument is missing with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105418288"><div><pre class="rcode" title="R code">
missing(i)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
and we get <i xmlns=""><code>TRUE</code></i>.
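</p><p>
We can also take the same call apart at the prompt, without the debugger, by quoting it.
Element 1 is the function, and the remaining elements are the arguments.
This sketch skips printing the empty second argument.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
e = quote(mtcars[, c("mpg", "wt")])
class(e)       # "call"
length(e)      # 4: the function plus three arguments, one of them empty
e[[1]]         # `[`
e[[2]]         # mtcars
e[[4]]         # c("mpg", "wt")
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>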
</p><p>
This has shown us that <code xmlns="" class="Sexpression">mtcars[, c("mpg", "wt")]</code> is not
evaluated until we are in the body of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>
called from <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
So we have witnessed and verified lazy evaluation via the call stack.
</p><div class="figure"><a id="idm61105415696"></a><p class="title"><strong>Figure 1. Call Stack, Call Frames and their Environment Chains/Search Paths</strong></p><div class="figure-contents"><div><img src="callStackSearchPathFig.png" alt="Call Stack, Call Frames and their Environment Chains/Search Paths"></img></div><div class="caption"><p>This shows the call stack for <code xmlns="" class="Sexpression">scatter.smooth(mtcars[, c(1, 2)])</code>.
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> calls <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i> and finally the argument for <i xmlns="" class="rarg">x</i> in
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> is needed.
Because of <b xmlns="" class="proglang">R</b>'s lazy evaluation, this argument is evaluated only at this point,
hence the call to <code xmlns="" class="Sexpression">[(mtcars, , c(1, 2))</code>.
Within each of the three call frames,
<b xmlns="" class="proglang">R</b> searches for symbols by looking first in the
call frame and then in its parent environment, and its parent environment,
and so on.
These environments are displayed moving from left to right and show
the search through packages and their imports and their parents.
</p></div></div></div><br class="figure-break"></br></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="idm61105408960"></a>Environment of a Function Defined Inside a Function</h2></div></div></div><p>
We just looked at a call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>
which is a function defined in a package.
We saw how it found the functions it itself called (e.g. <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">xy.coords()</i>)
along its own search path, not the <b xmlns="" class="proglang">R</b> session search path.
We found this chain of environments from within the call frame
in our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
We could have also obtained the environment of the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>
function with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105405728"><div><pre class="rcode" title="R code">
environment(scatter.smooth)
<pre class="routput">
<environment: namespace:stats>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
A namespace is a special type of environment
that contains the objects defined in a package.
This is the parent environment of the call frame
in our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i>.
And this holds in general for all call frames:
the parent environment of a call frame
is the environment of the function (definition)
being called.
And what is the environment of a function?
<i xmlns="">The environment of a function is the
environment in which that function is defined.</i>
What does this mean?
</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>If the function is defined in a package,
the environment is the namespace of that package.
</p></li><li class="listitem"><p>
If a function is defined at the <b xmlns="" class="proglang">R</b> prompt,
the environment is the global environment
since that is where the creation of the function is evaluated.
</p></li><li class="listitem"><p>
If the function is defined via a call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">source()</i>,
the environment is the one in which that call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">source()</i> evaluates the code.
By default (<code xmlns="" class="Sexpression">local = FALSE</code>), that is the global environment,
even when <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">source()</i> is called from within a function.
However, we can use the <i xmlns="" class="rarg">local</i> parameter to specify the environment in which <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">source()</i>
should evaluate the expressions.
</p></li><li class="listitem"><p>
If a function is defined within the body of a function,
then its environment is the call frame in which it is defined.
This gives us closures.
</p></li></ol></div><p>
</p><p>
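Each of these four cases can be checked with <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">environment()</i>.
The following is a small sketch: the names <b xmlns="" class="S" title="">g_prompt</b>, <b xmlns="" class="S" title="">src_env</b>,
<b xmlns="" class="S" title="">outer_fn</b> and <b xmlns="" class="S" title="">res</b> are hypothetical, and case 3 assumes we can write a temporary file
with <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">tempfile()</i> and pass an environment to the <i xmlns="" class="rarg">local</i> parameter of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">source()</i>.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# 1. Defined in a package: the environment is that package's namespace
environment(scatter.smooth)                      # namespace:stats

# 2. Defined at the R prompt: the global environment
g_prompt = function() NULL
identical(environment(g_prompt), globalenv())    # TRUE

# 3. Defined via source(): local = src_env evaluates the file in src_env
src_env = new.env()
tmp = tempfile(fileext = ".R")
writeLines("h = function() NULL", tmp)
source(tmp, local = src_env)
unlink(tmp)
identical(environment(src_env$h), src_env)       # TRUE

# 4. Defined inside a function: the call frame in which it was created
outer_fn = function() {
  inner_fn = function() NULL
  list(frame = environment(), fn = inner_fn)
}
res = outer_fn()
identical(environment(res$fn), res$frame)        # TRUE
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>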
Let's define a simple function at the <b xmlns="" class="proglang">R</b> prompt.
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105397696"><div><pre class="rcode" title="R code">
f = function(n) median(rnorm(n))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
What is its environment?
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105397232"><div><pre class="rcode" title="R code">
environment(f)
<pre class="routput">
<environment: R_GlobalEnv>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
When we call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>, <b xmlns="" class="proglang">R</b> will
create a call frame and match the argument to the parameter
and evaluate the expressions in the body.
It will look for <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">rnorm()</i>
in the call frame and then in the parent environment and so on.
The parent environment of the call frame will
be <code xmlns="" class="Sexpression">environment(f)</code>, that is the global environment
as we just determined.
</p><p>
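We can check both claims without the debugger, using a hypothetical variant of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>,
here called <b xmlns="" class="S" title="">f_probe</b>, that reports on its own call frame instead of computing a median.
This sketch assumes <b xmlns="" class="S" title="">f_probe</b> is defined at the <b xmlns="" class="proglang">R</b> prompt, as <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i> was.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical variant of f() that inspects its own call frame
f_probe = function(n) {
  frame = environment()                  # the call frame for this call
  list(
    # the parent of the call frame is the environment of the function
    parent_is_fn_env = identical(parent.env(frame), environment(f_probe)),
    parent_is_global = identical(parent.env(frame), globalenv()),
    # get() searches frame, then its parents, until it finds rnorm
    rnorm_from = environmentName(environment(get("rnorm", envir = frame)))
  )
}
f_probe(5)   # TRUE, TRUE and "stats"
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>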
Let's write another function, again at the top-level <b xmlns="" class="proglang">R</b> prompt.
This one defines and returns a function
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105393504"><div><pre class="rcode" title="R code">
gen = function(n) {
function(mu) rnorm(n, mu)
}
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This function has one parameter <i xmlns="" class="rarg">n</i>.
It returns a function which itself has only one parameter,
<i xmlns="" class="rarg">mu</i>.
We can call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">gen()</i> with an integer
giving the sample size
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105391696"><div><pre class="rcode" title="R code">
f1 = gen(10)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
<b xmlns="" class="S" title="">f1</b> is now a function
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
function(mu) rnorm(n, mu)
<environment: 0x7fd24a150308>
</pre>
<p>
Note the environment displayed.
We won't see such a line when we print our function <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">gen()</i>.
That is because its environment is the global environment,
and <b xmlns="" class="proglang">R</b> doesn't show that environment when displaying a function.
But we do see it for functions defined elsewhere, e.g.,
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">scatter.smooth()</i> shows
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
<environment: namespace:stats>
</pre>
<p>
</p><p>
In the case of <b xmlns="" class="S" title="">f1</b>,
the environment is the call frame for our call to <code xmlns="" class="Sexpression">gen(10)</code>
in which the function was created.
Let's debug another call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">gen()</i>
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105386768"><div><pre class="rcode" title="R code">
debug(gen)
f2 = gen(13)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
Let's get the identifier for the call frame with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105386288"><div><pre class="rcode" title="R code">
environment()
<pre class="routput">
<environment: 0x7fd2598ede10>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
(This will be a different hexadecimal value in your <b xmlns="" class="proglang">R</b> session.)
Now we'll continue the evaluation of the function call and return.
What is the environment of <b xmlns="" class="S" title="">f2</b>?
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105384448"><div><pre class="rcode" title="R code">
environment(f2)
<pre class="routput">
<environment: 0x7fd2598ede10>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
It is the call frame in which the function object was defined.
</p><p>
What are the variables in that environment?
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105383056"><div><pre class="rcode" title="R code">
ls(environment(f2), all = TRUE)
<pre class="routput">
[1] "n"
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
What is the value of that <b xmlns="" class="S" title="">n</b>?
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105381808"><div><pre class="rcode" title="R code">
environment(f2)$n
<pre class="routput">
[1] 13
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
i.e., the value of <b xmlns="" class="S" title="">n</b> in our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">gen()</i>
that created the function we assigned to <b xmlns="" class="S" title="">f2</b>.
</p><p>
What is the value of <b xmlns="" class="S" title="">n</b> in the environment of <b xmlns="" class="S" title="">f1</b>?
It is 10, since we created <b xmlns="" class="S" title="">f1</b>
with the call <code xmlns="" class="Sexpression">f1 = gen(10)</code>.
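</p><p>
The same facts can be confirmed programmatically instead of comparing hexadecimal addresses by eye.
This sketch uses a hypothetical variant of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">gen()</i>, named <b xmlns="" class="S" title="">gen2</b>,
that returns its own call frame together with the function it creates;
the names <b xmlns="" class="S" title="">r10</b> and <b xmlns="" class="S" title="">r13</b> are also just for illustration.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical variant of gen() that also returns its call frame
gen2 = function(n) {
  frame = environment()              # call frame of this call to gen2()
  fun = function(mu) rnorm(n, mu)    # closure created in that frame
  list(frame = frame, fun = fun)
}
r10 = gen2(10)
r13 = gen2(13)

identical(environment(r10$fun), r10$frame)               # TRUE: closure env is the call frame
identical(environment(r10$fun), environment(r13$fun))    # FALSE: one frame per call
environment(r10$fun)$n                                   # 10
environment(r13$fun)$n                                   # 13
length(r13$fun(0))                                       # 13 draws
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>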
</p></div><div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="idm61105377792"></a>Formulae, Environments and Non-Standard Evaluation</h2></div></div></div><p>
Like functions, model formulae such as <code xmlns="" class="Sexpression">mpg ~ wt + cyl</code>
also have an associated environment.
We can create a formula as a stand-alone object with, e.g.,
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105376272"><div><pre class="rcode" title="R code">
frm = b ~ a
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We can examine its structure with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105375824"><div><pre class="rcode" title="R code">
str(frm)
<pre class="routput">
Class 'formula' language b ~ a
..- attr(*, ".Environment")=<environment: R_GlobalEnv>
</pre>
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We see that the formula has a <code xmlns="" class="Sexpression">.Environment</code> attribute,
and in this case its value is the global environment.
Like functions, the environment of a formula is the environment
in which it is defined (unless it is explicitly set to something else).
So if a formula is created in a call to a function, its environment
is the call frame in which it was created.
</p><p>
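A short sketch of these three situations: created at the prompt, created inside a call,
and set explicitly with the replacement function <code xmlns="" class="Sexpression">environment&lt;-</code>.
The names <b xmlns="" class="S" title="">frm_top</b>, <b xmlns="" class="S" title="">make_formula</b>, <b xmlns="" class="S" title="">mk</b>
and <b xmlns="" class="S" title="">e_frm</b> are hypothetical.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# At the prompt, the formula records the global environment
frm_top = y ~ x
identical(environment(frm_top), globalenv())                     # TRUE
identical(attr(frm_top, ".Environment"), environment(frm_top))   # TRUE

# Created inside a call, the formula records that call frame
make_formula = function() {
  frame = environment()
  list(frame = frame, formula = y ~ x)
}
mk = make_formula()
identical(environment(mk$formula), mk$frame)                      # TRUE

# The environment can be replaced explicitly
e_frm = new.env()
environment(frm_top) = e_frm
identical(environment(frm_top), e_frm)                            # TRUE
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>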
The environment of a formula allows functions that manipulate
the formula to find the variables it references.
We'll look at this and how it can be somewhat surprising in some
respects.
Again, we'll construct an experiment we can explicitly debug
and understand in <b xmlns="" class="proglang">R</b>, rather than just stating the rules.
Let's create two variables <b xmlns="" class="S" title="">a</b> and <b xmlns="" class="S" title="">b</b>:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105372544"><div><pre class="rcode" title="R code">
set.seed(12312)
a = runif(10)
b = 5 + 2.34*a + rnorm(length(a))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
Now, consider the formula
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105372080"><div><pre class="rcode" title="R code">
b ~ a
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
As we saw, this has the global environment associated with it.
</p><p>
Note also that the formula <code xmlns="" class="Sexpression">b ~ a</code>
doesn't actually use the values of <b xmlns="" class="S" title="">a</b> and <b xmlns="" class="S" title="">b</b>.
The formula is symbolic and its contents will be evaluated
later. So the formula refers to some variables <b xmlns="" class="S" title="">a</b>
and <b xmlns="" class="S" title="">b</b> that will be resolved later when the formula
is used. The <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">~()</i> function is somewhat like the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">quote()</i> function,
which doesn't evaluate its argument. It is also like (but not the same as) <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">parse()</i> in that its contents will be
used later. The expression <code xmlns="" class="Sexpression">b ~ a</code>
is a call to the function <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">~()</i>,
which is a special .Primitive function. That means it
passes its unevaluated arguments directly to <b xmlns="" class="acronym" title="C programming language"><b class="proglang">C</b></b> code
and does not follow <b xmlns="" class="proglang">R</b>'s standard evaluation
model. In other words, it uses non-standard evaluation!
So it never looks up the values of <b xmlns="" class="S" title="">a</b>
or <b xmlns="" class="S" title="">b</b>.
</p><p>
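To see that <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">~()</i> does not look up its operands,
we can build a formula from symbols that do not exist anywhere.
The symbol names below are hypothetical and chosen so that they are not defined;
<b xmlns="" class="S" title="">cl_tilde</b> is likewise just an illustrative name.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Neither symbol exists, yet building the formula works: ~ does not look them up
frm_nse = not_defined_lhs ~ not_defined_rhs
class(frm_nse)            # "formula"
all.vars(frm_nse)         # "not_defined_lhs" "not_defined_rhs"

# ~ is a primitive of the special kind, which receives its arguments unevaluated
is.primitive(`~`)         # TRUE
typeof(`~`)               # "special"

# quote() only captures the call; evaluating it runs ~ and produces a formula
cl_tilde = quote(b ~ a)
class(cl_tilde)           # "call"
class(eval(cl_tilde))     # "formula"
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>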
Consider the call
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105364528"><div><pre class="rcode" title="R code">
coef(lm(b ~ a))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This calls <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">coef()</i> which <b xmlns="" class="proglang">R</b> locates
on the search path and matches the arguments to the parameters.
The call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> is also evaluated in the global
environment and the same rules apply.
The formula <code xmlns="" class="Sexpression">b ~ a</code> is also evaluated in the global
environment since this is a top-level expression.
So the formula's environment is the global environment just
as it would be if we had done this in two steps
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105362224"><div><pre class="rcode" title="R code">
frm = b ~ a
coef(lm(frm))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
The answer we get is
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
(Intercept)           a
   5.097029    2.618424
</pre>
<p>
</p><p>
How does <b xmlns="" class="proglang">R</b> evaluate the call to <code xmlns="" class="Sexpression">lm(b ~ a)</code>?
We have described the general rules of finding the
function (<b xmlns="" class="S" title="">lm</b>), creating
the call frame with a variable for each parameter,
matching the arguments to the parameters, and evaluating
the expressions in the body of the function with the call frame
as the environment for the evaluation.
However, like <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">library()</i>, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">rm()</i>,
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">?()</i>, and a few more, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>
uses non-standard evaluation (NSE).
And this is made more complicated by how formulae work.
</p><p>
The call <code xmlns="" class="Sexpression">lm(b ~ a)</code>
passes the formula to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>.
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> uses an explicit call to the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">eval()</i>
function to create the model frame, from which it then builds the design/model matrix for the regression.
Actually, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> behaves even more unusually
in that it constructs the call it passes to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">eval()</i>
by adapting how it itself was called.
We can see this by debugging the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105354336"><div><pre class="rcode" title="R code">
debug(lm)
lm(b ~ a)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
After stepping through the initial expressions,
we see the sequence of commands
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105353824"><div><pre class="rcode" title="R code">
mf <- match.call(expand.dots = FALSE)
m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"), names(mf), 0L)
mf <- mf[c(1L, m)]
mf$drop.unused.levels <- TRUE
mf[[1L]] <- quote(stats::model.frame)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We have seen <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">match.call()</i>.
This returns our original call with the parameter names explicitly matched
and added, i.e.,
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105352544"><div><pre class="rcode" title="R code">
lm(formula = b ~ a)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
Next, we match the parameter names in the call to find formula, data, subset, etc.,
and we then keep only those elements of the call, discarding any other arguments.
Why? Because <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> reuses the arguments in its own call
to call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i> in the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns=""><a href="http://cran.r-project.org/web/packages/stats/index.html">stats</a></i>
package. <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> doesn't want to have to check which arguments
were passed to it and pass those on to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i>. Instead
it does this meta-programming on how it was invoked/called to create
the corresponding call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i>.
So the next two expressions add a named argument <i xmlns="" class="rarg">drop.unused.levels</i>
with a value of <i xmlns=""><code>TRUE</code></i>
and change the function being invoked in this call object
to <code xmlns="" class="Sexpression">stats::model.frame</code>.
So at the end of this sequence of expressions, we can see
that <b xmlns="" class="S" title="">mf</b> is
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
stats::model.frame(formula = b ~ a, drop.unused.levels = TRUE)
</pre>
<p>
and has class <i xmlns=""><a href="Help/call-class.html">call</a></i>.
</p><p>
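The call-rewriting idiom can be tried outside <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>
with a hypothetical wrapper, <b xmlns="" class="S" title="">frame_call</b>, that repeats the five expressions above.
It is a sketch of the technique, not <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>'s own code;
the data frame <b xmlns="" class="S" title="">toy</b> and the <i xmlns="" class="rarg">extra</i> parameter are also made up.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical wrapper that rewrites its own call the way lm() does
frame_call = function(formula, data, subset, weights, na.action, offset, extra = 1) {
  mf = match.call(expand.dots = FALSE)          # our call, with argument names filled in
  m = match(c("formula", "data", "subset", "weights", "na.action", "offset"),
            names(mf), 0L)
  mf = mf[c(1L, m)]                              # keep the function plus matched arguments
  mf$drop.unused.levels = TRUE                   # add a named argument
  mf[[1L]] = quote(stats::model.frame)           # call model.frame() instead
  mf
}

cl_mf = frame_call(y ~ x, extra = 99)            # extra is discarded
cl_mf   # stats::model.frame(formula = y ~ x, drop.unused.levels = TRUE)

# Evaluating the rewritten call builds a model frame from a made-up data frame
toy = data.frame(x = 1:3, y = c(2, 4, 7))
eval(frame_call(y ~ x, data = toy))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>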
Next, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> evaluates this call with
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105345840"><div><pre class="rcode" title="R code">
mf <- eval(mf, parent.frame())
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This explicitly calls <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">eval()</i>
and specifies an environment in which to evaluate that call.
This environment, computed as <code xmlns="" class="Sexpression">parent.frame()</code>,
controls where <b xmlns="" class="proglang">R</b> will look for variables in the call.
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">parent.frame()</i> refers to the caller's frame on the call stack,
not to the search path (chain of parent environments) of this call frame.
In our case, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">parent.frame()</i> is the global environment
since our call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> is the only call in the current
call stack.
</p><p>
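The difference between the caller's frame and the chain of parent environments can be seen in a small sketch.
All three functions are hypothetical and are assumed to be defined at the <b xmlns="" class="proglang">R</b> prompt,
and <b xmlns="" class="S" title="">only_in_caller</b> is assumed not to exist in the global environment.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Evaluate a symbol in the frame of whoever called us (dynamic, via the call stack)
lookup_in_caller = function(sym) eval(sym, parent.frame())
# Evaluate a symbol in our own frame, then its parents (lexical chain)
lookup_lexically = function(sym) eval(sym, environment())

caller_fn = function() {
  only_in_caller = "found in the caller's frame"
  list(dynamic = lookup_in_caller(quote(only_in_caller)),
       lexical = tryCatch(lookup_lexically(quote(only_in_caller)),
                          error = function(e) "not found"))
}
caller_fn()   # dynamic finds the variable, lexical does not
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>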
So let's debug the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i>
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105341792"><div><pre class="rcode" title="R code">
debug(model.frame)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This stops in
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
debugging in: stats::model.frame(formula = b ~ a, drop.unused.levels = TRUE)
debug: UseMethod("model.frame")
</pre>
<p>
This is the S3 generic function, and <b xmlns="" class="proglang">R</b> will dispatch to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i>
and will debug that method for us since we are debugging the generic <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i>.
We step through the expressions in <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i>
and we get to
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105338800"><div><pre class="rcode" title="R code">
if (missing(data) && inherits(formula, "data.frame")) {
if (length(attr(formula, "terms")))
return(formula)
data <- formula
formula <- as.formula(data)
}
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We didn't provide a value for the <i xmlns="" class="rarg">data</i>
parameter in our call (to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>, which was mimicked to call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame()</i>). But
<b xmlns="" class="S" title="">formula</b> does not inherit from <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>,
so this condition is false. And then we get to
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105335904"><div><pre class="rcode" title="R code">
formula <- as.formula(formula)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
and then
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105335408"><div><pre class="rcode" title="R code">
data <- environment(formula)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
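So <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i> will use the environment of the formula as the source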
of the data, i.e., the place in which to look for symbols.
Recall that the environment of our formula is the global environment.
And after a few more expressions, we get to
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105334688"><div><pre class="rcode" title="R code">
variables <- eval(predvars, data, env)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
<b xmlns="" class="S" title="">predvars</b> is
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
list(b, a)
</pre>
<p>
and <b xmlns="" class="S" title="">data</b> and <b xmlns="" class="S" title="">env</b> are the global environment.
So this call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">eval()</i> will find <b xmlns="" class="S" title="">a</b> and <b xmlns="" class="S" title="">b</b>
in the global environment.
</p><div xmlns="" class="question" id="q10">
<font size="+2">Q.</font><p xmlns="http://www.w3.org/1999/xhtml">
Where and how will <b xmlns="" class="proglang">R</b> find the variables <b xmlns="" class="S" title="">a</b>
and <b xmlns="" class="S" title="">b</b> referenced in the formula?
</p><div class="codeToggle"><div class="unhidden" id="idm61105329760"><div><pre class="rcode" title="R code">
f = function() {
set.seed(12314)
a = runif(30)
b = pi * a + rnorm(30)
lm(b ~ a)
}
f()
</pre></div></div></div>
<div class="clearFloat"></div>
<p xmlns="http://www.w3.org/1999/xhtml">
</p><button class="collapsible">answer</button><div class="answerOuter"><div class="answer"><p xmlns="http://www.w3.org/1999/xhtml">
<i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i> will call <code xmlns="" class="Sexpression">eval(predvars, data, env)</code>
with <b xmlns="" class="S" title="">data</b> and <b xmlns="" class="S" title="">env</b> being the environment of the formula.
This will be the call frame for the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>.
So <b xmlns="" class="proglang">R</b> will find the variables <b xmlns="" class="S" title="">a</b> and <b xmlns="" class="S" title="">b</b>
in that call frame.
</p></div></div>
</div>
<div xmlns="" class="question" id="q11">
<font size="+2">Q.</font><p xmlns="http://www.w3.org/1999/xhtml">
What would happen in the following
</p><div class="codeToggle"><div class="unhidden" id="idm61105324288"><div><pre class="rcode" title="R code">
f = function() lm(b ~ a)
f()
</pre></div></div></div>
<div class="clearFloat"></div>
<p xmlns="http://www.w3.org/1999/xhtml">
Where would <b xmlns="" class="proglang">R</b> find <b xmlns="" class="S" title="">b</b> and <b xmlns="" class="S" title="">a</b> and how/why?
</p><button class="collapsible">answer</button><div class="answerOuter"><div class="answer">
<p xmlns="http://www.w3.org/1999/xhtml">
This looks simpler than the previous question
but is actually slightly more complicated, although it
builds on the rules we saw earlier.
Again, <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i> will get to
</p><div class="codeToggle"><div class="unhidden" id="idm61105321424"><div><pre class="rcode" title="R code">
variables <- eval(predvars, data, env)
</pre></div></div></div>
<div class="clearFloat"></div>
<p xmlns="http://www.w3.org/1999/xhtml">
with <b xmlns="" class="S" title="">data</b> and <b xmlns="" class="S" title="">env</b> coming from the
environment on the formula.
That will be the call frame from the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>.
However, <b xmlns="" class="S" title="">a</b> and <b xmlns="" class="S" title="">b</b> are not in that call frame,
whereas they were in the previous question.
So the evaluation uses <b xmlns="" class="proglang">R</b>'s regular rules to search for the variables
along the chain of environments.
We can look at this chain with
</p><div class="codeToggle"><div class="unhidden" id="idm61105318224"><div><pre class="rcode" title="R code">
names(getEnvChain(env))
<pre class="routput">
[1] "<0x7fd259232f08>" "globalenv" "package:stats"
[4] "package:graphics" "package:grDevices" "package:datasets"
[7] "package:utils" "package:methods" "Autoloads"
[10] "<base>" "emptyenv"
</pre>
</pre></div></div></div>
<div class="clearFloat"></div>
<p xmlns="http://www.w3.org/1999/xhtml">
So we see the global environment is next on the chain after
the call frame for <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>.
This is, as we know, because the environment of <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>
is the global environment since that is where <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">f()</i>
was defined. And the parent environment of a call frame is the environment
of that function being called. (Don't confuse this with <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">sys.parent()</i>
which refers to call frames on the call stack.)
</p>
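<p xmlns="http://www.w3.org/1999/xhtml">
If <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">getEnvChain()</i> is not available, a similar walk can be sketched with
base <b xmlns="" class="proglang">R</b>'s <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">parent.env()</i> and <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">environmentName()</i>.
<b xmlns="" class="S" title="">env_chain_names</b> and <b xmlns="" class="S" title="">f_chain</b> are hypothetical, not the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">getEnvChain()</i> used above,
and unnamed environments such as call frames are labelled "(unnamed frame)" rather than by address.
</p><div class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical helper, not the getEnvChain() used above:
# follow parent.env() from env until we reach the empty environment
env_chain_names = function(env) {
  nms = character()
  while (!identical(env, emptyenv())) {
    nm = environmentName(env)
    nms = c(nms, if (nzchar(nm)) nm else "(unnamed frame)")
    env = parent.env(env)
  }
  c(nms, "R_EmptyEnv")
}

f_chain = function() env_chain_names(environment())  # start from f_chain's call frame
f_chain()   # "(unnamed frame)", "R_GlobalEnv", the attached packages, ..., "base", "R_EmptyEnv"
</pre></div></div></div>
<div class="clearFloat"></div>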
</div></div>
</div>
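<p>
Since <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i> uses <code xmlns="" class="Sexpression">environment(formula)</code>
as <i xmlns="" class="rarg">data</i> when no data is supplied, we can redirect the lookup by setting the formula's environment ourselves.
In this sketch <b xmlns="" class="S" title="">env_data</b>, <b xmlns="" class="S" title="">p</b>, <b xmlns="" class="S" title="">q</b>,
<b xmlns="" class="S" title="">frm_env</b> and <b xmlns="" class="S" title="">mf_env</b> are hypothetical names,
assumed not to be defined in the global environment.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical data stored only in a new environment, not in the global one
env_data = new.env()
env_data$p = c(1, 2, 3, 4)
env_data$q = 1 + 2 * env_data$p      # exactly q = 1 + 2p

frm_env = q ~ p
environment(frm_env) = env_data        # model.frame.default() will look here

mf_env = model.frame(frm_env)          # no data argument supplied
mf_env$p                               # 1 2 3 4, taken from env_data
coef(lm(frm_env))                      # intercept 1, slope 2 (up to rounding)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>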
<p>
Now, we will switch from finding the values for <b xmlns="" class="S" title="">a</b>
and <b xmlns="" class="S" title="">b</b> in the global environment
to providing them in a <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>.
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105313136"><div><pre class="rcode" title="R code">
d = data.frame(a = a, b = b)
coef(lm(b ~ a, d))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
We actually have two instances of <b xmlns="" class="S" title="">a</b> -
one in the global environment and one as an element in the
<i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>.
Which one did <b xmlns="" class="proglang">R</b> use? It doesn't matter, in this case, since they are
the same. But it would matter if they were different.
So which did it use? We can read the help file,
or simply know, or use the debugger to step through the computations
as we did before.
<b xmlns="" class="proglang">R</b>, and <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">model.frame.default()</i> specifically, will use
the values in the <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>,
not those in the global environment.
We can also verify this by removing <b xmlns="" class="S" title="">a</b>
from the global environment and seeing if there is an error
about not finding <b xmlns="" class="S" title="">a</b>:
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105308928"><div><pre class="rcode" title="R code">
a1 = a
rm(a)
coef(lm(b ~ a, d))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
This succeeds and gives us the same answer as we originally got.
</p><p>
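To see that it would matter if the two copies differed, here is a sketch with hypothetical variables
<b xmlns="" class="S" title="">x_dup</b> and <b xmlns="" class="S" title="">y_dup</b> (and a data frame <b xmlns="" class="S" title="">dd</b>)
where the global vector and the column disagree.
The slope tells us which one <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> used: 2 for the column, 0.2 if it had used the global <b xmlns="" class="S" title="">x_dup</b>.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical data: the global x_dup and the column x_dup disagree
x_dup = c(10, 20, 30, 40, 50)
dd = data.frame(x_dup = 1:5, y_dup = 3 + 2 * (1:5))

# Slope 2 (and intercept 3) means the column in dd was used;
# the global x_dup would have given a slope of 0.2
coef(lm(y_dup ~ x_dup, dd))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>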
We'll now make this a little more interesting.
We'll use <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>'s <i xmlns="" class="rarg">weights</i> parameter
to do weighted least squares.
We'll use two sets of weights: one where all observations
have weight 1, and another where the weights are random numbers between
1 and 2, i.e.,
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105306992"><div><pre class="rcode" title="R code">
w1 = rep(1, length(b))
w2 = runif(length(b), 1, 2)
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
When we use these
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105306512"><div><pre class="rcode" title="R code">
coef(lm(b ~ a, weights = w1))
coef(lm(b ~ a, weights = w2))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
we get
</p><pre xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="routput">
(Intercept)           a
   5.097029    2.618424
(Intercept)           a
   5.305530    2.194265
</pre>
<p>
respectively.
So, as we expect, we get different estimates for the coefficient of <b xmlns="" class="S" title="">a</b>
(and for the intercept).
</p><p>
We might expect <b xmlns="" class="proglang">R</b>'s usual computational model
to be in effect here. Specifically,
in the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>,
<b xmlns="" class="proglang">R</b> matches the expression <b xmlns="" class="S" title="">w1</b>
to the <i xmlns="" class="rarg">weights</i> parameter
and creates a promise to evaluate <code xmlns="" class="Sexpression">w1</code>
in the global environment in which we made the call to <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>.
And certainly, the results we got are consistent with this.
However, that is not what happens. We are finding <b xmlns="" class="S" title="">w1</b>
via a slightly different mechanism which happens to give the same
result, but only in this case.
</p><p>
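One way to see that <i xmlns="" class="rarg">weights</i> is not treated as an ordinary promise is to supply
a weights vector that exists only as a column of the <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>.
The data frame <b xmlns="" class="S" title="">dw</b>, its column <b xmlns="" class="S" title="">w_col</b>, and <b xmlns="" class="S" title="">se_only</b> are hypothetical;
if <i xmlns="" class="rarg">weights</i> were evaluated as an ordinary argument in the global environment,
the <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i> call would fail just as the <b xmlns="" class="S" title="">se_only</b> call does.
</p><div xmlns="" class="codeToggle"><div class="unhidden"><div><pre class="rcode" title="R code">
# Hypothetical data with a weights column that exists nowhere else
dw = data.frame(x = 1:6,
                y = c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8),
                w_col = c(1, 2, 1, 2, 1, 2))
exists("w_col")                            # FALSE: no variable of that name

# lm() still finds w_col, because it is looked up in dw
fit_w = lm(y ~ x, dw, weights = w_col)
weights(fit_w)                             # 1 2 1 2 1 2

# Hypothetical function whose argument is an ordinary promise
se_only = function(weights) weights
tryCatch(se_only(w_col), error = function(e) "object not found")
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>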
Let's create a new column in our <i xmlns=""><a href="Help/data.frame-class.html">data.frame</a></i>
<b xmlns="" class="S" title="">d</b> and we'll name it <i xmlns="" class="relement">w1</i>
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105299824"><div><pre class="rcode" title="R code">
d$w1 = w1
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
Again, we call <i xmlns:r="http://www.r-project.org" xmlns:c="http://www.C.org" xmlns="" class="rfunc">lm()</i>
</p><div xmlns="" class="codeToggle"><div class="unhidden" id="idm61105299008"><div><pre class="rcode" title="R code">
coef(lm(b ~ a, d, weights = w1))
</pre></div></div></div>
<div xmlns="" class="clearFloat"></div>
<p>
and we get the same answer.