-
Notifications
You must be signed in to change notification settings - Fork 334
/
s3.rmd
630 lines (469 loc) · 25 KB
/
s3.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
# S3
You may have noticed that your slot machine results do not look the way I promised they would. I suggested that the slot machine would display its results like this:
```r
play()
## 0 0 DD
## $0
```
But the current machine displays its results in a less pretty format:
```r
play()
## "0" "0" "DD"
## 0
```
Moreover, the slot machine uses a hack to display symbols (we call `print` from within `play`). As a result, the symbols do not follow your prize output if you save it:
```r
one_play <- play()
## "B" "0" "B"
one_play
## 0
```
You can fix both of these problems with R's S3 system.
## The S3 System
S3 refers to a class system built into R. The system governs how R handles objects of different classes. Certain R functions will look up an object's S3 class, and then behave differently in response.
The `print` function is like this. When you print a numeric vector, `print` will display a number:
```r
num <- 1000000000
print(num)
## 1000000000
```
But if you give that number the S3 class `POSIXct` followed by `POSIXt`, `print` will display a time:
```r
class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"
```
If you use objects with classes—and you do—you will run into R's S3 system. S3 behavior can seem odd at first, but is easy to predict once you are familiar with it.
R's S3 system is built around three components: attributes (especially the `class` attribute), generic functions, and methods.
## Attributes
In [Attributes](#attributes), you learned that many R objects come with attributes, pieces of extra information that are given a name and appended to the object. Attributes do not affect the values of the object, but stick to the object as a type of metadata that R can use to handle the object. For example, a data frame stores its row and column names as attributes. Data frames also store their class, `"data.frame"`, as an attribute.
You can see an object's attributes with `attribute`. If you run `attribute` on the `deck` data frame that you created in [Project 2: Playing Cards], you will see:
```r
attributes(deck)
## $names
## [1] "face" "suit" "value"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## [20] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
```
R comes with many helper functions that let you set and access the most common attributes used in R. You've already met the `names`, `dim`, and `class` functions, which each work with an eponymously named attribute. However, R also has `row.names`, `levels`, and many other attribute-based helper functions. You can use any of these functions to retrieve an attribute's value:
```r
row.names(deck)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
## [14] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26"
## [27] "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
## [40] "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51" "52"
```
or to change an attribute's value:
```r
row.names(deck) <- 101:152
```
or to give an object a new attribute altogether:
```r
levels(deck) <- c("level 1", "level 2", "level 3")
attributes(deck)
## $names
## [1] "face" "suit" "value"
##
## $class
## [1] "data.frame"
##
## $row.names
## [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## [18] 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134
## [35] 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151
## [52] 152
##
## $levels
## [1] "level 1" "level 2" "level 3"
```
R is very laissez faire when it comes to attributes. It will let you add any attributes that you like to an object (and then it will usually ignore them). The only time R will complain is when a function needs to find an attribute and it is not there.
You can add any general attribute to an object with `attr`; you can also use `attr` to look up the value of any attribute of an object. Let's see how this works with `one_play`, the result of playing our slot machine one time:
```r
one_play <- play()
one_play
## 0
attributes(one_play)
## NULL
```
`attr` takes two arguments: an R object and the name of an attribute (as a character string). To give the R object an attribute of the specified name, save a value to the output of `attr`. Let's give `one_play` an attribute named `symbols` that contains a vector of character strings:
```r
attr(one_play, "symbols") <- c("B", "0", "B")
attributes(one_play)
## $symbols
## [1] "B" "0" "B"
```
To look up the value of any attribute, give `attr` an R object and the name of the attribute you would like to look up:
```r
attr(one_play, "symbols")
## "B" "0" "B"
```
If you give an attribute to an atomic vector, like `one_play`, R will usually display the attribute beneath the vector's values. However, if the attribute changes the vector's class, R may display all of the information in the vector in a new way (as we saw with `POSIXct` objects):
```r
one_play
## [1] 0
## attr(,"symbols")
## [1] "B" "0" "B"
```
R will generally ignore an object's attributes unless you give them a name that an R function looks for, like `names` or `class`. For example, R will ignore the `symbols` attribute of `one_play` as you manipulate `one_play`:
```r
one_play + 1
## 1
## attr(,"symbols")
## "B" "0" "B"
```
```{exercise, name = "Add an Attribute"}
Modify `play` to return a prize that contains the symbols associated with it as an attribute named `symbols`. Remove the redundant call to `print(symbols)`:
```
```r
play <- function() {
symbols <- get_symbols()
print(symbols)
score(symbols)
}
```
```{solution}
You can create a new version of `play` by capturing the output of `score(symbols)` and assigning an attribute to it. `play` can then return the enhanced version of the output:
```
```r
play <- function() {
symbols <- get_symbols()
prize <- score(symbols)
attr(prize, "symbols") <- symbols
prize
}
```
Now `play` returns both the prize and the symbols associated with the prize. The results may not look pretty, but the symbols stick with the prize when we copy it to a new object. We can work on tidying up the display in a minute:
```r
play()
## [1] 0
## attr(,"symbols")
## [1] "B" "BB" "0"
two_play <- play()
two_play
## [1] 0
## attr(,"symbols")
## [1] "0" "B" "0"
```
You can also generate a prize and set its attributes in one step with the `structure` function. `structure` creates an object with a set of attributes. The first argument of `structure` should be an R object or set of values, and the remaining arguments should be named attributes for `structure` to add to the object. You can give these arguments any argument names you like. `structure` will add the attributes to the object under the names that you provide as argument names:
```r
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols)
}
three_play <- play()
three_play
## 0
## attr(,"symbols")
## "0" "BB" "B"
```
Now that your `play` output contains a `symbols` attribute, what can you do with it? You can write your own functions that lookup and use the attribute. For example, the following function will look up `one_play`'s `symbols` attribute and use it to display `one_play` in a pretty manner. We will use this function to display our slot results, so let's take a moment to study what it does:
```r
slot_display <- function(prize){
# extract symbols
symbols <- attr(prize, "symbols")
# collapse symbols into single string
symbols <- paste(symbols, collapse = " ")
# combine symbol with prize as a character string
# \n is special escape sequence for a new line (i.e. return or enter)
string <- paste(symbols, prize, sep = "\n$")
# display character string in console without quotes
cat(string)
}
slot_display(one_play)
## B 0 B
## $0
```
The function expects an object like `one_play` that has both a numerical value and a `symbols` attribute. The first line of the function will look up the value of the `symbols` attribute and save it as an object named `symbols`. Let's make an example `symbols` object so we can see what the rest of the function does. We can use `one_play`'s `symbols` attribute to do the job. `symbols` will be a vector of three-character strings:
```r
symbols <- attr(one_play, "symbols")
symbols
## "B" "0" "B"
```
Next, `slot_display` uses `paste` to collapse the three strings in `symbols` into a single-character string. `paste` collapses a vector of character strings into a single string when you give it the `collapse` argument. `paste` will use the value of `collapse` to separate the formerly distinct strings. Hence, `symbols` becomes `B 0 B` the three strings separated by a space:
```r
symbols <- paste(symbols, collapse = " ")
symbols
## "B 0 B"
```
Our function then uses `paste` in a new way to combine `symbols` with the value of `prize`. `paste` combines separate objects into a character string when you give it a `sep` argument. For example, here `paste` will combine the string in `symbols`, `B 0 B`, with the number in `prize`, 0. `paste` will use the value of `sep` argument to separate the inputs in the new string. Here, that value is `\n$`, so our result will look like `"B 0 B\n$0"`:
```r
prize <- one_play
string <- paste(symbols, prize, sep = "\n$")
string
## "B 0 B\n$0"
```
The last line of `slot_display` calls `cat` on the new string. `cat` is like `print`; it displays its input at the command line. However, `cat` does not surround its output with quotation marks. `cat` also replaces every `\n` with a new line or line break. The result is what we see. Notice that it looks just how I suggested that our `play` output should look in [Programs]:
```r
cat(string)
## B 0 B
## $0
```
You can use `slot_display` to manually clean up the output of `play`:
```r
slot_display(play())
## C B 0
## $2
slot_display(play())
## 7 0 BB
## $0
```
This method of cleaning the output requires you to manually intervene in your R session (to call `slot_display`). There is a function that you can use to automatically clean up the output of `play` _each_ time it is displayed. This function is `print`, and it is a _generic function_.
## Generic Functions
R uses `print` more often than you may think; R calls `print` each time it displays a result in your console window. This call happens in the background, so you do not notice it; but the call explains how output makes it to the console window (recall that `print` always prints its argument in the console window). This `print` call also explains why the output of `print` always matches what you see when you display an object at the command line:
```r
print(pi)
## 3.141593
pi
## 3.141593
print(head(deck))
## face suit value
## king spades 13
## queen spades 12
## jack spades 11
## ten spades 10
## nine spades 9
## eight spades 8
head(deck)
## face suit value
## king spades 13
## queen spades 12
## jack spades 11
## ten spades 10
## nine spades 9
## eight spades 8
print(play())
## 5
## attr(,"symbols")
## "B" "BB" "B"
play()
## 5
## attr(,"symbols")
## "B" "BB" "B"
```
You can change how R displays your slot output by rewriting `print` to look like `slot_display`. Then R would print the output in our tidy format. However, this method would have negative side effects. You do not want R to call `slot_display` when it prints a data frame, a numerical vector, or any other object.
Fortunately, `print` is not a normal function; it is a _generic_ function. This means that `print` is written in a way that lets it do different things in different cases. You've already seen this behavior in action (although you may not have realized it). `print` did one thing when we looked at the unclassed version of `num`:
```r
num <- 1000000000
print(num)
## 1000000000
```
and a different thing when we gave `num` a class:
```r
class(num) <- c("POSIXct", "POSIXt")
print(num)
## "2001-09-08 19:46:40 CST"
```
Take a look at the code inside `print` to see how it does this. You may imagine that print looks up the class attribute of its input and then uses an +if+ tree to pick which output to display. If this occurred to you, great job! `print` does something very similar, but much more simple.
## Methods
When you call `print`, `print` calls a special function, `UseMethod`:
```r
print
## function (x, ...)
## UseMethod("print")
## <bytecode: 0x7ffee4c62f80>
## <environment: namespace:base>
```
`UseMethod` examines the class of the input that you provide for the first argument of `print`, and then passes all of your arguments to a new function designed to handle that class of input. For example, when you give `print` a POSIXct object, `UseMethod` will pass all of `print`'s arguments to `print.POSIXct`. R will then run `print.POSIXct` and return the results:
```r
print.POSIXct
## function (x, ...)
## {
## max.print <- getOption("max.print", 9999L)
## if (max.print < length(x)) {
## print(format(x[seq_len(max.print)], usetz = TRUE), ...)
## cat(" [ reached getOption(\"max.print\") -- omitted",
## length(x) - max.print, "entries ]\n")
## }
## else print(format(x, usetz = TRUE), ...)
## invisible(x)
## }
## <bytecode: 0x7fa948f3d008>
## <environment: namespace:base>
```
If you give `print` a factor object, `UseMethod` will pass all of `print`'s arguments to `print.factor`. R will then run `print.factor` and return the results:
```r
print.factor
## function (x, quote = FALSE, max.levels = NULL, width = getOption("width"),
## ...)
## {
## ord <- is.ordered(x)
## if (length(x) == 0L)
## cat(if (ord)
## "ordered"
## ...
## drop <- n > maxl
## cat(if (drop)
## paste(format(n), ""), T0, paste(if (drop)
## c(lev[1L:max(1, maxl - 1)], "...", if (maxl > 1) lev[n])
## else lev, collapse = colsep), "\n", sep = "")
## }
## invisible(x)
## }
## <bytecode: 0x7fa94a64d470>
## <environment: namespace:base>
```
`print.POSIXct` and `print.factor` are called _methods_ of `print`. By themselves, `print.POSIXct` and `print.factor` work like regular R functions. However, each was written specifically so `UseMethod` could call it to handle a specific class of `print` input.
Notice that `print.POSIXct` and `print.factor` do two different things (also notice that I abridged the middle of `print.factor`—it is a long function). This is how `print` manages to do different things in different cases. `print` calls `UseMethod`, which calls a specialized method based on the class of `print`'s first argument.
You can see which methods exist for a generic function by calling `methods` on the function. For example, `print` has almost 200 methods (which gives you an idea of how many classes exist in R):
```r
methods(print)
## [1] print.acf*
## [2] print.anova
## [3] print.aov*
## ...
## [176] print.xgettext*
## [177] print.xngettext*
## [178] print.xtabs*
##
## Nonvisible functions are asterisked
```
This system of generic functions, methods, and class-based dispatch is known as S3 because it originated in the third version of S, the programming language that would evolve into S-PLUS and R. Many common R functions are S3 generics that work with a set of class methods. For example, `summary` and `head` also call `UseMethod`. More basic functions, like `c`, `+`, `-`, `<` and others also behave like generic functions, although they call `.primitive` instead of `UseMethod`.
The S3 system allows R functions to behave in different ways for different classes. You can use S3 to format your slot output. First, give your output its own class. Then write a print method for that class. To do this efficiently, you will need to know a little about how `UseMethod` selects a method function to use.
### Method Dispatch
`UseMethod` uses a very simple system to match methods to functions.
Every S3 method has a two-part name. The first part of the name will refer to the function that the method works with. The second part will refer to the class. These two parts will be separated by a period. So for example, the print method that works with functions will be called `print.function`. The summary method that works with matrices will be called `summary.matrix`. And so on.
When `UseMethod` needs to call a method, it searches for an R function with the correct S3-style name. The function does not have to be special in any way; it just needs to have the correct name.
You can participate in this system by writing your own function and giving it a valid S3-style name. For example, let's give `one_play` a class of its own. It doesn't matter what you call the class; R will store any character string in the class attribute:
```r
class(one_play) <- "slots"
```
Now let's write an S3 print method for the +slots+ class. The method doesn't need to do anything special—it doesn't even need to print `one_play`. But it _does_ need to be named `print.slots`; otherwise `UseMethod` will not find it. The method should also take the same arguments as `print`; otherwise, R will give an error when it tries to pass the arguments to `print.slots`:
```r
args(print)
## function (x, ...)
## NULL
print.slots <- function(x, ...) {
cat("I'm using the print.slots method")
}
```
Does our method work? Yes, and not only that; R uses the print method to display the contents of `one_play`. This method isn't very useful, so I'm going to remove it. You'll have a chance to write a better one in a minute:
```r
print(one_play)
## I'm using the print.slots method
one_play
## I'm using the print.slots method
rm(print.slots)
```
Some R objects have multiple classes. For example, the output of `Sys.time` has two classes. Which class will `UseMethod` use to find a print method?
```r
now <- Sys.time()
attributes(now)
## $class
## [1] "POSIXct" "POSIXt"
```
`UseMethod` will first look for a method that matches the first class listed in the object's class vector. If `UseMethod` cannot find one, it will then look for the method that matches the second class (and so on if there are more classes in an object's class vector).
If you give `print` an object whose class or classes do not have a print method, `UseMethod` will call `print.default`, a special method written to handle general cases.
Let's use this system to write a better print method for the slot machine output.
```{exercise, name = "Make a Print Method"}
Write a new print method for the slots class. The method should call `slot_display` to return well-formatted slot-machine output.
What name must you use for this method?
```
```{solution}
It is surprisingly easy to write a good `print.slots` method because we've already done all of the hard work when we wrote `slot_display`. For example, the following method will work. Just make sure the method is named `print.slots` so `UseMethod` can find it, and make sure that it takes the same arguments as `print` so `UseMethod` can pass those arguments to `print.slots` without any trouble:
```
```r
print.slots <- function(x, ...) {
slot_display(x)
}
```
Now R will automatically use `slot_display` to display objects of class +slots+ (and only objects of class "slots"):
```r
one_play
## B 0 B
## $0
```
Let's ensure that every piece of slot machine output has the `slots` class.
```{exercise, name = "Add a Class"}
Modify the `play` function so it assigns `slots` to the `class` attribute of its output:
```
```r
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols)
}
```
```{solution}
You can set the `class` attribute of the output at the same time that you set the +symbols+ attribute. Just add `class = "slots"` to the `structure` call:
```
```r
play <- function() {
symbols <- get_symbols()
structure(score(symbols), symbols = symbols, class = "slots")
}
```
Now each of our slot machine plays will have the class `slots`:
```r
class(play())
## "slots"
```
As a result, R will display them in the correct slot-machine format:
```r
play()
## BB BB BBB
## $5
play()
## BB 0 0
## $0
```
## Classes
You can use the S3 system to make a robust new class of objects in R. Then R will treat objects of your class in a consistent, sensible manner. To make a class:
* Choose a name for your class.
* Assign each instance of your class a +class+ attribute.
* Write class methods for any generic function likely to use objects of your class.
Many R packages are based on classes that have been built in a similar manner. While this work is simple, it may not be easy. For example, consider how many methods exist for predefined classes.
You can call `methods` on a class with the `class` argument, which takes a character string. `methods` will return every method written for the class. Notice that `methods` will not be able to show you methods that come in an unloaded R package:
```r
methods(class = "factor")
## [1] [.factor [[.factor
## [3] [[<-.factor [<-.factor
## [5] all.equal.factor as.character.factor
## [7] as.data.frame.factor as.Date.factor
## [9] as.list.factor as.logical.factor
## [11] as.POSIXlt.factor as.vector.factor
## [13] droplevels.factor format.factor
## [15] is.na<-.factor length<-.factor
## [17] levels<-.factor Math.factor
## [19] Ops.factor plot.factor*
## [21] print.factor relevel.factor*
## [23] relist.factor* rep.factor
## [25] summary.factor Summary.factor
## [27] xtfrm.factor
##
## Nonvisible functions are asterisked
```
This output indicates how much work is required to create a robust, well-behaved class. You will usually need to write a `class` method for every basic R operation.
Consider two challenges that you will face right away. First, R drops attributes (like `class`) when it combines objects into a vector:
```r
play1 <- play()
play1
## B BBB BBB
## $5
play2 <- play()
play2
## 0 B 0
## $0
c(play1, play2)
## [1] 5 0
```
Here, R stops using `print.slots` to display the vector because the vector `c(play1, play2)` no longer has a "slots" +class+ attribute.
Next, R will drop the attributes of an object (like `class`) when you subset the object:
```r
play1[1]
## [1] 5
```
You can avoid this behavior by writing a `c.slots` method and a `[.slots` method, but then difficulties will quickly accrue. How would you combine the `symbols` attributes of multiple plays into a vector of symbols attributes? How would you change `print.slots` to handle vectors of outputs? These challenges are open for you to explore. However, you will usually not have to attempt this type of large-scale programming as a data scientist.
In our case, it is very handy to let `slots` objects revert to single prize values when we combine groups of them together into a vector.
## S3 and Debugging
S3 can be annoying if you are trying to understand R functions. It is difficult to tell what a function does if its code body contains a call to `UseMethod`. Now that you know that `UseMethod` calls a class-specific method, you can search for and examine the method directly. It will be a function whose name follows the `<function.class>` syntax, or possibly `<function.default>`. You can also use the `methods` function to see what methods are associated with a function or a class.
## S4 and R5
R also contains two other systems that create class specific behavior. These are known as S4 and R5 (or reference classes). Each of these systems is much harder to use than S3, and perhaps as a consequence, more rare. However, they offer safeguards that S3 does not. If you'd like to learn more about these systems, including how to write and use your own generic functions, I recommend the book [_Advanced R Programming_](http://adv-r.had.co.nz/) by Hadley Wickham.
## Summary
Values are not the only place to store information in R, and functions are not the only way to create unique behavior. You can also do both of these things with R's S3 system. The S3 system provides a simple way to create object-specific behavior in R. In other words, it is R's version of object-oriented programming (OOP). The system is implemented by generic functions. These functions examine the class attribute of their input and call a class-specific method to generate output. Many S3 methods will look for and use additional information that is stored in an object's attributes. Many common R functions are S3 generics.
R's S3 system is more helpful for the tasks of computer science than the tasks of data science, but understanding S3 can help you troubleshoot your work in R as a data scientist.
You now know quite a bit about how to write R code that performs custom tasks, but how could you repeat these tasks? As a data scientist, you will often repeat tasks, sometimes thousands or even millions of times. Why? Because repetition lets you simulate results and estimate probabilities. [Loops] will show you how to automate repetition with R's `for` and `while` functions. You'll use `for` to simulate various slot machine plays and to calculate the payout rate of your slot machine.