---
title: "profile"
author: "Brian S. Yandell"
date: "7/6/2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
- [Advanced R by Hadley Wickham (ebook)](http://adv-r.had.co.nz/): [Performance](http://adv-r.had.co.nz/Performance.html) & [Profiling](http://adv-r.had.co.nz/Profiling.html)

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.” (Donald Knuth)

Optimising code to make it run faster is an iterative process:

- Find the biggest bottleneck (the slowest part of your code).
- Try to eliminate it (you may not succeed but that’s ok).
- Repeat until your code is “fast enough.”

This sounds easy, but it’s not. (from [Advanced R by Hadley Wickham](http://adv-r.had.co.nz/))

### Diagnostics and testing

Many tools can help you diagnose bottlenecks and improve performance, but first reread the advice above. Then ask:

- Where are the bottlenecks? `system.time()` and `proc.time()`
- What is broken? `traceback()` and `debug()`
- Does it do what I want?
    + informal unit testing of small pieces of code
    + the [testthat](https://github.com/hadley/testthat) package

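As a minimal sketch of informal unit testing, base R's `stopifnot()` can check a small piece of code against answers worked out by hand. The helper `group_mean()` and its toy inputs are hypothetical, made up for illustration.

```r
# Hypothetical helper: mean of `value` within each level of `group`.
group_mean <- function(value, group) {
  vapply(split(value, group), mean, numeric(1), na.rm = TRUE)
}

# Toy data small enough to check by hand.
value <- c(1, 3, NA, 10)
group <- c("a", "a", "b", "b")
result <- group_mean(value, group)

# stopifnot() is silent when every condition is TRUE and stops otherwise.
stopifnot(
  identical(names(result), c("a", "b")),
  isTRUE(all.equal(unname(result), c(2, 10)))
)
```

Checks like these are quick to write in the console, and they can later be moved into a test file so they are not lost.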
### Several issues of code efficiency come up:

- cautions on using `for` and `while` loops
    + see [data example](../curate/applyExample.Rmd)
- comparing floating point numbers: use `all.equal()` rather than `==`, and integer literals such as `1L` where exact values are needed
- profiling R code with `Rprof()` (before optimizing)
    + use the `lineprof` package or `Rprof()` (see [lineprof.Rmd](lineprof.Rmd) and the example in [Adv-R: Profiling](http://adv-r.had.co.nz/Profiling.html))

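A rough sketch of profiling with base R's `Rprof()` and `summaryRprof()` follows. The function `slow_sum()` is hypothetical and exists only to give the sampling profiler something to measure; timings will differ from machine to machine.

```r
# Hypothetical slow function, used only to give the profiler something to sample.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sqrt(i)
  }
  total
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)         # start sampling the call stack into prof_file
result <- slow_sum(5e6)
Rprof(NULL)              # stop profiling

# Summarize time by function. A very short run may collect no samples,
# so the summary is wrapped in tryCatch() as a precaution.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) head(prof$by.self)
```

The `by.self` table lists the time spent in each function itself, which is usually the first place to look for a bottleneck.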
### Example of system.time
```{r}
surveys <- read.csv("../data/surveys.csv")
```
```{r}
forloop <- function(surveys) {
  # One slot per species, named by species_id.
  n_species <- length(unique(surveys$species_id))
  means <- numeric(n_species)
  names(means) <- sort(unique(surveys$species_id))
  # Fill in the mean weight for each species, ignoring missing weights.
  for (i in names(means)) {
    means[i] <- mean(surveys$weight[surveys$species_id == i], na.rm = TRUE)
  }
  means
}
```
```{r}
system.time(fmeans <- forloop(surveys))
```
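The value returned by `system.time()` can also be stored and inspected. The sketch below uses a simulated data frame, `fake`, as a hypothetical stand-in for the surveys data, so the numbers it produces mean nothing beyond illustration.

```r
# Simulated stand-in for the surveys data (hypothetical values).
set.seed(1)
fake <- data.frame(
  species_id = sample(c("DM", "DO", "PP"), 1e5, replace = TRUE),
  weight = rnorm(1e5, mean = 40, sd = 5)
)

# system.time() evaluates the expression and returns a "proc_time" object.
timing <- system.time(
  m <- tapply(fake$weight, fake$species_id, mean, na.rm = TRUE)
)

class(timing)
timing["user.self"]  # CPU seconds used by the R process for this expression
timing["elapsed"]    # wall-clock seconds
```

Elapsed time is what a user waits for; it can exceed the CPU time when the code waits on disk or network.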
```{r message=FALSE}
library(dplyr)
```
```{r}
dplyrmeans <- function(surveys) {
  surveys %>%
    group_by(species_id) %>%
    summarize(means = mean(weight, na.rm = TRUE)) %>%
    ungroup()
}
```
```{r}
system.time(dmeans <- dplyrmeans(surveys))
```
Alternatively, you can use the `proc.time()` function. See [data example](../curate/applyExample.Rmd)
for timing using this approach.
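As a small sketch of the `proc.time()` approach: take a snapshot before the work, take another after, and subtract. The computation being timed here is an arbitrary placeholder.

```r
start <- proc.time()                 # snapshot before the work

# Placeholder work to be timed.
x <- vapply(1:1e4, function(i) sqrt(i), numeric(1))

elapsed <- proc.time() - start       # difference of the two snapshots
elapsed["elapsed"]                   # wall-clock seconds
```

This form is handy when the code being timed spans several statements and is awkward to wrap in a single `system.time()` call.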
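It is a good idea to check that different approaches give the same results.
Here are two ways to do this: summarize the differences, which should all be near zero, and test for near equality with `all.equal()`.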
```{r}
summary(fmeans - dmeans$means)
```
```{r}
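# fmeans is named by species_id while dmeans$means is not, so ignore attributes.
all.equal(fmeans, dmeans$means, check.attributes = FALSE)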
```
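The reason for preferring `all.equal()` over `==` when comparing floating point numbers can be seen in a short sketch. The values are arbitrary; the behaviour shown is standard base R.

```r
a <- 0.1 + 0.2
b <- 0.3

exact  <- a == b                     # FALSE: the two doubles differ in the last bits
approx <- isTRUE(all.equal(a, b))    # TRUE: equal within a small tolerance

# all.equal() also compares attributes such as names unless told otherwise.
named <- c(x = 1, y = 2)
with_attrs <- isTRUE(all.equal(named, c(1, 2)))                            # FALSE
no_attrs   <- isTRUE(all.equal(named, c(1, 2), check.attributes = FALSE))  # TRUE

# Integer literals (the L suffix) are stored exactly, so exact tests are safe.
ints_match <- identical(1L + 2L, 3L)  # TRUE
```

Because `all.equal()` returns a character description rather than `FALSE` when values differ, wrap it in `isTRUE()` inside `if` statements.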
### Example of traceback and debug
Suppose you have a little function that does not work quite right.
```{r}
lousy <- function(x) {
  x <- as.character(x)
  y <- 1
  y <- sum(x, y)
  y
}
double_lousy <- function(x) {
  lousy(x)
}
```
```{r eval=FALSE}
double_lousy("a")
```
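The call above stops with an error. In an interactive session, running `traceback()` immediately afterwards prints the chain of calls that led to it, from `double_lousy()` down to `lousy()`. Outside the console, a similar first look is possible by catching the error, as in this sketch (the two functions are repeated so the block stands alone).

```r
lousy <- function(x) {
  x <- as.character(x)
  y <- 1
  y <- sum(x, y)
  y
}
double_lousy <- function(x) {
  lousy(x)
}

# Catch the error object instead of letting it stop the script.
err <- tryCatch(double_lousy("a"), error = function(e) e)

conditionMessage(err)          # the error text
deparse(conditionCall(err))    # the call that raised the error

# In the console, the interactive equivalent is:
#   double_lousy("a")
#   traceback()
```

The error message and the failing call together usually narrow the problem to a line or two, which is where `debug()` takes over.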
```{r eval=FALSE}
debug(lousy)
```
Do the following in the console. Step through `lousy` by pressing Return, or by typing `n` and then pressing Return.
At each step you can examine arguments and variables, or try out the next line before running it.
```{r eval=FALSE}
double_lousy("a")
```
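A function flagged with `debug()` stays flagged until the flag is removed. The sketch below shows the base R calls for checking and clearing the flag; none of them open the browser, so the block runs non-interactively.

```r
lousy <- function(x) {
  x <- as.character(x)
  y <- 1
  y <- sum(x, y)
  y
}

debug(lousy)                     # flag lousy() for step-through debugging
flag_on <- isdebugged(lousy)     # TRUE

undebug(lousy)                   # remove the flag when finished
flag_off <- isdebugged(lousy)    # FALSE

# debugonce(lousy) is an alternative that flags the function for the next call only.
```

Forgetting `undebug()` is a common source of confusion, since every later call drops into the browser again.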
### Unit tests
See [Karl Broman's Writing Tests](http://kbroman.org/pkg_primer/pages/tests.html).
The idea is to write tests for small units of code (unit tests), so you can check your code one step at a time.
Then go further and keep those tests so they can be rerun whenever you change your code.
[Travis Continuous Integration](https://travis-ci.org/) works with packages on GitHub to make sure code works as expected each time it is changed. For `R`, it is often used in conjunction with [testthat](https://github.com/hadley/testthat). However, Travis CI works with a wide range of languages and platforms.
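A minimal sketch of what such reusable tests might look like with testthat, assuming the package is installed. `lousy()` is the function from the debugging example; `add_one()` is a hypothetical corrected version invented here for illustration, not something from the original material.

```r
library(testthat)

# The faulty function from the debugging example.
lousy <- function(x) {
  x <- as.character(x)
  y <- 1
  y <- sum(x, y)
  y
}

# Hypothetical fix: convert to numeric before summing.
add_one <- function(x) {
  sum(as.numeric(x), 1)
}

test_that("lousy() fails on character input", {
  expect_error(lousy("a"))
})

test_that("add_one() adds 1 to numeric-like input", {
  expect_equal(add_one("2"), 3)
  expect_equal(add_one(2), 3)
})
```

In a package, tests like these typically live under `tests/testthat/` so that they are rerun automatically whenever the package is checked.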