-
Notifications
You must be signed in to change notification settings - Fork 0
/
assignment6_Perreault-graded.Rmd
236 lines (166 loc) · 10 KB
/
assignment6_Perreault-graded.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
---
title: 'Bios 6301: Assignment 6'
author: 'Andrea Perreault'
output: pdf_document
---
**Grade 43/50**
*Due Thursday, 1 December, 1:00 PM*
$5^{n=day}$ points taken off for each day late.
50 points total.
Submit a single knitr file (named `homework6.rmd`), along with a valid PDF output file. Inside the file, clearly indicate which parts of your responses go with which problems (you may use the original homework document as a template). Add your name as `author` to the file's metadata section. Raw R code/output or word processor files are not acceptable.
Failure to name file `homework6.rmd` or include author name may result in 5 points taken off.
### QUESTION 1 ###
**15 points**
Consider the following very simple genetic model (*very* simple -- don't worry if you're not a geneticist!). A population consists of equal numbers of two sexes: male and female. At each generation men and women are paired at random, and each pair produces exactly two offspring, one male and one female. We are interested in the distribution of height from one generation to the next. Suppose that the height of both children is just the average of the height of their parents, how will the distribution of height change across generations?
Represent the heights of the current generation as a dataframe with two variables, m and f, for the two sexes. We can use `rnorm` to randomly generate the population at generation 1:
```{r}
pop <- data.frame(m = rnorm(100, 160, 20), f = rnorm(100, 160, 20))
```
The following function takes the data frame `pop` and randomly permutes the ordering of the men. Men and women are then paired according to rows, and heights for the next generation are calculated by taking the mean of each row. The function returns a data frame with the same structure, giving the heights of the next generation.
```{r}
next_gen <- function(pop) {
pop$m <- sample(pop$m)
pop$m <- rowMeans(pop)
pop$f <- pop$m
pop
}
```
Use the function `next_gen` to generate nine generations (you already have the first), then use the function `hist` to plot the distribution of male heights in each generation (this will require multiple calls to `hist`). The phenomenon you see is called regression to the mean. Provide (at least) minimal decorations such as title and x-axis labels.
```{r}
gen.data <- pop
gen.data$gen <- 1
for (i in 2:9) {
pop <- next_gen(pop[,1:2])
pop$gen <- i
gen.data <- rbind(gen.data,pop)
}
gen.max <- max(gen.data$m)
gen.min <- min(gen.data$m)
par(mfrow=c(3,3))
for (j in 1:9) {
sub.gen <- gen.data[gen.data$gen == j, 'm']
hist(sub.gen, main=paste("Generation", j), col="purple", xlab="Height", xlim=c(gen.min, gen.max))
}
```
### QUESTION 2 ###
**10 points**
Use the simulated results from question 1 to reproduce (as closely as possible) the following plot in ggplot2.
![generations plot](http://d.pr/i/Xh0d+)
```{r}
library(ggplot2)
ggplot(gen.data, aes(x=m, y=f)) + geom_point(alpha=0.25) + facet_wrap(~ gen)
```
### QUESTION 3 ###
**10 points**
You calculated the power of a study design in question #2 of assignment 3. The study has two variables, treatment group and outcome. There are two treatment groups (0, 1) and they should be assigned randomly with equal probability. The outcome should be a random normal variable with a mean of 60 and standard deviation of 20. If a patient is in the treatment group, add 5 to the outcome.
Starting with a sample size of 250, create a 95% bootstrap percentile interval for the mean of each group. Then create a new bootstrap interval by increasing the sample size by 250 until the sample is 2500. Thus you will create a total of 10 bootstrap intervals. Each bootstrap should create 1000 bootstrap samples. (4 points)
Produce a line chart that includes the bootstrapped mean and lower and upper percentile intervals for each group. Add appropriate labels and a legend. (6 points)
You may use base graphics or ggplot2. It should look similar to this (in base).
![bp interval plot](bpi.jpg)
```{r}
set.seed(100)
bootstrapPI <- function(sample.size=250, n.boot=1000, alpha=0.05) {
treatment <- sample(c(0,1), sample.size, replace = TRUE)
outcome <- rnorm(sample.size, mean=60, sd=20)
data <- data.frame(cbind(treatment, outcome))
fix.outcome <- ifelse(treatment == 1, outcome + 5, outcome)
group.0 <- fix.outcome[treatment == 0]
group.1 <- fix.outcome[treatment == 1]
means.0 <- replicate(n = n.boot, mean(sample(group.0, length(group.0), replace=TRUE)))
CI.0 <- quantile(means.0, c(0.025, 0.975))
means.1 <- replicate(n = n.boot, mean(sample(group.1, length(group.1), replace=TRUE)))
CI.1 <- quantile(means.1, c(0.025, 0.975))
bounds <- c(mean(means.0), CI.0, mean(means.1), CI.1)
group.0.res <- data.frame(m = bounds[1], lb = bounds[2], ub = bounds[3], treatment = 0, n = sample.size)
group.1.res <- data.frame(m = bounds[4], lb = bounds[5], ub = bounds[6], treatment = 1, n = sample.size)
all.res <- rbind(group.0.res, group.1.res)
return(all.res)
}
bootstrapPI()
interval <- data.frame(m = numeric(0), lb = numeric(0), ub = numeric(0), treatment = numeric(0), n = numeric(0))
for (i in 1:10) {
interval <- rbind(interval, bootstrapPI(sample.size=250*i, n.boot=1000, alpha=0.05))
rownames(interval) <- NULL
}
ggplot(data=interval, mapping = aes(x=interval$n, group = factor(treatment))) + geom_ribbon(aes(ymin = lb, ymax = ub, fill = factor(treatment)), alpha = 0.5) + geom_line(aes(y = m)) + labs("x" = "Number of Patients", "y" = "Mean") + scale_fill_discrete(name = "", labels=c("No Treatment","Treatment")) + theme_bw()
```
### QUESTION 4 ###
**15 points**
Programming with classes. The following function will generate random patient information.
```{r}
makePatient <- function() {
vowel <- grep("[aeiou]", letters)
cons <- grep("[^aeiou]", letters)
name <- paste(sample(LETTERS[cons], 1), sample(letters[vowel], 1), sample(letters[cons], 1), sep='')
gender <- factor(sample(0:1, 1), levels=0:1, labels=c('female','male'))
dob <- as.Date(sample(7500, 1), origin="1970-01-01")
n <- sample(6, 1)
doa <- as.Date(sample(1500, n), origin="2010-01-01")
pulse <- round(rnorm(n, 80, 10))
temp <- round(rnorm(n, 98.4, 0.3), 2)
fluid <- round(runif(n), 2)
list(name, gender, dob, doa, pulse, temp, fluid)
}
```
1. Create an S3 class `medicalRecord` for objects that are a list with the named elements `name`, `gender`, `date_of_birth`, `date_of_admission`, `pulse`, `temperature`, `fluid_intake`. Note that an individual patient may have multiple measurements for some measurements. Set the RNG seed to `8` and create a medical record by taking the output of `makePatient`. Print the medical record, and print the class of the medical record. (5 points)
```{r}
set.seed(8)
patient <- makePatient()
class(patient) <- "medicalRecord"
names(patient) <- c("name", "gender", "dob", "doa", "pulse", "temp", "fluid")
print(patient)
```
2. Write a `medicalRecord` method for the generic function `mean`, which returns averages for pulse, temperature and fluids. Also write a `medicalRecord` method for `print`, which employs some nice formatting, perhaps arranging measurements by date, and `plot`, that generates a composite plot of measurements over time. Call each function for the medical record created in part 1. (5 points)
```{r}
#install.packages("cowplot")
library(cowplot)
mean.medrec <- function(data) {
total.mean <- list(Pulse = mean(data$pulse), Temperature = mean(data$temp), Fluid_Intake = mean(data$fluid))
return(total.mean)
}
print.medrec <- function(data) {
cat(sprintf("Name: %s\nGender: %s\nDOB: %s", data$name, data$gender, data$dob), "\n")
visit <- data.frame(doa = data$doa, pulse = data$pulse, temperature = data$temp, fluid_intake = data$fluid)
visit <- visit[order(visit$doa), ]
names(visit) <- c("DOA", "Pulse", "Temperature", "Fluid Intake")
print(visit)
}
plot.medrec <- function(data) {
visit <- data.frame(doa = data$doa, pulse = data$pulse, temperature = data$temp, fluid_intake = data$fluid)
visit <- visit[order(visit$doa), ]
p1 <- ggplot(data = visit) + geom_line(mapping = aes(x = visit$doa, y = visit$pulse)) + labs(x = "Dates of Admission", y = "Pulse")
p2 <- ggplot(data = visit) + geom_line(mapping = aes(x = visit$doa, y = visit$temp)) + labs(x = "Dates of Admission", y = "Temperature")
p3 <- ggplot(data = visit) + geom_line(mapping = aes(x = visit$doa, y = visit$fluid)) + labs(x = "Dates of Admission", y = "Fluid Intake")
plot_grid(p1, p2, p3, ncol = 3, nrow = 1)
}
mean.medrec(data = patient)
print.medrec(data = patient)
plot.medrec(data = patient)
```
**JC Grading -3** You want to create functions mean.medicalRecord, print.medicalRecord, and plot.medicalRecord. Then you can simply call mean(obj) when obj is of class medicalRecord.
3. Create a further class for a cohort (group) of patients, and write methods for `mean` and `print` which, when applied to a cohort, apply mean or print to each patient contained in the cohort. Hint: think of this as a "container" for patients. Reset the RNG seed to 8 and create a cohort of ten patients, then show the output for `mean` and `print`. (5 points)
```{r}
set.seed(8)
cohort <- lapply(1:10, function (data) {makePatient()})
class(cohort) <- "group"
for (i in seq(1:10)) {
names(cohort[[i]]) <- c("name", "gender", "dob", "doa", "pulse", "temp", "fluid")
}
mean.cohort <- function (data) {
for (i in 1:10) {
pulse.cohort <- data[[i]]$pulse
temp.cohort <- data[[i]]$temp
fluid.cohort <- data[[i]]$fluid
}
total.mean <- list(Pulse = mean(pulse.cohort), Temperature = mean(temp.cohort), Fluid_Intake = mean(fluid.cohort))
return(total.mean)
}
mean.cohort(data = cohort)
print.cohort <- function (data) {
print(data)
cat(sprintf("Number in cohort: %s\nPulse: %s\nTemperature: %s\nFluid Intake: %s", length(data), mean.cohort(data)[1], mean.cohort(data)[2], mean.cohort(data)[3]), "\n")
}
print.cohort(data = cohort)
```
**JC Grading -4**
The suffix .cohort (as in mean.cohort) works on objects assigned the class 'cohort'. You've created an object called cohort, but it is assigned the class 'group'. You should be able to call mean(cohort) and print(cohort).