-
Notifications
You must be signed in to change notification settings - Fork 1
/
ranavirus_growth_bayesian.Rmd
144 lines (95 loc) · 7.31 KB
/
ranavirus_growth_bayesian.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
title: "Analysis_Bayes_Class"
author: "Danielle Galvin"
output: html_document
---
WELCOME TO THE FUN ZONE
Loading our packages
```{r}
library(readr)
library(brms)
library(tidyverse)
library(rethinking)
library(janitor)
library(scales)
```
Bringing in the data and cleaning it up
```{r}
Data_full <- read_csv("Data/Data_full.csv")
#PLEASE IGNORE, THERE IS NO WIZARD HIDING BEHIND THE CURTAIN
#I want to make the data long so that all measurements, stages, and lengths are in single categories and are associated with their dates. This chunk will make measure instance and date of measurement.
#data_long <- Data_full %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, `6/15/2021`, `6/16/2021`, `6/17/2021`, `6/18/2021`, `6/19/2021`, `6/20/2021`, `6/21/2021`, `6/22/2021`, `6/23/2021`, `6/24/2021`, `6/25/2021`, `6/26/2021`, `6/27/2021`, `6/28/2021`, `6/29/2021`, `6/30/2021`, `7/1/2021`, `7/2/2021`, `7/3/2021`, `7/4/2021`, `7/5/2021`, `7/6/2021`, `7/7/2021`, `7/8/2021`, `7/9/2021`, `7/10/2021`, `7/11/2021`, `7/12/2021`, `7/13/2021`, `7/14/2021`, `7/15/2021`, `7/16/2021`, `7/17/2021`, `7/18/2021`, `7/19/2021`, `7/20/2021`, `7/21/2021`, `7/22/2021`, `7/23/2021`, `7/24/2021`, `7/25/2021`, `7/26/2021`, `7/27/2021`, `7/28/2021`, `7/29/2021`, `7/30/2021`, `7/31/2021`, `8/1/2021`, `8/2/2021`, `8/3/2021`, `8/4/2021`, `8/5/2021`, `8/6/2021`, `8/7/2021`, `8/8/2021`, `8/9/2021`, `8/10/2021`, `8/11/2021`, `8/12/2021`, `8/13/2021`, `8/14/2021`, `8/15/2021`, `8/16/2021`, `8/17/2021`, `8/18/2021`, `8/19/2021`, `8/20/2021`, `8/21/2021`, `8/22/2021`, `8/23/2021`, `8/24/2021`, `8/25/2021`, `8/26/2021`, `8/27/2021`, `8/28/2021`, `8/29/2021`, `8/30/2021`, `8/31/2021`, `9/1/2021`, `9/2/2021`, `9/3/2021`, `9/4/2021`, `9/5/2021`, `9/6/2021`, `9/7/2021`, `9/8/2021`, `9/9/2021`, `9/10/2021`, `9/11/2021`, `9/12/2021`, `9/13/2021`), names_to="measure_instance", values_to="measure_date")
#Now I want to combine all alive dates and alive status to cut down on colums
#data_longer <- data_long %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, measure_instance, measure_date), names_to="alive_date", values_to="alive_status")
#Now I want to collapse the mass measures
#data_longer_2 <- data_longer %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, length_1, stage_2, length_2, stage_3, length_3, stage_4, length_4, stage_5, length_5, measure_instance, measure_date, alive_date, alive_status), names_to="mass_instance", values_to="mass")
#Now I want to collapse stage measures
#data_longer_3 <- data_longer_2 %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, length_1, length_2, length_3, length_4, length_5, measure_instance, measure_date, alive_date, alive_status, mass_instance, mass), names_to="stage_instance", values_to="stage")
#Now collapse length
#data_longest <- data_longer_3 %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, measure_instance, measure_date, alive_date, alive_status, mass_instance, mass, stage_instance, stage), names_to="length_instance", values_to="length")
#Wait I think I messed up PLEASE ONLY LOOK AT THIS
data_guess <- Data_full %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, m_date1, m_date2, m_date3, m_date4, m_date5), names_to="date_status", values_to="alive_status")
#Adding a column for total days each individual survived
alive_data <- data_guess %>% group_by(individual) %>% mutate(days_survived = sum(alive_status))
#Cutting the data frame down so my computer doesn't crash and ruin my life
alive_data_useme <- alive_data[, c("individual", "species_num",
"species", "days_survived", "treatment_group")]
#Making control equal to 0
alive_data_useme$treatment_group[alive_data_useme$treatment_group == "control"] <- 0
#Okay really use this one!
real_data_alive <- alive_data_useme[!duplicated(alive_data_useme$individual),] %>%
transform(treatment_group=as.numeric(treatment_group)) %>%
view()
```
Model to investigate mortality date:
y ~ dpois(lambda)
log(lambda) = a + bx
a ~ ?
b ~ ?
Simulate the priors
I am not currently happy with this. Based on the literature infection and mortality peaks at 10 days post exposure. How can I change the priors to reflect this information? I would expect that the peak would occur ealier based on a higher treatment group due to higher viral load.
REALLY NEED HELP WITH THIS AS MY PREDICTIONS ARE CURRENTLY GARBAGE
```{r}
N=1000 #Number of simulations
prior_sim <- tibble(a = rnorm(N, 0.05, 0.05),
b = rnorm(N, 0.05, 0.05),
sigma = rexp(N, 0.5),
sim = 1:N)
# data (only the x values, since we're simulating y and mu and pretending we don't have them yet)
treatment_dg <- real_data_alive$treatment_group
d <- real_data_alive %>% mutate("treatment_sim" = treatment_dg)
# combine and simulate
prior_and_x_1 <- prior_sim %>% expand_grid(x_sim = treatment_dg) %>% # combine priors and x's
mutate(lambda = exp(a + b*x_sim), # simulate regressions
y = rpois(nrow(.), lambda=lambda)) # simulate data
prior_and_x_1 %>%
ggplot(aes(x = x_sim, y = lambda, group = sim)) +
geom_line() +
geom_point(aes(y = lambda)) +
labs(y = "sim")+
scale_y_log10()
```
Model to investigate mortality date
```{r}
#I am using a Poisson distribution since dates are integer values
d <- d %>% mutate("days" = days_survived)
get_prior(days ~ treatment_sim,
data = d,
family = poisson(link="log"))
bsr_brm_dg <- brm(days ~ treatment_sim,
data = d,
family = poisson(link="log"),
prior = c(prior(normal(0.05, 0.05), class = "Intercept"),
prior(normal(0.05, 0.05), class = "b", coef="treatment_sim"),
prior(exponential(0.5), class="b")),
cores = 1, chains = 1, iter = 1000,
sample_prior = "yes")
bsr_brm_dg
plot(conditional_effects(bsr_brm_dg), points = T)
posteriors_bsr_brm_dg <- as_draws_df(bsr_brm_dg) %>% as_tibble() %>%
mutate(lambda = exp(b_Intercept),
lambda_prior = exp(prior_Intercept))
ggplot(posteriors_bsr_brm_dg %>% filter(.draw <=1000), aes(x=b_treatment_sim, y=lambda_prior)) +
geom_point()
```
If you happen to be able to make the above data worth anything, I would appreciate if you had the opportunity to incorporate species into the model as well. Species number 1 is Pseudacris maculata (boreal chorus frog) which is very small and has no published data regarding survivability toward ranavirus. Species number 2 is Rana pipiens (Northern leopard frog) which is larger and has some published data regarding survivability to ranavirus, although results show that survival is not only dose dependent, but it is also dependent on the virulence of the specific strain. The strain I used was isolated from a Wood frog (Rana sylvatica) in Maine and the virulence in unknown to me.