ranavirus_growth_bayesian.Rmd

---
title: "Analysis_Bayes_Class"
author: "Danielle Galvin"
output: html_document
---

WELCOME TO THE FUN ZONE

Loading our packages
```{r}
library(readr)
library(brms)
library(tidyverse)
library(rethinking)
library(janitor)
library(scales)
```

Bringing in the data and cleaning it up
```{r}
Data_full <- read_csv("Data/Data_full.csv")

#PLEASE IGNORE, THERE IS NO WIZARD HIDING BEHIND THE CURTAIN

#I want to make the data long so that all measurements, stages, and lengths are in single categories and are associated with their dates. This chunk will make measure instance and date of measurement.
#data_long <- Data_full %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, `6/15/2021`, `6/16/2021`, `6/17/2021`, `6/18/2021`, `6/19/2021`, `6/20/2021`, `6/21/2021`, `6/22/2021`, `6/23/2021`, `6/24/2021`, `6/25/2021`, `6/26/2021`, `6/27/2021`, `6/28/2021`, `6/29/2021`, `6/30/2021`, `7/1/2021`, `7/2/2021`, `7/3/2021`, `7/4/2021`, `7/5/2021`, `7/6/2021`, `7/7/2021`, `7/8/2021`, `7/9/2021`, `7/10/2021`, `7/11/2021`, `7/12/2021`, `7/13/2021`, `7/14/2021`, `7/15/2021`, `7/16/2021`, `7/17/2021`, `7/18/2021`, `7/19/2021`, `7/20/2021`, `7/21/2021`, `7/22/2021`, `7/23/2021`, `7/24/2021`, `7/25/2021`, `7/26/2021`, `7/27/2021`, `7/28/2021`, `7/29/2021`, `7/30/2021`, `7/31/2021`, `8/1/2021`, `8/2/2021`, `8/3/2021`, `8/4/2021`, `8/5/2021`, `8/6/2021`, `8/7/2021`, `8/8/2021`, `8/9/2021`, `8/10/2021`, `8/11/2021`, `8/12/2021`, `8/13/2021`, `8/14/2021`, `8/15/2021`, `8/16/2021`, `8/17/2021`, `8/18/2021`, `8/19/2021`, `8/20/2021`, `8/21/2021`, `8/22/2021`, `8/23/2021`, `8/24/2021`, `8/25/2021`, `8/26/2021`, `8/27/2021`, `8/28/2021`, `8/29/2021`, `8/30/2021`, `8/31/2021`, `9/1/2021`, `9/2/2021`, `9/3/2021`, `9/4/2021`, `9/5/2021`, `9/6/2021`, `9/7/2021`, `9/8/2021`, `9/9/2021`, `9/10/2021`, `9/11/2021`, `9/12/2021`, `9/13/2021`), names_to="measure_instance", values_to="measure_date")

#Now I want to combine all alive dates and alive status to cut down on colums
#data_longer <- data_long %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, measure_instance, measure_date), names_to="alive_date", values_to="alive_status")

#Now I want to collapse the mass measures
#data_longer_2 <- data_longer %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, length_1, stage_2, length_2, stage_3, length_3, stage_4, length_4, stage_5, length_5, measure_instance, measure_date, alive_date, alive_status), names_to="mass_instance", values_to="mass")

#Now I want to collapse stage measures
#data_longer_3 <- data_longer_2 %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, length_1, length_2, length_3, length_4, length_5, measure_instance, measure_date, alive_date, alive_status, mass_instance, mass), names_to="stage_instance", values_to="stage")

#Now collapse length
#data_longest <- data_longer_3 %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, measure_instance, measure_date, alive_date, alive_status, mass_instance, mass, stage_instance, stage), names_to="length_instance", values_to="length")

#Wait I think I messed up PLEASE ONLY LOOK AT THIS
data_guess <- Data_full %>% pivot_longer(cols=-c(individual, treatment_group, species_num, species, stage_1, mass_1, length_1, stage_2, mass_2, length_2, stage_3, mass_3, length_3, stage_4, mass_4, length_4, stage_5, mass_5, length_5, m_date1, m_date2, m_date3, m_date4, m_date5), names_to="date_status", values_to="alive_status")

#Adding a column for total days each individual survived
alive_data <- data_guess %>% group_by(individual) %>%  mutate(days_survived = sum(alive_status))

#Cutting the data frame down so my computer doesn't crash and ruin my life
alive_data_useme <- alive_data[, c("individual", "species_num", 
                                   "species", "days_survived", "treatment_group")] 

#Making control equal to 0
alive_data_useme$treatment_group[alive_data_useme$treatment_group == "control"] <-  0

#Okay really use this one!
real_data_alive <- alive_data_useme[!duplicated(alive_data_useme$individual),] %>% 
  transform(treatment_group=as.numeric(treatment_group)) %>% 
  view()
```

Model to investigate mortality date:
  y ~ dpois(lambda)
  log(lambda)  = a + bx
  a ~ ?
  b ~ ?

Simulate the priors
I am not currently happy with this. Based on the literature infection and mortality peaks at 10 days post exposure. How can I change the priors to reflect this information? I would expect that the peak would occur ealier based on a higher treatment group due to higher viral load.

REALLY NEED HELP WITH THIS AS MY PREDICTIONS ARE CURRENTLY GARBAGE
```{r}
N=1000 #Number of simulations

prior_sim <- tibble(a = rnorm(N, 0.05, 0.05),
                 b = rnorm(N, 0.05, 0.05),
                 sigma = rexp(N, 0.5),
                 sim = 1:N) 

# data (only the x values, since we're simulating y and mu and pretending we don't have them yet)
treatment_dg <- real_data_alive$treatment_group
d <- real_data_alive %>% mutate("treatment_sim" = treatment_dg)

# combine and simulate
prior_and_x_1 <- prior_sim %>% expand_grid(x_sim = treatment_dg) %>% # combine priors and x's
  mutate(lambda = exp(a + b*x_sim),                         # simulate regressions
         y = rpois(nrow(.), lambda=lambda))   # simulate data 
         
         
prior_and_x_1 %>% 
  ggplot(aes(x = x_sim, y = lambda, group = sim)) + 
  geom_line() +
  geom_point(aes(y = lambda)) +
  labs(y = "sim")+
  scale_y_log10()
```

Model to investigate mortality date
```{r}
#I am using a Poisson distribution since dates are integer values
d <- d %>% mutate("days" = days_survived)

get_prior(days ~ treatment_sim, 
          data = d,
          family = poisson(link="log"))


bsr_brm_dg <- brm(days ~ treatment_sim, 
               data = d,
               family = poisson(link="log"),
               prior = c(prior(normal(0.05, 0.05), class = "Intercept"),
                         prior(normal(0.05, 0.05), class = "b", coef="treatment_sim"),
                         prior(exponential(0.5), class="b")), 
               cores = 1, chains = 1, iter = 1000,
               sample_prior = "yes")

bsr_brm_dg
plot(conditional_effects(bsr_brm_dg), points = T)


posteriors_bsr_brm_dg <- as_draws_df(bsr_brm_dg) %>% as_tibble() %>% 
  mutate(lambda = exp(b_Intercept),
         lambda_prior = exp(prior_Intercept))

ggplot(posteriors_bsr_brm_dg %>% filter(.draw <=1000), aes(x=b_treatment_sim, y=lambda_prior)) + 
  geom_point()
```

If you happen to be able to make the above data worth anything, I would appreciate if you had the opportunity to incorporate species into the model as well. Species number 1 is Pseudacris maculata (boreal chorus frog) which is very small and has no published data regarding survivability toward ranavirus. Species number 2 is Rana pipiens (Northern leopard frog) which is larger and has some published data regarding survivability to ranavirus, although results show that survival is not only dose dependent, but it is also dependent on the virulence of the specific strain. The strain I used was isolated from a Wood frog (Rana sylvatica) in Maine and the virulence in unknown to me.