Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rphylopars #1

Open
wcornwell opened this issue May 16, 2023 · 5 comments
Open

rphylopars #1

wcornwell opened this issue May 16, 2023 · 5 comments

Comments

@wcornwell
Copy link
Collaborator

wcornwell commented May 16, 2023

i found some code from Jess Tam's project that used rphylopars and it looks like it works here with a few changes:

library(V.PhyloMaker2)
library(dplyr)
library(Rphylopars)
library(ggplot2)

df <-
  data.frame(
    species = aus$taxon_name,
    genus = aus$genus,
    family = aus$family
  )
df <- distinct(df)
#tree <- phylo.maker(df)
#saveRDS(tree,"autraitstree.rds")
readRDS("autraitstree.rds")
phylo <- tree[[1]]

lma <- filter(aus, trait_name == "leaf_mass_per_area") %>%
  select(taxon_name, trait_value) %>%
  mutate(trait_value2 = log10(as.numeric(trait_value))) %>%
  select(taxon_name, trait_value2) %>%
  group_by(taxon_name) %>%
  summarize(trait_value2 = mean(trait_value2))

lma_vec <- data.frame(species = lma$taxon_name,
                      lma.log = lma$trait_value2)
rownames(lma_vec) <- gsub(" ", "_", lma$taxon_name)
lma_vec$species <- gsub(" ", "_", lma_vec$species)
lma_vec <- filter(lma_vec, lma_vec$species %in% phylo$tip.label)

# Run phylogenetic comparative analysis
phylopars_result <-
  phylopars(trait_data = lma_vec,
            tree = phylo,
            model = "BM")

# Set the seed for reproducibility
set.seed(123)

# Randomly assign NA to 10% of the data
lma_vec_missing <- lma_vec
lma_vec_missing$lma.log[sample(nrow(lma_vec), nrow(lma_vec) * 0.10)] <-
  NA

# Run phylogenetic comparative analysis
phylopars_result_missing <-
  phylopars(trait_data = lma_vec_missing,
            tree = phylo,
            model = "BM")

# Extract the estimated values
estimated_values <- phylopars_result_missing$anc_recon

rownames_to_column(data.frame(estimated_values), var = "species") %>%
  filter(species %in% lma_vec_missing$species) %>%
  rename(estimated_lma_log = lma.log) -> est

left_join(lma_vec_missing, est) %>%
  rename(lma_with_missing = lma.log) -> out

left_join(out, lma_vec) -> out

out %>%
  filter(is.na(lma_with_missing)) -> missing_data

missing_data %>%
  ggplot(aes(x = estimated_lma_log, y = lma.log)) + geom_point()

summary(lm(lma.log ~ estimated_lma_log, data = missing_data))

@wcornwell
Copy link
Collaborator Author

getting r2 values of like 0.57 out of the box for LMA

@wcornwell
Copy link
Collaborator Author

image

@wcornwell
Copy link
Collaborator Author

wcornwell commented May 16, 2023

i think the outliers are sorta interesting. Looks like it's situations were congenerics have very different values in austraits from the species that was left out in the randomization (true values are the far right column here, imputed values are the middle):
Screenshot 2023-05-16 at 4 26 44 pm
Screenshot 2023-05-16 at 4 26 04 pm

@wcornwell
Copy link
Collaborator Author

@dcol2804 here is two trait example with phylopars

plant_height <- filter(aus, trait_name == "plant_height") %>%
  select(taxon_name, trait_value) %>%
  mutate(trait_value2 = log10(as.numeric(trait_value))) %>%
  select(taxon_name, trait_value2) %>%
  group_by(taxon_name) %>%
  summarize(trait_value2 = mean(trait_value2))

leaf_area <- filter(aus, trait_name == "leaf_area") %>%
  select(taxon_name, trait_value) %>%
  mutate(trait_value2 = log10(as.numeric(trait_value))) %>%
  select(taxon_name, trait_value2) %>%
  group_by(taxon_name) %>%
  summarize(trait_value2 = mean(trait_value2))


lma_vec <- data.frame(species = lma$taxon_name,
                      lma.log = lma$trait_value2)
rownames(lma_vec) <- gsub(" ", "_", lma$taxon_name)
lma_vec$species <- gsub(" ", "_", lma_vec$species)
lma_vec <- filter(lma_vec, lma_vec$species %in% phylo$tip.label)

both <- data.frame(species = plant_height$taxon_name,
                      height.log = plant_height$trait_value2) %>%
  full_join(data.frame(species = leaf_area$taxon_name,
                       leaf_area.log = leaf_area$trait_value2))
rownames(both) <- gsub(" ", "_", both$species)
both$species <- gsub(" ", "_", both$species)
both_ss<- filter(both, both$species %in% phylo$tip.label)

# Run phylogenetic comparative analysis
phylopars_result <-
  phylopars(trait_data = both_ss,
            tree = phylo,
            model = "BM")

@dcol2804
Copy link
Owner

Thanks Will, that's awesome! I'll run the same thing in BHPMF and compare results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants