How to filter non-significant odd named taxa, and only keep the significant odd named taxa? #324

Open
catherineel opened this issue Oct 25, 2021 · 8 comments

Comments

@catherineel

Hi there!

I've been using

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%

to remove oddly named taxa, but some of the oddly named taxa are significant and I would like them to be displayed on the tree.

Is there a way to display only the significant oddly named taxa?

@zachary-foster
Contributor

What do you mean by significant? Can you give me an example? You can make a list of taxa you want to be displayed no matter what and do this:

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$") | taxon_names %in% my_taxon_name_list, reassign_obs = FALSE)
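
A minimal base-R sketch of how that condition evaluates outside of metacoder. The taxon names and the contents of my_taxon_name_list are made up for illustration; in a real call, taxon_names would come from the taxmap object.

```r
# Hypothetical taxon names; in metacoder these would come from taxon_names.
taxon_names <- c("Bacteroides", "UCG-005", "Lachnospira", "CAG-56", "Prevotella_9")

# Hypothetical list of oddly named taxa to keep no matter what.
my_taxon_name_list <- c("UCG-005")

# TRUE for names made only of ASCII letters.
has_plain_name <- grepl(taxon_names, pattern = "^[a-zA-Z]+$")

# Keep a taxon if its name is plain OR it is on the keep list.
keep <- has_plain_name | taxon_names %in% my_taxon_name_list

print(data.frame(taxon_names, has_plain_name, keep))
```

Here "UCG-005" survives only because it is on the list, while "CAG-56" and "Prevotella_9" are dropped.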

@catherineel
Author

Statistical significance after correcting for multiple comparisons. This is what I did:

# Create a new column, wilcox_p_value_p.adjusted, to correct for multiple comparisons
obj$data$diff_table$wilcox_p_value_p.adjusted <- p.adjust(obj$data$diff_table$wilcox_p_value,
                                                          method = "fdr")

# Create a new column in diff_table that starts as an identical copy of log2_median_ratio
obj$data$diff_table$log2_median_ratio_wilcox.adjust <- obj$data$diff_table$log2_median_ratio

# Then set this new column to 0 wherever the adjusted p-value is not significant
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0

Then I created the tree to only display significant taxa after correcting for multiple comparisons at the genus level

set.seed(1)
obj %>% 
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%  
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust, 
                   node_color_range = diverging_palette(), 
                   node_color_trans = "linear", 
                   node_color_interval = c(-8, 8), 
                   edge_color_interval = c(-8, 8), 
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel", 
                   initial_layout = "reingold-tilford", 
                   output_file = "diff tree.pdf")

Let me know if I am doing anything wrong
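
For readers following along, the correct-then-mask steps above can be sketched on toy vectors in base R. The p-values and ratios here are invented. The point of the sketch is that masking changes values but not the number of rows, so a masked taxon is presumably still drawn in the tree, just with a ratio of 0.

```r
# Hypothetical raw p-values and log2 ratios, one per row of a toy diff table.
p_raw <- c(0.001, 0.01, 0.04, 0.30, 0.80)
log2_ratio <- c(3.1, -2.4, 1.2, 0.5, -0.1)

# Benjamini-Hochberg correction ("fdr" is an alias for "BH" in p.adjust).
p_adj <- p.adjust(p_raw, method = "fdr")

# Copy the ratios, then zero out rows that are not significant after correction.
log2_ratio_masked <- log2_ratio
log2_ratio_masked[p_adj > 0.05] <- 0

print(data.frame(p_raw, p_adj, log2_ratio, log2_ratio_masked))

# Masking changes values only; no rows are removed.
stopifnot(length(log2_ratio_masked) == length(log2_ratio))
```

Note that a raw p-value of 0.04 no longer passes the 0.05 cutoff once it is adjusted.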

@zachary-foster
Contributor

Ok, I understand now. Thanks for the code! I see that you set the non-significant taxa to 0, but I don't see where you are filtering them out. Either way, if you want to remove any taxa with odd names that are not significant, you can do something like:

metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05  & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)
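
The double negation in that condition can be hard to read. A small base-R sketch with invented per-taxon values shows that it is the same as "keep if significant or plainly named". The vectors below are hypothetical and assume exactly one adjusted p-value per taxon.

```r
# Hypothetical per-taxon values, one element per taxon.
taxon_names <- c("Bacteroides", "UCG-005", "CAG-56", "Lachnospira")
p_adjusted <- c(0.20, 0.01, 0.60, 0.03)

# TRUE for names containing anything other than ASCII letters.
has_odd_name <- !grepl(taxon_names, pattern = "^[a-zA-Z]+$")

# Form used above: drop taxa that are both non-significant and oddly named.
keep_a <- !(p_adjusted > 0.05 & has_odd_name)

# Equivalent form by De Morgan's law: keep if significant or plainly named.
keep_b <- p_adjusted <= 0.05 | !has_odd_name

print(data.frame(taxon_names, p_adjusted, has_odd_name, keep_a, keep_b))
```

Only "CAG-56" is dropped: it is oddly named and not significant.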

@catherineel
Author

Thanks for that, but unfortunately I get this error when I replace

metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
with
metacoder::filter_taxa(! (wilcox_p_value_p.adjusted > 0.05 & ! grepl(taxon_names, pattern = "^[a-zA-Z]+$")), reassign_obs = FALSE)

Error: TRUE/FALSE vector (length = 1452) must be the same length as the number of taxa (242)

Oh, did I do something wrong? I thought I did filter them out with this line:
obj$data$diff_table$log2_median_ratio_wilcox.adjust[obj$data$diff_table$wilcox_p_value_p.adjusted > 0.05] <- 0
since it sets the non-significant values to 0 and I then chose that column for node_color.
It looked like they were filtered out of my tree when I did this:

set.seed(1)
obj %>%
  metacoder::filter_taxa(taxon_ranks == "g", supertaxa = TRUE, reassign_obs = FALSE) %>%
  metacoder::filter_taxa(grepl(taxon_names, pattern = "^[a-zA-Z]+$"), reassign_obs = FALSE) %>%
  heat_tree_matrix(
                   data = "diff_table",
                   node_size = n_obs,
                   node_label = taxon_names,
                   node_color = log2_median_ratio_wilcox.adjust,
                   node_color_range = diverging_palette(),
                   node_color_trans = "linear",
                   node_color_interval = c(-8, 8),
                   edge_color_interval = c(-8, 8),
                   node_size_axis_label = "Number of OTUs",
                   node_color_axis_label = "Log2 ratio median proportions",
                   layout = "davidson-harel",
                   initial_layout = "reingold-tilford",
                   output_file = "diff tree.pdf")
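
One plausible reading of the error, not confirmed in this thread: 1452 is exactly 6 × 242, which is what you would expect if diff_table holds one row per taxon for each of six pairwise comparisons, while filter_taxa wants one TRUE/FALSE per taxon. The base-R sketch below shows one way to collapse row-level p-values to a single value per taxon. The toy table, the taxon IDs, and the "significant in at least one comparison" rule are all assumptions made for illustration.

```r
# Toy stand-in for diff_table: 3 taxa x 2 pairwise comparisons = 6 rows.
diff_table <- data.frame(
  taxon_id = rep(c("ab", "ac", "ad"), times = 2),
  wilcox_p_value_p.adjusted = c(0.01, 0.40, 0.90, 0.30, 0.02, 0.70),
  stringsAsFactors = FALSE
)

# One entry per taxon, in the order the taxa appear in the (hypothetical) object.
taxon_ids <- c("ab", "ac", "ad")

# A row-level test has one element per row (6), not one per taxon (3).
row_level <- diff_table$wilcox_p_value_p.adjusted <= 0.05
print(length(row_level))

# Collapse to one value per taxon: significant in at least one comparison.
sig_by_taxon <- tapply(row_level, diff_table$taxon_id, any)

# Reorder to match the taxon order and strip attributes to get a plain logical vector.
is_significant <- as.vector(sig_by_taxon[taxon_ids])
print(is_significant)
```

A per-taxon vector built this way, ordered to match the taxa in the object, has the length that the error message asks for. Whether "any comparison" is the right rule depends on which comparisons the tree is meant to show.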

@zachary-foster
Contributor

Can you send me an example data set with associated code that reproduces the issue? It's hard for me to debug without reproducing the error.

@catherineel
Author

Sorry, dumb question, but how do I send example data?

My original data file is huge as it's a qza file from QIIME2 analysis and I'm not sure what I need to do to it.

@zachary-foster
Contributor

No problem, it's a common question.

If you can reproduce the error with a subset of the data, you can attach the files to this issue to upload them. You can save the needed R objects to a file with saveRDS at the point just before the example code starts. You can also email the original data to [email protected] if you don't want it public and it's small enough to email.
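
A short base-R sketch of saving an object for sharing and restoring it on the other end. The object and the temporary file path are placeholders; in practice you would save the real object (for example, obj) to a file you can attach.

```r
# Hypothetical stand-in for the object to share (any R object works).
example_obj <- list(taxa = c("Bacteroides", "UCG-005"), p = c(0.2, 0.01))

# Write the object to a file; a temporary path is used here for illustration.
rds_path <- tempfile(fileext = ".rds")
saveRDS(example_obj, file = rds_path)

# The recipient restores it with readRDS and assigns it to a name of their choice.
restored <- readRDS(rds_path)
print(identical(example_obj, restored))
```

saveRDS writes a single object to disk and readRDS reads it back, so the recipient can start from the same state as the example code.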

@catherineel
Author

Thanks, I just emailed it to you!
I'm not sure if I did it correctly
