updated vignettes, added demonstration of criterion filtering for robustness
saeyslab · Sep 5, 2024 · 70678ec · 70678ec
1 parent 2123298
commit 70678ec
Showing 69 changed files with 892 additions and 1,311 deletions.
diff --git a/R/pipeline.R b/R/pipeline.R
@@ -370,11 +370,7 @@ multi_nichenet_analysis = function(sce,
     batches = batches)
 
   ## check for condition-specific cell types
-  sample_group_celltype_df = abundance_info$abundance_data %>% filter(n > min_cells) %>% ungroup() %>% distinct(sample_id, group_id) %>% cross_join(abundance_info$abundance_data %>% ungroup() %>% distinct(celltype_id)) %>% arrange(sample_id)
-  abundance_df = sample_group_celltype_df %>% left_join(abundance_info$abundance_data %>% ungroup())
-  abundance_df$n[is.na(abundance_df$n)] = 0
-  abundance_df$keep[is.na(abundance_df$keep)] = FALSE
-  abundance_df_summarized = abundance_df %>% mutate(keep = as.logical(keep)) %>% group_by(group_id, celltype_id) %>% summarise(samples_present = sum((keep)))
+  abundance_df_summarized = abundance_info$abundance_data %>% mutate(keep = as.logical(keep)) %>% group_by(group_id, celltype_id) %>% summarise(samples_present = sum((keep)))
   celltypes_absent_one_condition = abundance_df_summarized %>% filter(samples_present == 0) %>% pull(celltype_id) %>% unique() # find truly condition-specific cell types by searching for cell types truely absent in at least one condition
   celltypes_present_one_condition = abundance_df_summarized %>% filter(samples_present >= 2) %>% pull(celltype_id) %>% unique() # require presence in at least 2 samples of one group so it is really present in at least one condition
   condition_specific_celltypes = intersect(celltypes_absent_one_condition, celltypes_present_one_condition)
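
Reviewer note: the simplified summarisation in this hunk can be checked on toy data. The sketch below is illustrative only; the `abundance_data` tibble is a hypothetical stand-in for `abundance_info$abundance_data`, and it assumes absent sample/cell type combinations appear as explicit rows with `keep = 0`. Whether the real table always contains such rows is not visible in this diff. `.groups = "drop"` is added here only to silence the `summarise()` message.

```r
library(dplyr)

# Hypothetical toy stand-in for abundance_info$abundance_data (assumed columns).
abundance_data = tibble(
  sample_id   = rep(c("s1", "s2", "s3", "s4"), each = 2),
  group_id    = rep(c("A", "A", "B", "B"), each = 2),
  celltype_id = rep(c("T", "B_cell"), times = 4),
  n           = c(50, 30, 60, 25, 40, 0, 45, 2),
  keep        = c(1, 1, 1, 1, 1, 0, 1, 0)
)

# Same logic as the new line in the hunk: count kept samples per group and cell type.
abundance_df_summarized = abundance_data %>%
  mutate(keep = as.logical(keep)) %>%
  group_by(group_id, celltype_id) %>%
  summarise(samples_present = sum(keep), .groups = "drop")

# Cell types with no kept sample in at least one group.
celltypes_absent_one_condition = abundance_df_summarized %>%
  filter(samples_present == 0) %>% pull(celltype_id) %>% unique()

# Cell types kept in at least 2 samples of at least one group.
celltypes_present_one_condition = abundance_df_summarized %>%
  filter(samples_present >= 2) %>% pull(celltype_id) %>% unique()

condition_specific_celltypes = intersect(
  celltypes_absent_one_condition,
  celltypes_present_one_condition
)
condition_specific_celltypes
```

In this toy table, `B_cell` is kept in both samples of group A and in no sample of group B, so it is the only cell type flagged as condition-specific. Note that a group/cell type pair with no rows at all would never produce `samples_present == 0`, so the simplification relies on the input table being complete.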

diff --git a/vignettes/basic_analysis_steps_MISC.Rmd b/vignettes/basic_analysis_steps_MISC.Rmd
@@ -284,24 +284,7 @@ __Important__: Based on the cell type abundance diagnostics, we recommend users
 Running the following block of code can help you determine which cell types are condition-specific and which cell types are absent. 
 
 ```{r}
-sample_group_celltype_df = abundance_info$abundance_data %>% 
-  filter(n > min_cells) %>% 
-  ungroup() %>% 
-  distinct(sample_id, group_id) %>% 
-  cross_join(
-    abundance_info$abundance_data %>% 
-      ungroup() %>% 
-      distinct(celltype_id)
-    ) %>% 
-  arrange(sample_id)
-
-abundance_df = sample_group_celltype_df %>% left_join(
-  abundance_info$abundance_data %>% ungroup()
-  )
-
-abundance_df$n[is.na(abundance_df$n)] = 0
-abundance_df$keep[is.na(abundance_df$keep)] = FALSE
-abundance_df_summarized = abundance_df %>% 
+abundance_df_summarized = abundance_info$abundance_data %>% 
   mutate(keep = as.logical(keep)) %>% 
   group_by(group_id, celltype_id) %>% 
   summarise(samples_present = sum((keep)))
@@ -1000,10 +983,43 @@ Because of this, interactions in this plot may be interesting candidates for fol
 
 __Note__: These networks were generated by only looking at the top50 interactions overall. In practice, we encourage users to explore more hits than the top50, certainly if many cell type pairs are considered in the analysis. 
 
-All the previous were informative for interactions where both the sender and receiver cell types are captured in the data and where ligand and receptor are sufficiently expressed at the RNA level. However, these two conditions are not always fulfilled and some interesting cell-cell communication signals may be missed as a consequence. Can we still have an idea about these potentially missed interactions? Yes, we can.
+## Filter interactions based on specific prioritization criteria
+
+For some use cases, users may want to filter interactions based on specific criteria. For example, if you are only interested in interactions that are strongly expressed in all samples within a condition, you can filter on that criterion, as we will demonstrate now.
+
+The scores for the individual criteria can be inspected in this data frame:
+
+```{r}
+multinichenet_output$prioritization_tables$group_prioritization_tbl
+```
+
+To only consider interactions with sufficiently high ligand-and-receptor expression in all samples of a condition (MIS-C as example, all samples: `fraction_expressing_ligand_receptor` = 1), you can run this line of code to extract all CCI ids that fulfill this:
+
+```{r}
+filtered_ids = multinichenet_output$prioritization_tables$group_prioritization_tbl %>% filter(fraction_expressing_ligand_receptor == 1 & group == "M") %>% pull(id)
+```
+
+Now, continue with only these filtered CCIs among the top100 generally prioritized interactions for the M-group:
+```{r}
+prioritized_tbl_oi_M_100_filtered = get_top_n_lr_pairs(
+  multinichenet_output$prioritization_tables, 
+  100, 
+  groups_oi = "M") %>% filter(id %in% filtered_ids)
+```
+
+```{r, fig.height=10, fig.width=17}
+plot_oi = make_sample_lr_prod_activity_plots_Omnipath(
+  multinichenet_output$prioritization_tables, 
+  prioritized_tbl_oi_M_100_filtered %>% inner_join(lr_network_all)
+)
+plot_oi
+```
+Note that we don't recommend this as a general strategy. In general, the default prioritization framework finds a tradeoff between relevant aspects of CCC, of which sufficient expression is one criterion. However, in some use cases, users may want to emphasize some properties more than others, and for such cases, this downstream filtering may be helpful.
 
 ## Visualize sender-agnostic ligand activities for each receiver-group combination
 
+All the previous figures were informative for interactions where both the sender and receiver cell types are captured in the data and where ligand and receptor are sufficiently expressed at the RNA level. However, these two conditions are not always fulfilled and some interesting cell-cell communication signals may be missed as a consequence. Can we still have an idea about these potentially missed interactions? Yes, we can.
+
 In the next type of plot, we plot all the ligand activities (both scaled and absolute activities) of each receiver-condition combination. This can give us some insights in active signaling pathways across conditions. Note that we can thus show top ligands based on ligand activity - irrespective and agnostic of expression in sender. Benefits of this analysis are the possibility to infer the activity of ligands that are expressed by cell types that are not in your single-cell dataset or that are hard to pick up at the RNA level. 
 
 The following block of code will show how to visualize the activities for the top5 ligands for each receiver cell type - condition combination: