relative -> aggregate for cell-type support

hsmaan · Sep 13, 2023 · 2b5f268
1 parent 0e6a24f
Showing 3 changed files with 4 additions and 4 deletions.
diff --git a/docs/guidelines.Rmd b/docs/guidelines.Rmd
@@ -643,7 +643,7 @@ This case study demonstrates how the guidelines for single-cell integration in i
 
 ## Pulling the curtain
 
-Now let's revisit what we knew about this data before we pretended to forget that we had cell-type labels - in batch 2 we did not have CD4+ T cells and CD14+ Monocytes, but they were present in batch 1. From [the manuscript](https://www.biorxiv.org/content/10.1101/2022.10.06.511156v3), we know that *relative cell-type support* and *minimum cell-type center distance* are the two facts that contribute to quantitation differences post-integration. In fact, this case-study used the same data (minus some tweaks and balancing) as the main pbmc perturbation experiments in the paper - the major difference being that batch 2 had two cell-types ablated compared to batch 1, instead of just 1 cell-type being downsampled or ablated. In the paper experiments, loss of biological heterogeneity was observed subsequently through the KNN-classification experiments for the downsampled/ablated cell-types and their related cell-types. As such, it should be no surprise that integrating with Seurat/Harmony leads to worse biological heterogeneity conservation scores for the cells from batch 2. We demonstrated how to use the guidelines to alleviate these effects even when we don't have cell-type labels, but it must be made clear that these are heuristics and must be utilized carefully. One factor that is not included is whether or not the researchers have a prior on whether or not there is going to be an imbalance present - such as in the case of developmental or tumor data. In this case, it may be prudent to actually consider tuning the integration method extensively before the first iteration.
+Now let's revisit what we knew about this data before we pretended to forget that we had cell-type labels - in batch 2 we did not have CD4+ T cells and CD14+ Monocytes, but they were present in batch 1. From [the manuscript](https://www.biorxiv.org/content/10.1101/2022.10.06.511156v3), we know that *aggregate cell-type support* and *minimum cell-type center distance* are the two facts that contribute to quantitation differences post-integration. In fact, this case-study used the same data (minus some tweaks and balancing) as the main pbmc perturbation experiments in the paper - the major difference being that batch 2 had two cell-types ablated compared to batch 1, instead of just 1 cell-type being downsampled or ablated. In the paper experiments, loss of biological heterogeneity was observed subsequently through the KNN-classification experiments for the downsampled/ablated cell-types and their related cell-types. As such, it should be no surprise that integrating with Seurat/Harmony leads to worse biological heterogeneity conservation scores for the cells from batch 2. We demonstrated how to use the guidelines to alleviate these effects even when we don't have cell-type labels, but it must be made clear that these are heuristics and must be utilized carefully. One factor that is not included is whether or not the researchers have a prior on whether or not there is going to be an imbalance present - such as in the case of developmental or tumor data. In this case, it may be prudent to actually consider tuning the integration method extensively before the first iteration.
 
 ## Counterfactual - what if we didn't use the guidelines?
 

diff --git a/workflow/analysis/R/10B_Iniq_Real_Datasets_Fig_4_Stat_Tests.R b/workflow/analysis/R/10B_Iniq_Real_Datasets_Fig_4_Stat_Tests.R
@@ -298,7 +298,7 @@ fwrite(
   col.names = TRUE
 )
 
-### Test 2 - testing whether or not the proposed metrics - relative celltype 
+### Test 2 - testing whether or not the proposed metrics - aggregate celltype 
 ### support and minimum celltype center distance are significant in an ANOVA
 ### test for their predictive value for F1-scores 
 
@@ -469,7 +469,7 @@ support_anova <- function(dataset, knn_class_df) {
   return(anova_result_dt)
 }
 
-# Get the ANOVA results across all methods for relative celltype support
+# Get the ANOVA results across all methods for aggregate celltype support
 support_anova_results <- mapply(
   support_anova,
   dataset = datasets,

diff --git a/workflow/analysis/R/10_Iniq_Real_Datasets_Fig_4_Analysis_Plots.R b/workflow/analysis/R/10_Iniq_Real_Datasets_Fig_4_Analysis_Plots.R
@@ -1447,7 +1447,7 @@ knn_support_plot <- function(dataset, knn_class_df, plot_height, plot_width) {
     geom_jitter(aes(color = Log_support)) +
     geom_boxplot(aes(fill = Log_support)) +
     labs(
-      fill = "Relative \ncell-type \nsupport",
+      fill = "Aggregate \ncell-type \nsupport",
       x = "Cell-type",
       y = "F1-classification score post-integration"
     ) +