relative -> aggregate for cell-type support
hsmaan committed Sep 13, 2023
1 parent 0e6a24f commit 2b5f268
Showing 3 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion docs/guidelines.Rmd
@@ -643,7 +643,7 @@ This case study demonstrates how the guidelines for single-cell integration in i

## Pulling the curtain

-Now let's revisit what we knew about this data before we pretended to forget that we had cell-type labels - in batch 2 we did not have CD4+ T cells and CD14+ Monocytes, but they were present in batch 1. From [the manuscript](https://www.biorxiv.org/content/10.1101/2022.10.06.511156v3), we know that *relative cell-type support* and *minimum cell-type center distance* are the two facts that contribute to quantitation differences post-integration. In fact, this case-study used the same data (minus some tweaks and balancing) as the main pbmc perturbation experiments in the paper - the major difference being that batch 2 had two cell-types ablated compared to batch 1, instead of just 1 cell-type being downsampled or ablated. In the paper experiments, loss of biological heterogeneity was observed subsequently through the KNN-classification experiments for the downsampled/ablated cell-types and their related cell-types. As such, it should be no surprise that integrating with Seurat/Harmony leads to worse biological heterogeneity conservation scores for the cells from batch 2. We demonstrated how to use the guidelines to alleviate these effects even when we don't have cell-type labels, but it must be made clear that these are heuristics and must be utilized carefully. One factor that is not included is whether or not the researchers have a prior on whether or not there is going to be an imbalance present - such as in the case of developmental or tumor data. In this case, it may be prudent to actually consider tuning the integration method extensively before the first iteration.
+Now let's revisit what we knew about this data before we pretended to forget that we had cell-type labels - in batch 2 we did not have CD4+ T cells and CD14+ Monocytes, but they were present in batch 1. From [the manuscript](https://www.biorxiv.org/content/10.1101/2022.10.06.511156v3), we know that *aggregate cell-type support* and *minimum cell-type center distance* are the two facts that contribute to quantitation differences post-integration. In fact, this case-study used the same data (minus some tweaks and balancing) as the main pbmc perturbation experiments in the paper - the major difference being that batch 2 had two cell-types ablated compared to batch 1, instead of just 1 cell-type being downsampled or ablated. In the paper experiments, loss of biological heterogeneity was observed subsequently through the KNN-classification experiments for the downsampled/ablated cell-types and their related cell-types. As such, it should be no surprise that integrating with Seurat/Harmony leads to worse biological heterogeneity conservation scores for the cells from batch 2. We demonstrated how to use the guidelines to alleviate these effects even when we don't have cell-type labels, but it must be made clear that these are heuristics and must be utilized carefully. One factor that is not included is whether or not the researchers have a prior on whether or not there is going to be an imbalance present - such as in the case of developmental or tumor data. In this case, it may be prudent to actually consider tuning the integration method extensively before the first iteration.

## Counterfactual - what if we didn't use the guidelines?

4 changes: 2 additions & 2 deletions workflow/analysis/R/10B_Iniq_Real_Datasets_Fig_4_Stat_Tests.R
@@ -298,7 +298,7 @@ fwrite(
col.names = TRUE
)

-### Test 2 - testing whether or not the proposed metrics - relative celltype
+### Test 2 - testing whether or not the proposed metrics - aggregate celltype
### support and minimum celltype center distance are significant in an ANOVA
### test for their predictive value for F1-scores

@@ -469,7 +469,7 @@ support_anova <- function(dataset, knn_class_df) {
return(anova_result_dt)
}

-# Get the ANOVA results across all methods for relative celltype support
+# Get the ANOVA results across all methods for aggregate celltype support
support_anova_results <- mapply(
support_anova,
dataset = datasets,
@@ -1447,7 +1447,7 @@ knn_support_plot <- function(dataset, knn_class_df, plot_height, plot_width) {
geom_jitter(aes(color = Log_support)) +
geom_boxplot(aes(fill = Log_support)) +
labs(
-fill = "Relative \ncell-type \nsupport",
+fill = "Aggregate \ncell-type \nsupport",
x = "Cell-type",
y = "F1-classification score post-integration"
) +