Different DA results with the same K and P values #361

geena-wai · 2025-02-27T10:29:46Z

Hello,

I'm trying to do DA on a subset of cells from my data.
My original data set contained almost 50K cells and the subsetted object contains 1617 cells.
I have 3 conditions in my object, control, one injury, two injury with 3 rats contributing to each condition respectively so 9 samples in total.
I originally processed my data in Seurat, so the input for the analysis was converted from a seurat object, then to singlecellexperimentobject, then to milo object.
I have 9 samples in total so the k and prop parameters so the optimal NhoodSize histogram would be around 45 (or between 50 - 100 according to tutorials). After optimization, I decided that these are the best parameters for DA analysis:
k = 15, prop = 0.1, d = 30

One of my concerns is that when I build my milo object with these parameters and do DA analysis on it, different runs yield different results even though the k, prop and d are not changed. So for example, when I build the milo object and do DA in run 1 with k = 15, prop = 0.1, d = 30, I get 2 neighborhoods that have a significant SpatialFDR value smaller than 0.1. And then in run 2, I build the milo object again and do DA with the same parameters, k = 15, prop = 0.1, d = 30, I get 8 neighborhoods with a significant SpatialFDR value smaller than 0.1. And then in run 3, I do the exact same thing, but this time, I have no significant DA neighborhoods.

I have no idea which DA result to trust. If I choose the run with the most DA results, it feels like cherry-picking. But I also feel like I might miss out on something if I choose the run with no DA results. So what should I do? Should there be any variation in DA results from different runs?

I'm using miloR 2.2.0 so here is my code. I'm using R but in a linux shell on a high performance cluster if that's relevant.

-------------------------------------------------------------------------------------------------------------------------------------------

library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)
library(scuttle)
library(glue)

construct KNN graph

milo <- buildGraph(milo, k=15, d=30, reduced.dim = "PCA")

For sampling purposes

# d in makeNhoods should be the same as d in buildGraph

milo <- makeNhoods(milo, prop=0.1, k=15, d=30, refined = TRUE, reduced_dims = "PCA")

Count the cells in each neighborhood

# sample should be the Rat number here -- Sham, Sham4, Sham6... etc

milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample="sample")

Building neighbourhood graph

milo <- buildNhoodGraph(milo)

Setting up contrasts for differential abundance testing

milo_design <- data.frame(colData(milo))[,c("sample", "condition")]
milo_design <- distinct(milo_design)
rownames(milo_design) <- milo_design$sample

Test for differential abundance

da_results <- testNhoods(milo, design = ~ 0 + condition, design.df = milo_design, model.contrasts = "conditionmTBI2x - conditionSham")
da_results %>% arrange(SpatialFDR) %>% head()

-------------------------------------------------------------------------------------------------------------------------------------------

This is the sessionInfo()

sessionInfo()
Failed to query server: Operation not permitted
R version 4.4.2 (2024-10-31)
Platform: x86_64-conda-linux-gnu
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /data1/gwai/miniconda3/envs/Milo_ENV/lib/libopenblasp-r0.3.28.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Hong_Kong
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] glue_1.8.0 patchwork_1.3.0
[3] dplyr_1.1.4 scran_1.34.0
[5] scater_1.34.0 ggplot2_3.5.1
[7] scuttle_1.16.0 SingleCellExperiment_1.28.0
[9] SummarizedExperiment_1.36.0 Biobase_2.66.0
[11] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
[13] IRanges_2.40.0 S4Vectors_0.44.0
[15] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
[17] matrixStats_1.5.0 miloR_2.2.0
[19] edgeR_4.4.0 limma_3.62.1

loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2 vipor_0.4.7
[4] farver_2.1.2 viridis_0.6.5 ggraph_2.2.1
[7] fastmap_1.2.0 pracma_2.4.4 bluster_1.16.0
[10] tweenr_2.0.3 rsvd_1.0.5 lifecycle_1.0.4
[13] cluster_2.1.8 statmod_1.5.0 magrittr_2.0.3
[16] compiler_4.4.2 rlang_1.1.5 tools_4.4.2
[19] igraph_2.0.3 labeling_0.4.3 dqrng_0.3.2
[22] S4Arrays_1.6.0 graphlayouts_1.2.2 DelayedArray_0.32.0
[25] RColorBrewer_1.1-3 abind_1.4-5 BiocParallel_1.40.0
[28] withr_3.0.2 purrr_1.0.2 numDeriv_2016.8-1.1
[31] grid_4.4.2 polyclip_1.10-7 beachmat_2.22.0
[34] colorspace_2.1-1 scales_1.3.0 gtools_3.9.5
[37] MASS_7.3-64 cli_3.6.3 crayon_1.5.3
[40] generics_0.1.3 metapod_1.14.0 httr_1.4.7
[43] ggbeeswarm_0.7.2 cachem_1.1.0 ggforce_0.4.2
[46] stringr_1.5.1 splines_4.4.2 zlibbioc_1.52.0
[49] parallel_4.4.2 XVector_0.46.0 vctrs_0.6.5
[52] Matrix_1.7-2 jsonlite_1.8.9 BiocSingular_1.22.0
[55] BiocNeighbors_2.0.0 ggrepel_0.9.6 irlba_2.3.5.1
[58] beeswarm_0.4.0 locfit_1.5-9.11 tidyr_1.3.1
[61] codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4
[64] gtable_0.3.6 UCSC.utils_1.2.0 ScaledMatrix_1.14.0
[67] munsell_0.5.1 tibble_3.2.1 pillar_1.10.1
[70] GenomeInfoDbData_1.2.13 R6_2.5.1 tidygraph_1.3.0
[73] lattice_0.22-6 memoise_2.0.1 Rcpp_1.0.14
[76] gridExtra_2.3 SparseArray_1.6.0 pkgconfig_2.0.3
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1

The text was updated successfully, but these errors were encountered:

MikeDMorgan · 2025-02-27T10:39:22Z

Just to be clear, are you trying to run DA testing with just 1617 cells?

geena-wai · 2025-02-27T12:17:10Z

@MikeDMorgan yes just on the 1617 cells, they are the same cell type

MikeDMorgan · 2025-02-27T12:50:04Z

I would be very surprised that with just 1617 cells on the same cell type you would find much, especially with 3 replicates for each of 3 conditions.

The reason for the variability is that the nhood refinement and graph building both have a stochastic element, so you need to use set.seed to get consistent results.

geena-wai · 2025-02-27T13:24:49Z

I see. Thanks for your explanation!

My snRNA data is on the brain. I have general groups of cells like excitatory neurons, inhibitory neurons, glia and other vascular cell types. They are also further annotated with cluster-specific markers.

To run milo, would you recommend generating subsets of the data set, i.e. subsetting excitatory neurons, and then doing DA on it? Or would you keep the object as a whole and do DA on it?

MikeDMorgan · 2025-02-27T14:01:41Z

In the first instance I'd recommend doing on the whole data. If you think there are then some interesting DA subtypes of cells then you can do a deep dive/sub-analysis on those, noting the caveat of smaller cell numbers so the values of k, d and prop will likely need to be adapted accordingly. I would also strongly recommend re-computing the reduced dimensional space on each subtype separately rather than using the global reduced dims as you appear to have done here.

geena-wai · 2025-02-27T14:32:54Z

Okay I'll try doing it on the whole object first. I did recluster my subsetted cell type and extracted the reduced dims from there before running milo but I guess the small cell number really is a limiting factor here.

Thank you very much for your insight! I really appreciate the feedback

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Different DA results with the same K and P values #361

Different DA results with the same K and P values #361

geena-wai commented Feb 27, 2025 •

edited

Loading

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

Different DA results with the same K and P values #361

Different DA results with the same K and P values #361

Comments

geena-wai commented Feb 27, 2025 • edited Loading

-------------------------------------------------------------------------------------------------------------------------------------------

construct KNN graph

For sampling purposes

Count the cells in each neighborhood

Building neighbourhood graph

Setting up contrasts for differential abundance testing

Test for differential abundance

-------------------------------------------------------------------------------------------------------------------------------------------

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

MikeDMorgan commented Feb 27, 2025

geena-wai commented Feb 27, 2025

geena-wai commented Feb 27, 2025 •

edited

Loading