Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different DA results with the same K and P values #361

Open
geena-wai opened this issue Feb 27, 2025 · 6 comments
Open

Different DA results with the same K and P values #361

geena-wai opened this issue Feb 27, 2025 · 6 comments

Comments

@geena-wai
Copy link

geena-wai commented Feb 27, 2025

Hello,

I'm trying to do DA on a subset of cells from my data.
My original data set contained almost 50K cells and the subsetted object contains 1617 cells.
I have 3 conditions in my object, control, one injury, two injury with 3 rats contributing to each condition respectively so 9 samples in total.
I originally processed my data in Seurat, so the input for the analysis was converted from a seurat object, then to singlecellexperimentobject, then to milo object.
I have 9 samples in total so the k and prop parameters so the optimal NhoodSize histogram would be around 45 (or between 50 - 100 according to tutorials). After optimization, I decided that these are the best parameters for DA analysis:
k = 15, prop = 0.1, d = 30

One of my concerns is that when I build my milo object with these parameters and do DA analysis on it, different runs yield different results even though the k, prop and d are not changed. So for example, when I build the milo object and do DA in run 1 with k = 15, prop = 0.1, d = 30, I get 2 neighborhoods that have a significant SpatialFDR value smaller than 0.1. And then in run 2, I build the milo object again and do DA with the same parameters, k = 15, prop = 0.1, d = 30, I get 8 neighborhoods with a significant SpatialFDR value smaller than 0.1. And then in run 3, I do the exact same thing, but this time, I have no significant DA neighborhoods.

I have no idea which DA result to trust. If I choose the run with the most DA results, it feels like cherry-picking. But I also feel like I might miss out on something if I choose the run with no DA results. So what should I do? Should there be any variation in DA results from different runs?

I'm using miloR 2.2.0 so here is my code. I'm using R but in a linux shell on a high performance cluster if that's relevant.

-------------------------------------------------------------------------------------------------------------------------------------------

library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)
library(scuttle)
library(glue)

construct KNN graph

milo <- buildGraph(milo, k=15, d=30, reduced.dim = "PCA")

For sampling purposes

# d in makeNhoods should be the same as d in buildGraph

milo <- makeNhoods(milo, prop=0.1, k=15, d=30, refined = TRUE, reduced_dims = "PCA")

Count the cells in each neighborhood

# sample should be the Rat number here -- Sham, Sham4, Sham6... etc

milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample="sample")

Building neighbourhood graph

milo <- buildNhoodGraph(milo)

Setting up contrasts for differential abundance testing

milo_design <- data.frame(colData(milo))[,c("sample", "condition")]
milo_design <- distinct(milo_design)
rownames(milo_design) <- milo_design$sample

Test for differential abundance

da_results <- testNhoods(milo, design = ~ 0 + condition, design.df = milo_design, model.contrasts = "conditionmTBI2x - conditionSham")
da_results %>% arrange(SpatialFDR) %>% head()

-------------------------------------------------------------------------------------------------------------------------------------------

This is the sessionInfo()

sessionInfo()
Failed to query server: Operation not permitted
R version 4.4.2 (2024-10-31)
Platform: x86_64-conda-linux-gnu
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /data1/gwai/miniconda3/envs/Milo_ENV/lib/libopenblasp-r0.3.28.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Asia/Hong_Kong
tzcode source: system (glibc)

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] glue_1.8.0 patchwork_1.3.0
[3] dplyr_1.1.4 scran_1.34.0
[5] scater_1.34.0 ggplot2_3.5.1
[7] scuttle_1.16.0 SingleCellExperiment_1.28.0
[9] SummarizedExperiment_1.36.0 Biobase_2.66.0
[11] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
[13] IRanges_2.40.0 S4Vectors_0.44.0
[15] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
[17] matrixStats_1.5.0 miloR_2.2.0
[19] edgeR_4.4.0 limma_3.62.1

loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2 vipor_0.4.7
[4] farver_2.1.2 viridis_0.6.5 ggraph_2.2.1
[7] fastmap_1.2.0 pracma_2.4.4 bluster_1.16.0
[10] tweenr_2.0.3 rsvd_1.0.5 lifecycle_1.0.4
[13] cluster_2.1.8 statmod_1.5.0 magrittr_2.0.3
[16] compiler_4.4.2 rlang_1.1.5 tools_4.4.2
[19] igraph_2.0.3 labeling_0.4.3 dqrng_0.3.2
[22] S4Arrays_1.6.0 graphlayouts_1.2.2 DelayedArray_0.32.0
[25] RColorBrewer_1.1-3 abind_1.4-5 BiocParallel_1.40.0
[28] withr_3.0.2 purrr_1.0.2 numDeriv_2016.8-1.1
[31] grid_4.4.2 polyclip_1.10-7 beachmat_2.22.0
[34] colorspace_2.1-1 scales_1.3.0 gtools_3.9.5
[37] MASS_7.3-64 cli_3.6.3 crayon_1.5.3
[40] generics_0.1.3 metapod_1.14.0 httr_1.4.7
[43] ggbeeswarm_0.7.2 cachem_1.1.0 ggforce_0.4.2
[46] stringr_1.5.1 splines_4.4.2 zlibbioc_1.52.0
[49] parallel_4.4.2 XVector_0.46.0 vctrs_0.6.5
[52] Matrix_1.7-2 jsonlite_1.8.9 BiocSingular_1.22.0
[55] BiocNeighbors_2.0.0 ggrepel_0.9.6 irlba_2.3.5.1
[58] beeswarm_0.4.0 locfit_1.5-9.11 tidyr_1.3.1
[61] codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4
[64] gtable_0.3.6 UCSC.utils_1.2.0 ScaledMatrix_1.14.0
[67] munsell_0.5.1 tibble_3.2.1 pillar_1.10.1
[70] GenomeInfoDbData_1.2.13 R6_2.5.1 tidygraph_1.3.0
[73] lattice_0.22-6 memoise_2.0.1 Rcpp_1.0.14
[76] gridExtra_2.3 SparseArray_1.6.0 pkgconfig_2.0.3
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1

@MikeDMorgan
Copy link
Member

Just to be clear, are you trying to run DA testing with just 1617 cells?

@geena-wai
Copy link
Author

@MikeDMorgan yes just on the 1617 cells, they are the same cell type

@MikeDMorgan
Copy link
Member

I would be very surprised that with just 1617 cells on the same cell type you would find much, especially with 3 replicates for each of 3 conditions.

The reason for the variability is that the nhood refinement and graph building both have a stochastic element, so you need to use set.seed to get consistent results.

@geena-wai
Copy link
Author

I see. Thanks for your explanation!

My snRNA data is on the brain. I have general groups of cells like excitatory neurons, inhibitory neurons, glia and other vascular cell types. They are also further annotated with cluster-specific markers.

To run milo, would you recommend generating subsets of the data set, i.e. subsetting excitatory neurons, and then doing DA on it? Or would you keep the object as a whole and do DA on it?

@MikeDMorgan
Copy link
Member

In the first instance I'd recommend doing on the whole data. If you think there are then some interesting DA subtypes of cells then you can do a deep dive/sub-analysis on those, noting the caveat of smaller cell numbers so the values of k, d and prop will likely need to be adapted accordingly. I would also strongly recommend re-computing the reduced dimensional space on each subtype separately rather than using the global reduced dims as you appear to have done here.

@geena-wai
Copy link
Author

Okay I'll try doing it on the whole object first. I did recluster my subsetted cell type and extracted the reduced dims from there before running milo but I guess the small cell number really is a limiting factor here.

Thank you very much for your insight! I really appreciate the feedback

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants