-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Different DA results with the same K and P values #361
Comments
Just to be clear, are you trying to run DA testing with just 1617 cells? |
@MikeDMorgan yes just on the 1617 cells, they are the same cell type |
I would be very surprised that with just 1617 cells on the same cell type you would find much, especially with 3 replicates for each of 3 conditions. The reason for the variability is that the nhood refinement and graph building both have a stochastic element, so you need to use set.seed to get consistent results. |
I see. Thanks for your explanation! My snRNA data is on the brain. I have general groups of cells like excitatory neurons, inhibitory neurons, glia and other vascular cell types. They are also further annotated with cluster-specific markers. To run milo, would you recommend generating subsets of the data set, i.e. subsetting excitatory neurons, and then doing DA on it? Or would you keep the object as a whole and do DA on it? |
In the first instance I'd recommend doing on the whole data. If you think there are then some interesting DA subtypes of cells then you can do a deep dive/sub-analysis on those, noting the caveat of smaller cell numbers so the values of k, d and prop will likely need to be adapted accordingly. I would also strongly recommend re-computing the reduced dimensional space on each subtype separately rather than using the global reduced dims as you appear to have done here. |
Okay I'll try doing it on the whole object first. I did recluster my subsetted cell type and extracted the reduced dims from there before running milo but I guess the small cell number really is a limiting factor here. Thank you very much for your insight! I really appreciate the feedback |
Hello,
I'm trying to do DA on a subset of cells from my data.
My original data set contained almost 50K cells and the subsetted object contains 1617 cells.
I have 3 conditions in my object, control, one injury, two injury with 3 rats contributing to each condition respectively so 9 samples in total.
I originally processed my data in Seurat, so the input for the analysis was converted from a seurat object, then to singlecellexperimentobject, then to milo object.
I have 9 samples in total so the k and prop parameters so the optimal NhoodSize histogram would be around 45 (or between 50 - 100 according to tutorials). After optimization, I decided that these are the best parameters for DA analysis:
k = 15, prop = 0.1, d = 30
One of my concerns is that when I build my milo object with these parameters and do DA analysis on it, different runs yield different results even though the k, prop and d are not changed. So for example, when I build the milo object and do DA in run 1 with k = 15, prop = 0.1, d = 30, I get 2 neighborhoods that have a significant SpatialFDR value smaller than 0.1. And then in run 2, I build the milo object again and do DA with the same parameters, k = 15, prop = 0.1, d = 30, I get 8 neighborhoods with a significant SpatialFDR value smaller than 0.1. And then in run 3, I do the exact same thing, but this time, I have no significant DA neighborhoods.
I have no idea which DA result to trust. If I choose the run with the most DA results, it feels like cherry-picking. But I also feel like I might miss out on something if I choose the run with no DA results. So what should I do? Should there be any variation in DA results from different runs?
I'm using miloR 2.2.0 so here is my code. I'm using R but in a linux shell on a high performance cluster if that's relevant.
-------------------------------------------------------------------------------------------------------------------------------------------
library(miloR)
library(SingleCellExperiment)
library(scater)
library(scran)
library(dplyr)
library(patchwork)
library(scuttle)
library(glue)
construct KNN graph
milo <- buildGraph(milo, k=15, d=30, reduced.dim = "PCA")
For sampling purposes
milo <- makeNhoods(milo, prop=0.1, k=15, d=30, refined = TRUE, reduced_dims = "PCA")
Count the cells in each neighborhood
milo <- countCells(milo, meta.data = data.frame(colData(milo)), sample="sample")
Building neighbourhood graph
milo <- buildNhoodGraph(milo)
Setting up contrasts for differential abundance testing
milo_design <- data.frame(colData(milo))[,c("sample", "condition")]
milo_design <- distinct(milo_design)
rownames(milo_design) <- milo_design$sample
Test for differential abundance
da_results <- testNhoods(milo, design = ~ 0 + condition, design.df = milo_design, model.contrasts = "conditionmTBI2x - conditionSham")
da_results %>% arrange(SpatialFDR) %>% head()
-------------------------------------------------------------------------------------------------------------------------------------------
This is the sessionInfo()
Matrix products: default
BLAS/LAPACK: /data1/gwai/miniconda3/envs/Milo_ENV/lib/libopenblasp-r0.3.28.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Asia/Hong_Kong
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] glue_1.8.0 patchwork_1.3.0
[3] dplyr_1.1.4 scran_1.34.0
[5] scater_1.34.0 ggplot2_3.5.1
[7] scuttle_1.16.0 SingleCellExperiment_1.28.0
[9] SummarizedExperiment_1.36.0 Biobase_2.66.0
[11] GenomicRanges_1.58.0 GenomeInfoDb_1.42.0
[13] IRanges_2.40.0 S4Vectors_0.44.0
[15] BiocGenerics_0.52.0 MatrixGenerics_1.18.0
[17] matrixStats_1.5.0 miloR_2.2.0
[19] edgeR_4.4.0 limma_3.62.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2 vipor_0.4.7
[4] farver_2.1.2 viridis_0.6.5 ggraph_2.2.1
[7] fastmap_1.2.0 pracma_2.4.4 bluster_1.16.0
[10] tweenr_2.0.3 rsvd_1.0.5 lifecycle_1.0.4
[13] cluster_2.1.8 statmod_1.5.0 magrittr_2.0.3
[16] compiler_4.4.2 rlang_1.1.5 tools_4.4.2
[19] igraph_2.0.3 labeling_0.4.3 dqrng_0.3.2
[22] S4Arrays_1.6.0 graphlayouts_1.2.2 DelayedArray_0.32.0
[25] RColorBrewer_1.1-3 abind_1.4-5 BiocParallel_1.40.0
[28] withr_3.0.2 purrr_1.0.2 numDeriv_2016.8-1.1
[31] grid_4.4.2 polyclip_1.10-7 beachmat_2.22.0
[34] colorspace_2.1-1 scales_1.3.0 gtools_3.9.5
[37] MASS_7.3-64 cli_3.6.3 crayon_1.5.3
[40] generics_0.1.3 metapod_1.14.0 httr_1.4.7
[43] ggbeeswarm_0.7.2 cachem_1.1.0 ggforce_0.4.2
[46] stringr_1.5.1 splines_4.4.2 zlibbioc_1.52.0
[49] parallel_4.4.2 XVector_0.46.0 vctrs_0.6.5
[52] Matrix_1.7-2 jsonlite_1.8.9 BiocSingular_1.22.0
[55] BiocNeighbors_2.0.0 ggrepel_0.9.6 irlba_2.3.5.1
[58] beeswarm_0.4.0 locfit_1.5-9.11 tidyr_1.3.1
[61] codetools_0.2-20 cowplot_1.1.3 stringi_1.8.4
[64] gtable_0.3.6 UCSC.utils_1.2.0 ScaledMatrix_1.14.0
[67] munsell_0.5.1 tibble_3.2.1 pillar_1.10.1
[70] GenomeInfoDbData_1.2.13 R6_2.5.1 tidygraph_1.3.0
[73] lattice_0.22-6 memoise_2.0.1 Rcpp_1.0.14
[76] gridExtra_2.3 SparseArray_1.6.0 pkgconfig_2.0.3
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1
The text was updated successfully, but these errors were encountered: