
Concern regarding method use for Xenium data #203

Open
LPotter21 opened this issue Dec 11, 2024 · 2 comments

Comments

@LPotter21

Hello all,

We have recently begun working with 10x Xenium data and have been comparing normalization methods for our pipeline. We have noticed oddities in how SCTransform behaves on these data compared with traditional scRNA-seq data. These observations make us doubt whether SCTransform is appropriate for Xenium data, so we wanted to reach out for your opinion.

SCTransform adds a few columns to the Seurat object metadata, including nCount_SCT. As we understand it, nCount_SCT represents the total "normalized counts" for each cell, and contrasts nicely with the raw counts (nCount_RNA for scRNA-seq and nCount_Xenium for 10x Xenium).

Plotting the raw counts (nCount_RNA) vs the nCount_SCT allows for a high-level comparison of how the model transformed the counts across cells.

  • Using scRNA-seq data from your vignette (and replicated with our own scRNA-seq experiments) yields a pattern similar to this:

[Image: scRNA-seq vignette data, nCount_RNA vs nCount_SCT (scRNA_SCTvsRawCounts)]

  • However, using Xenium data, also from your vignette, we see a stratified set of distributions:

[Image: Xenium vignette data, raw counts vs nCount_SCT (screenshot, 2024-12-10)]

This issue is even more pronounced in our own data, with some samples showing more distinct separation in nCount_SCT.
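The raw-versus-SCT comparison described above comes down to two per-cell totals. The following is a minimal sketch in Python with NumPy, assuming the raw and corrected matrices have been exported as dense genes x cells arrays; the function names are hypothetical and not part of Seurat or sctransform. In Seurat these column sums correspond to nCount_RNA / nCount_Xenium (raw) and nCount_SCT (corrected).

```python
import numpy as np

def per_cell_totals(counts) -> np.ndarray:
    """Sum a genes x cells matrix over genes, giving one total per cell."""
    return np.asarray(counts, dtype=float).sum(axis=0)

def average_ranks(x) -> np.ndarray:
    """Rank values 1..n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Group identical values and replace each rank by its group's mean rank
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=ranks)
    sizes = np.bincount(inverse)
    return (sums / sizes)[inverse]

def spearman(x, y) -> float:
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def compare_totals(raw_counts, corrected_counts):
    """Return per-cell raw totals, corrected totals, and their rank agreement."""
    raw_total = per_cell_totals(raw_counts)              # like nCount_RNA / nCount_Xenium
    corrected_total = per_cell_totals(corrected_counts)  # like nCount_SCT
    return raw_total, corrected_total, spearman(raw_total, corrected_total)
```

A rank correlation close to 1 only says that the ordering of cells is preserved; discrete strata like those in the Xenium scatter plot can coexist with a high correlation, so the totals still need to be plotted against each other.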

The concern is even clearer in spatial plots.

[Image: spatial plot of the Xenium vignette data after SCTransform (screenshot, 2024-12-10)]

There is a grid-like pattern in the physical image data after SCTransform, seemingly associated with the different "strata" in the SCT counts seen above. We see similar, and stronger, patterns in our own data following the same methodology.

Given this patterning, it clearly cannot represent biological variation, so we hope you can provide some insight into whether this is expected and, if so, why.

Lastly, when looking at the counts for specific genes, we saw that genes with 0 counts were given non-0 values after SCTransform as well. While this makes sense conceptually for scRNA-seq, we are unsure whether such abundance estimates are appropriate for Xenium, an imaging- and in situ hybridization-based technology.

Please let us know your thoughts on this as well. Thank you!

@saketkc
Contributor

saketkc commented Dec 22, 2024

Hi @LPotter21, thanks for the question. nCount_SCT is the sum total of corrected counts after SCT normalization.
When we calculate the corrected counts, we 'reverse' the regression model. In short, using a) the Pearson residuals and b) the regularized model estimates, we invert the regression model to estimate the per-cell, per-gene counts. To do this, you also need to tell the model what the sequencing depth of each cell is. Since the goal is to obtain values in which sequencing depth has been accounted for, we use the median UMI count as a reasonable estimate of depth (i.e. the corrected counts are calculated per gene per cell assuming all cells have been sequenced to the median depth, with no constraint on the final corrected sequencing depth). This is a reasonable choice, as we observed a higher TPR (controlling for the same FDR) in downstream DE analysis, which we also show in the v2 paper.
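The reversal described above can be sketched in a few lines. This is a minimal illustration of the general idea, not the sctransform implementation: it assumes a negative binomial model in which log(mu) = b0 + b1 * log10(total UMI) with variance mu + mu^2 / theta, and treats `b0`, `b1` and `theta` as already-regularized per-gene estimates. All names are hypothetical, and details of the real package (v2 model specifics, residual clipping, sparse matrices) are left out.

```python
import numpy as np

def nb_mean(b0, b1, log10_umi):
    """Model mean for genes (rows) x cells (columns): log(mu) = b0 + b1 * log10(UMI)."""
    return np.exp(b0[:, None] + b1[:, None] * log10_umi[None, :])

def nb_sd(mu, theta):
    """Negative binomial standard deviation: sqrt(mu + mu^2 / theta)."""
    return np.sqrt(mu + mu**2 / theta[:, None])

def corrected_counts(counts, b0, b1, theta):
    """Invert Pearson residuals at median depth (hypothetical sketch)."""
    counts = np.asarray(counts, dtype=float)
    umi = counts.sum(axis=0)  # observed per-cell depth
    mu = nb_mean(b0, b1, np.log10(umi))
    # Pearson residuals under the fitted model (no clipping in this sketch)
    residuals = (counts - mu) / nb_sd(mu, theta)
    # Re-evaluate the model as if every cell had been sequenced to the median depth
    median_log10 = np.full(counts.shape[1], np.log10(np.median(umi)))
    mu_med = nb_mean(b0, b1, median_log10)
    # Reverse the residual transform, round to integers, clip negatives at 0
    y = np.rint(mu_med + residuals * nb_sd(mu_med, theta))
    return np.clip(y, 0, None)

# Toy example: b1 = ln(10) makes mu proportional to total UMI; large theta ~ Poisson
counts = np.array([[0, 5, 10, 20], [10, 15, 30, 40]])
b0 = np.log(np.array([0.25, 0.75]))
b1 = np.full(2, np.log(10))
theta = np.full(2, 1e6)
corrected = corrected_counts(counts, b0, b1, theta)
n_count_sct = corrected.sum(axis=0)  # per-cell total of corrected counts
```

In this toy example, gene 0 has a raw count of 0 in the shallowest cell (10 UMIs, below the median of 30), yet its corrected count is 3: the negative residual is re-scaled around the larger median-depth mean. Because depth is fixed only through the model mean, `n_count_sct` is not forced to equal the median for every cell.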

The goal after normalization is to compare one gene across cells, not the total counts post normalization. What is your rationale for using nCount_SCT for comparison?

@LPotter21
Author

Hi @saketkc, thank you for your reply. I think that your description of nCount_SCT as “the sum total of corrected counts” is exactly why we chose to include an examination of it in post-normalization QC. Just like we examine the distribution of total raw counts across a sample to ensure no technical artifacts impact the quality of the sample, we also believe it is important to confirm that the data normalization method selected does not create technical artifacts either.

Seeing artificial patterns after SCT normalization in the spatial data of multiple Xenium experiments, patterns that do not correspond to any biological structure, is very concerning, especially alongside the evidence from several recent papers (here and here) indicating that normalization methods for sequencing-based transcriptomics rest on different assumptions from those appropriate for image-based transcriptomics.
