Skip to content

2.1 Spatial RNA: Visium

Pingxu0101 edited this page Aug 27, 2024 · 20 revisions

Welcome to the Spatial Transcriptomics Workshop!

In this session, we will focus on Spatial Transcriptomics data generated by the 10X Visium platform.

(The spatial analysis covered in this workshop can also be applied to other types of spatial data and sample types, such as the Curio Brain data.)

Background

10X Visium

10X Visium is a technology developed by 10x Genomics for spatial gene expression profiling. It allows researchers to measure gene expression across tissue sections while preserving the spatial context of the tissue architecture.
The capture area is 6.5 x 6.5 mm. There are a total of 4992 total spots per capture area and each spot is 55 µm in diameter with a 100 µm center to center distance between spots.


For more details about this method, you can refer to Visium Spatial Gene Expression.

DCIS

Ductal Carcinoma In Situ (DCIS), also known as intraductal carcinoma or stage 0 breast cancer, is a non-invasive or pre-invasive breast cancer. This means that while the cells lining the ducts have transformed into cancer cells, they have not yet spread through the duct walls into the surrounding breast tissue. There is a gap in knowledge regarding how different cancer cells are spatially organized and how they interact with the tumor microenvironment (TME).


For more information, you can refer to this DCIS.

Data download

The data for this part (DCIS data from 10X Visium and scRNA data from 10X 3' chemistry) is published by Wei, Runmin, et al.
For more details about this publication, you can refer to Nature Biotechnology (2022).

rds file link

DCIS_SC
DCIS_ST
DCIS_traint
DCIS_celltrek
Processed_DCIS_ST
RCTD_doublet_result
RCTD_full

Example output from Space Ranger

cloupe file
html file

Workshop Goals

  1. Become familiar with the Visium ST data format and the major processing steps, including filtering, dimensionality reduction, clustering, and visualization.
  2. Explore different methods to understand the spatial heterogeneity of cancer cells and the tumor microenvironment (TME) in DCIS.

Main Sections

  1. Quality Control Metrics for ST Data
    • nCount_Spatial
    • nFeature_Spatial
    • log10GenesPerUMI
    • mitoRatio
    • Filtering out low-quality spots
  2. Data Processing and Visualization
    • Spatial coordinates
    • H&E image
    • Spatially Visualize nFeature_Spatial & nCount_Spatial
    • Dimensionality reduction and Clustering
    • Find marker genes
    • Visualization
  3. Deconvolution
    • Data preparation
    • "doublet" mode in RCTD
    • "full" mode in RCTD
  4. Seurat Label Transfer
  5. CellTrek Analysis
    • Single cell projection
    • Spatial proximity analysis

Activate your Conda environment named 'course4.3' on Linux

conda activate course4.3
which R

Open a new terminal window and open RStudio on Linux

conda deactivate
export RSTUDIO_WHICH_R= your_course4.3_R_directory
open -na rstudio

R code

# Set the library path
# The required packages have already been installed. 
.libPaths("~/VisiumLibrary")
# Specify the working directory
your_working_dir <- "/Users/admin/Desktop/CSHL_2024/ST_VISIUM/your_working_folder/"
# Set Working Directory
setwd(your_working_dir)
# Load necessary packages 
suppressPackageStartupMessages({
    library(hdf5r)
    library(Seurat)
    library(dplyr)
    library(RColorBrewer)
    library(CellTrek)
    library(ggplot2)
    library(spacexr)
    library(paletteer)
})
# Check Seurat version
packageVersion("Seurat")
# Path to the preprocessed single-cell RDS file with cell type annotations from CellTrek publication
DCIS_SC_Path <- "/Users/admin/Desktop/CSHL_2024/ST_VISIUM/DCIS_SC.rds"

# Path to raw ST data generated from 10X Space Ranger output
DCIS_ST_Path <- "/Users/admin/Desktop/CSHL_2024/ST_VISIUM/DCIS_ST.rds"

# Path to CellTrek output (intermediate file saved for visualization, as processing takes a long time)
DCIS_traint_Path <- "/Users/admin/Desktop/CSHL_2024/ST_VISIUM/DCIS_traint.rds"

# Path to another CellTrek output (intermediate file saved for visualization, as processing takes a long time)
DCIS_celltrek_Path <- "/Users/admin/Desktop/CSHL_2024/ST_VISIUM/DCIS_celltrek.rds"

# Path to the processed ST file
Processed_DCIS_ST_Path <- '/Users/admin/Desktop/CSHL_2024/ST_VISIUM/Processed_DCIS_ST.rds'

# Path to the RCTD output with "doublet" mode
RCTD_Path <- '/Users/admin/Desktop/CSHL_2024/ST_VISIUM/RCTD.rds'

# Path to the RCTD output with "full" mode
RCTD_full_Path <- '/Users/admin/Desktop/CSHL_2024/ST_VISIUM/RCTD_full.rds'
# Load data
# This is the scRNA-seq data (which has been processed and annotated)
# DCIS_SC_Path and DCIS_ST_Path are obsolete paths for.rds files.
DCIS_SC <- readRDS(file = DCIS_SC_Path) 
# This is the ST data
DCIS_ST <- readRDS(file = DCIS_ST_Path )
# Check the object details
DCIS_SC
Screenshot 2024-06-19 at 3 59 35 PM
# Check the object details
DCIS_ST
Screenshot 2024-06-19 at 4 00 56 PM
# Check the metadata of the scRNA-seq object
head(DCIS_SC@meta.data)
Screenshot 2024-06-19 at 4 03 24 PM
# Check the metadata of the ST object
head(DCIS_ST@meta.data)
Screenshot 2024-06-19 at 4 05 16 PM


Three columns are shown in the metadata of the ST object.

  • orig.ident: this often contains the sample identity.
  • nCount_Spatial: number of UMIs per spot.
  • nFeature_Spatial: number of genes detected per spot (~55um).

UMIs, or Unique Molecular Identifiers, are short sequences of random nucleotides that are used in sequencing experiments to label individual molecules. They help in reducing biases and errors during the amplification process by allowing for the identification and removal of duplicates. This ensures a more accurate quantification of the original molecules present in a sample.

Quality Control Metrics for ST Data


We need to calculate some additional metrics for quality control (QC) checks.

  • number of genes detected per UMI: this metric with give us an idea of the complexity of our dataset. (The more genes detected per UMI, the more complex our data is.)
  • mitochondrial ratio: this metric will give us a percentage of reads originating from the mitochondrial genes.
# Add number of genes per UMI for each spot to metadata
DCIS_ST$log10GenesPerUMI <- log10(DCIS_ST$nFeature_Spatial) / log10(DCIS_ST$nCount_Spatial)
# Compute percent mito ratio
DCIS_ST$mitoRatio <- PercentageFeatureSet(object = DCIS_ST, pattern = "^MT-")
# Since we want the ratio value for plotting, we will reverse that step by then dividing by 100.
DCIS_ST$mitoRatio <- DCIS_ST@meta.data$mitoRatio / 100

nCount_Spatial

# Visualize the number of transcripts per spot (~55um).
DCIS_ST@meta.data %>% 
  ggplot(aes( x=nCount_Spatial, fill= 'orig.ident')) + 
  geom_density(alpha = 0.2) + 
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 200)+NoLegend()

Note:
This block of code creates a density plot using ggplot2.
It uses data from the data frame, with nCount_Spatial on the x-axis and orig.ident for fill color.
The plot includes a density layer, a logarithmic scale for the x-axis, classic theme, y-axis label, a vertical line at x = 200, and removes the legend.

# If you want to save the plot to your directory 
pdf('nCount_Spatial_plot.pdf', height = 5, width = 5)
p <- DCIS_ST@meta.data %>% 
  ggplot(aes( x=nCount_Spatial, fill= 'orig.ident')) + 
  geom_density(alpha = 0.2) + 
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 200)+NoLegend()
print(p)
dev.off()

Note:
pdf('nCount_Spatial_plot.pdf', height = 5, width = 5): This line opens a new PDF device for plotting.
The plot will be saved as "nCount_Spatial_plot.pdf" with a height and width of 5 inches each.
The units for height and width are in inches by default in R's graphics devices.
dev.off(): This function closes the current graphics device.
In this case, it closes the PDF device that was opened with pdf().
This finalizes and saves the plot to the specified PDF file.
If dev.off() is not called, the plot will not be saved properly, and the PDF file might be incomplete or corrupted.

nFeature_Spatial

# Visualize the distribution of genes detected per spot (~55um).
DCIS_ST@meta.data  %>% 
  ggplot(aes( x=nFeature_Spatial, fill= 'orig.ident')) + 
  geom_density(alpha = 0.2) + 
  scale_x_log10() + 
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 200)+NoLegend()

log10GenesPerUMI

# Visualize the overall complexity of the gene expression by visualizing the genes detected per UMI
DCIS_ST@meta.data %>%
  ggplot(aes(x=log10GenesPerUMI, fill= 'orig.ident')) +
  geom_density(alpha = 0.2) +
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 0.80)+NoLegend()

mitoRatio

# Visualize the Mitochondrial counts ratio
DCIS_ST@meta.data %>%
  ggplot(aes(x=mitoRatio, fill= 'orig.ident')) +
  geom_density(alpha = 0.2) +
  theme_classic() +
  ylab("Cell density") +
  geom_vline(xintercept = 0.15)+NoLegend()

Filtering out low-quality spots

Now that we have visualized the various metrics, we can set thresholds to filter out low-quality spots. It's important to note that different tissues or experimental settings may require different quality control (QC) thresholds. Feel free to try different thresholds. For this analysis, we will apply the following thresholds:
- nCount_Spatial > 200
- nFeature_Spatial > 200
- log10GenesPerUMI > 0.8
- mitoRatio < 0.15

# Filter out low quality reads using selected thresholds
DCIS_ST <- subset(x = DCIS_ST, 
                         subset= (nCount_Spatial >= 200) & 
                           (nFeature_Spatial >= 200) & 
                           (log10GenesPerUMI > 0.80) & 
                           (mitoRatio < 0.15))

# Check the updated data		
# After removing 34 spots, 1533 spots remain.
DCIS_ST
Screenshot 2024-06-19 at 4 25 30 PM

Data Processing and Visualization

Spatial Coordinates

# Check the X y information in the ST object
head(DCIS_ST@images$slice1@coordinates)
Screenshot 2024-06-19 at 4 26 38 PM

Each spot on the Visium slide captures mRNA from the tissue section, and these spots are organized in a grid with row and column indices.

  • row: The row index of the spot within the array.
  • col: The column index of the spot within the array.
  • imagerow: The row coordinate of the spot in the image space.
  • imagecol: The column coordinate of the spot in the image space.
# Plot the X y information in the ST object
plot1 <- ggplot(DCIS_ST@images$slice1@coordinates, aes(x = col, y = row)) +
  geom_point(size= 0.5) +
  labs(title = "Spatial Spot Index") +
  theme_minimal() +
  coord_flip()
plot2 <- ggplot(DCIS_ST@images$slice1@coordinates, aes(x = imagecol, y = imagerow)) +
  geom_point(size= 0.5) +
  labs(title = "Image Coordinate") +
  theme_minimal() + 
  scale_y_reverse()
# Print the plot
plot1 | plot2

H&E image

This H&E image allows us to better understand the histological features of the tissue.

# Plot the H&E image
SpatialFeaturePlot(DCIS_ST, features = NULL, pt.size.factor = 0, crop = F)+ NoLegend()

Spatially Visualize nFeature_Spatial & nCount_Spatial

The nFeature_Spatial and nCount_Spatial metrics are influenced by the spot density across the tissue.
We can compare these parameters with the H&E image above.

# Visualize nCount_Spatial and nFeature_Spatial
SpatialFeaturePlot(DCIS_ST, 
                   features = c("nCount_Spatial",
                                "nFeature_Spatial"),
                   pt.size.factor = 1,
                   crop = F) 

Dimensionality reduction and Clustering

# Normalization and runPCA
DCIS_ST <- NormalizeData(DCIS_ST, assay='Spatial') %>% 
  FindVariableFeatures() %>% 
  ScaleData(features=rownames(DCIS_ST))

DCIS_ST <- RunPCA(DCIS_ST, features=VariableFeatures(DCIS_ST),
                  verbose = F)

ElbowPlot(DCIS_ST, ndims = 50) 

The Elbow Plot helps to visualize the standard deviation of each principal component (PC). The 'elbow' point (where the curve bends) indicates the number of PCs that capture the most variation in the data. Based on the Elbow Plot, we choose the number of PCs to use for downstream analysis. In this case, we use the first 10 PCs.

# RunUMAP
DCIS_ST <- FindNeighbors(DCIS_ST, reduction='pca', dims = 1:10)
DCIS_ST <- FindClusters(DCIS_ST, resolution = 0.5)
DCIS_ST <- RunUMAP(DCIS_ST, dims = 1:10)
# We can then visualize the results of the clustering in UMAP space. 
plot1 <- DimPlot(DCIS_ST, label = T,reduction = "umap", 
              cols=colorRampPalette(brewer.pal(12,"Paired"))(7))
plot1
# We can then visualize the results of the clustering in the spatial coordinate space. 
plot2 <- SpatialDimPlot(DCIS_ST,label=T, label.size = 3, pt.size.factor = 1, crop = F) +
  scale_fill_manual(values = colorRampPalette(brewer.pal(12,"Paired"))(7))
# Print the plot
plot2

Find marker genes

DCIS_ST_markers <- FindAllMarkers(DCIS_ST, only.pos = T)
DCIS_ST_top10 <- DCIS_ST_markers %>% 
  group_by(cluster) %>% 
  top_n(n=10,wt=avg_log2FC)
# check the DCIS_ST_top10
DCIS_ST_top10
Screenshot 2024-06-19 at 4 49 59 PM
# Heatmap visualize
plot1 <- DoHeatmap(DCIS_ST,
                   features = DCIS_ST_top10$gene, 
                   group.colors = colorRampPalette(brewer.pal(12,"Paired"))(9))
plot1
# Dot plot visualization
DCIS_ST_top5 <- DCIS_ST_markers %>% 
  group_by(cluster) %>% 
  top_n(n=5,wt=avg_log2FC)
# cols = "RdBu" sets the color scheme to a red-to-blue gradient
DotPlot(object = DCIS_ST, 
        features = unique(DCIS_ST_top5$gene),
        col.min = -2,
        col.max = 1,
        cols = "RdBu") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2
image
# Check marker gene for Cluster 2
# Clutser 2 expressed Plasma B cell (IGLC2, IGLC3) and Fibroblast cell (COL1A1, POSTN) markers. 
DCIS_ST_top10[DCIS_ST_top10$cluster == '2',]$gene
Screenshot 2024-06-19 at 4 56 22 PM
# Check marker gene for Cluster 3
# Clutser 3 expressed Adipocyte (ADIPOQ, LEP, G0S2, LPL, PLIN1) markers. 
DCIS_ST_top10[DCIS_ST_top10$cluster == '3',]$gene
Screenshot 2024-06-19 at 4 54 45 PM
# Check gene expression spatially
SpatialFeaturePlot(object = DCIS_ST, 
                   features = c("COL1A1", "PLIN1"),
                   pt.size.factor = 1,
                   crop = F,
                   ncol = 2)
image

Visualization

Visualize Spatial cluster

plot1 <- SpatialDimPlot(DCIS_ST, pt.size.factor = 1, 
               cells.highlight = WhichCells(DCIS_ST, idents =c( '2')),
                cols.highlight = c("#00F5FF","grey50"),
               crop = F)+ggtitle('Cluster 2 Highlighted' )+NoLegend()
plot2 <-SpatialDimPlot(DCIS_ST, pt.size.factor = 1, 
               cells.highlight = WhichCells(DCIS_ST, idents =c( '3')),
                  cols.highlight = c("#C0FF3E","grey50"),
               crop = F)+ggtitle('Cluster 3 Highlighted' )+NoLegend()
plot1|plot2

Visualize nCount_Spatial and nFeature_Spatial in each cluster

plot1 <- VlnPlot(DCIS_ST, 
                 features = c("nCount_Spatial", "nFeature_Spatial"),
                 pt.size = 0, 
        cols = colorRampPalette(brewer.pal(12,"Paired"))(7))
plot1

Generally speaking, immune cells tend to have lower nCount and nFeature values compared to other cell types. This is because immune cells, such as T cells and B cells, often have fewer detected transcripts and genes. In contrast, cancer cells usually have higher nCount and nFeature values, reflecting their increased transcriptional activity.

# FindSpatiallyVariableFeatures
DCIS_ST_V2 <- FindSpatiallyVariableFeatures(DCIS_ST, 
                                            assay = "Spatial",
                                            features = VariableFeatures(DCIS_ST)[1:1000],
                                            selection.method = "moransi")


# Access the meta.features data frame
meta_features <- DCIS_ST_V2[["Spatial"]]@meta.features
meta_features <- meta_features[!is.na(meta_features$moransi.spatially.variable.rank),]
# Arrange features based on moransi.spatially.variable.rank
ranked_features <- meta_features %>%
  arrange(moransi.spatially.variable.rank)
head(ranked_features)
image
# Visualize the expression of the top 6 features identified by this measure.
head(ranked_features, 6)
SpatialFeaturePlot(DCIS_ST, 
                   features = rownames(ranked_features)[1:6],
                   pt.size.factor = 1,
                   crop = F,
                   ncol = 3)
image

Check scRNA-seq object

Annotation

This dataset has already been preprocessed and includes annotation information. Based on CopyKAT inference, three subclones were identified, each exhibiting distinct CNVs (copy number variations). CopyKAT is a method to infer the genomic copy number and subclonal structure from single-cell RNA sequencing (scRNA-seq) or sequence based spatial transcriptomic data. This heatmap is from Figure 4a of the original publication. For more details, please visit the CellTrek publication. For more information about CopyKAT, visit the CopyKAT.

Subclone_heatmap
# Check scRNA data UMAP
plot1 <- DimPlot(DCIS_SC, 
                 label = T,
                 repel = T,
                 cols = colorRampPalette(brewer.pal(12,"Paired"))(8))
plot1

Deconvolution

Introduction

For spatial data analysis, understanding the spatial distribution of cells is crucial. In the context of 10X Visium data, each spatial transcriptomics (ST) spot (55um) can contain multiple cells (approximately 1-15 cells, depending on the tissue density). This complexity implies that marker genes for each cluster may originate from different cell types or states, making it challenging to directly annotate cell types or states from each cluster.
Deconvolution in the context of spatial transcriptomics (ST) refers to the process of computationally inferring the cellular composition within each spatially resolved spot or region on a spatial transcriptomics slide. Because each spot in ST data typically contains transcripts from multiple cell types, deconvolution is an option to unravel the contribution of different cell types to the observed gene expression profiles.

Why Deconvolution is Important

Heterogeneity: Spatial spots often capture mixed populations of cells, and deconvolution helps to disentangle this heterogeneity.

  • Cell Type Identification: It allows researchers to identify which cell types are present in each spot and their relative proportions.
  • Biological Insights: By understanding the spatial distribution of cell types, researchers can gain insights into tissue organization, microenvironments, and cellular interactions.

Key Steps in Deconvolution

Reference Profile Generation: Create or obtain single-cell RNA-seq (scRNA-seq) reference profiles that represent the pure expression profiles of different cell types.

  • Spot Profile Analysis: Analyze the gene expression profiles of each spatial spot in the ST dataset.
  • Computational Deconvolution: Use algorithms to infer the proportions of different cell types in each spot based on the reference profiles and the spot profiles.

Methods for Deconvolution

Several computational methods and tools are available for deconvolution of ST data. In the following tutorial, we will use RCTD (Robust Cell Type Decomposition).

  • For more information about this method, visit the RCTD.

This are other methods for Deconvolution:SPOTlight; SpatialDDLS; cell2location.

Data preparation

# Check the identities of the cells and the number of cells for each identity
cell_counts <- table(Idents(DCIS_SC))
cell_counts
Screenshot 2024-06-19 at 5 11 15 PM
# Subset the reference dataset to include only cell types with at least 25 cells
valid_idents <- names(cell_counts[cell_counts >= 25])
reference_SC <- subset(DCIS_SC, idents = valid_idents)
# Verify the cell counts in the subsetted reference_SC
table(Idents(reference_SC))
Screenshot 2024-06-19 at 5 11 54 PM
# Extract information from scRNA-seq dataset
counts <- reference_SC@assays$RNA@counts
cluster <- as.factor(reference_SC$cell_type)
levels(cluster) <- gsub("/", "_", levels(cluster))
# Drop unused levels from a factor 
cluster <- droplevels(cluster)
nUMI <- reference_SC$nCount_RNA
# Create the RCTD reference object
reference <- Reference(counts, cluster, nUMI)
counts_ST <- DCIS_ST@assays$Spatial@counts
# Check counts_ST
counts_ST[1:5,1:3]
Screenshot 2024-06-19 at 5 15 36 PM
coords <- DCIS_ST@images$slice1@coordinates[,c( 'row','col')]
colnames(coords) <- c("x", "y")
# Check coords
head(coords)
Screenshot 2024-06-19 at 5 16 35 PM

"doublet" mode in RCTD

# Set up query with the RCTD function SpatialRNA
query <- SpatialRNA(coords, counts_ST, colSums(counts_ST))
# Multiple cores can be specified for faster performance
RCTD <- create.RCTD(query, reference, max_cores = 20)
# If in doublet mode, fits at most two cell types per pixel.
# It classifies each pixel as 'singlet' or 'doublet' and searches for the cell types on the pixel.
# If in full mode, can fit any number of cell types on each pixel. 
# In multi mode, cell types are added using a greedy algorithm, up to a fixed number.
RCTD <- run.RCTD(RCTD, doublet_mode = "doublet")
saveRDS(RCTD, file = RCTD_Path)

Option 2: We can load the saved rds file.

RCTD <- readRDS(file = RCTD_Path )
results_df = RCTD@results$results_df
results_df$spot_id = rownames(results_df)
head(results_df)
Screenshot 2024-06-19 at 5 19 15 PM
DCIS_ST$spot_id = rownames(DCIS_ST@meta.data)
DCIS_ST@meta.data = left_join(DCIS_ST@meta.data,results_df, by= 'spot_id' )
rownames(DCIS_ST@meta.data) = DCIS_ST@meta.data$spot_id
# This updated DCIS_ST object is also saved as 'Processed_DCIS_ST.rds' and can be accessed later..
# saveRDS(DCIS_ST, file = '/volumes/USR1/pingxu/CSHL_course_2024_Summer/Spatial_transcriptomics/Processed_DCIS_ST.rds')
# Set color vector
ST_colors =c('#5050FFFF','#00FF00','#E762D7',
            '#D58F5CFF','#FF0000','#5DB1DDFF','#CC33FF' )
names(ST_colors) = c("Clone1","Clone2","Clone3","Fibroblast", "EndoVas","NK_T" , "Myeloid")

# Next, plot the RCTD annotations. 
# Because we ran RCTD in doublet mode, the algorithm assigns a first_type and second_type for each spot.
Plot1 <- SpatialDimPlot(DCIS_ST, group.by = 'first_type', pt.size.factor = 1, crop = F, cols =  ST_colors)
Plot1 
Plot1 <- SpatialDimPlot(DCIS_ST, group.by = "second_type", pt.size.factor = 1, crop = F, cols =  ST_colors)
Plot1 

"full" mode in RCTD

# Set up a query with the RCTD function SpatialRNA, which is similar to the previous section.
query <- SpatialRNA(coords, counts_ST, colSums(counts_ST))
RCTD_full <- create.RCTD(query, reference, max_cores = 20)
# Run full mode
RCTD_full <- run.RCTD(RCTD_full, doublet_mode = "full")
saveRDS(RCTD_full, file = RCTD_full_Path )

Option 2: We can load the saved rds file.

RCTD_full<- readRDS(file = RCTD_full_Path)
# The results of RCTD full mode are stored in @results$weights. 
# Next, we normalize the weights using normalize_weights so that they sum to one.
# Each entry represents the estimated proportion of each cell type on each Visium spot
weights <- RCTD_full@results$weights
norm_weights <- data.frame(normalize_weights(weights))
norm_weights$spot_id = rownames(norm_weights)
head(norm_weights)
Screenshot 2024-06-19 at 5 23 29 PM

Add the output into the DCIS_ST object.

DCIS_ST@meta.data = left_join(DCIS_ST@meta.data,norm_weights, by= 'spot_id')
rownames(DCIS_ST@meta.data) = DCIS_ST@meta.data$spot_id
# Save intermediate file for future use.
saveRDS(DCIS_ST, file = Processed_DCIS_ST_Path)
# Visualize the result from RCTD full mode
SpatialFeaturePlot(DCIS_ST, 
                   features = c("Clone1",
                                "Clone2",
                                'Clone3'),
                   pt.size.factor = 1,
                   crop = F) 

Seurat label transfer

Seurat's label transfer is a powerful method for integrating and annotating single-cell datasets. This technique enables the transfer of cell type labels from a well-annotated reference dataset to a new query dataset, facilitating accurate identification and characterization of cell types in the query dataset.

Here’s an overview of the process and its utility:

  • Reference Dataset Preparation: A reference dataset with known cell type labels is prepared. This dataset should be well-annotated and contain diverse cell types relevant to the query dataset.

  • Identification of Anchors: The FindTransferAnchors function in Seurat identifies pairs of cells between the reference and query datasets that are similar to each other. These pairs are referred to as "anchors."


This step uses canonical correlation analysis (CCA) or mutual nearest neighbors (MNN) to align the datasets and find the best matches.

  • Transfer of Labels: The TransferData function uses these anchors to project the cell type labels from the reference dataset onto the query dataset. The function computes a score for each cell in the query dataset, indicating the likelihood of it belonging to each cell type in the reference dataset.

  • Visualization and Analysis: The transferred labels can be visualized and further analyzed.
    For more information about Seurat label transfer, visit the Seurat label transfer.

# Remove the added metadata from RCTD
DCIS_ST<- readRDS(file = Processed_DCIS_ST_Path)
DCIS_ST@meta.data = DCIS_ST@meta.data [,1:8]
rownames(DCIS_ST@meta.data) = DCIS_ST@meta.data$spot_id
# Perform label transfer
DCIS_anchors <- FindTransferAnchors(reference = DCIS_SC, query = DCIS_ST)
# Transfer data and assign predictions
DCIS_predictions.assay <- TransferData(anchorset = DCIS_anchors, 
                                       refdata = DCIS_SC$cell_type, 
                                       prediction.assay = TRUE,
                                       weight.reduction = DCIS_ST[["pca"]], 
                                       dims = 1:30)
DCIS_ST[["predictions"]] <- DCIS_predictions.assay
DefaultAssay(DCIS_ST) <- "predictions"
# Check the updated object
DCIS_ST@assays$predictions@data[,1:3]
Screenshot 2024-06-19 at 5 27 48 PM

Columns: Each column represents a single cell in your dataset. The column names (e.g., AAACAAGTATCTCCCA-1) are cell barcodes, which uniquely identify each cell.
Rows: Each row represents a different predicted cell type. The row names (e.g., NK/T, Clone2, Clone1, etc.) are the predicted cell types. The max row contains the highest prediction score for each cell across all cell types.
Values: Each value in the matrix is a probability or score indicating how likely it is that a particular cell belongs to a specific cell type.
For example, in the cell AAACAAGTATCTCCCA-1, the probability of being Clone1 is 0.69521410, and this is the highest probability for this cell, as indicated by the max row.

# Visualize cancer cells spatially
plot1 <- SpatialFeaturePlot(DCIS_ST, features = c("Clone1", "Clone2", "Clone3"), 
                         pt.size.factor = 1, ncol = 3, crop = FALSE)
plot1
## Then we can visualize other cell types spatially.
plot1 <- SpatialFeaturePlot(DCIS_ST, features = c("Fibroblast", "EndoVas"), 
                   pt.size.factor = 1, ncol = 2, crop = F)
plot1

CellTrek analysis

CellTrek, developed in Dr. Navin's lab, is a computational tool designed to spatially resolve single-cell RNA sequencing (scRNA-seq) data. It enhances the spatial mapping of scRNA-seq data, providing a more detailed understanding of cellular organization within tissues. It allows researchers to map scRNA-seq data onto spatial transcriptomics data, thereby providing a more detailed and spatially informed understanding of cellular organization within tissues. By leveraging spatial and transcriptomic information, CellTrek enhances the ability to study tissue architecture, cell-cell interactions, and the spatial distribution of different cell types within a given tissue sample. This integration of spatial and single-cell data helps in uncovering new biological insights that are not apparent from either data type alone.

Single cell projection

Major steps for Single cell projection analysis:

Step 1: We first co-embed scRNA-seq and ST data. (Output for step 1 was saved as 'DCIS_traint.rds'). Co-embedding is a technique commonly used in single-cell data analysis to integrate multiple datasets into a shared low-dimensional space. This approach allows researchers to compare and contrast different datasets by visualizing them together.

Step 2: After co-embedding the two datasets, we can chart each single cell to the spatial positions by running the main function of CellTrek. Since this step takes quite a long time (~35 mins) to run on a regular PC, you can just load the data which we have already run. (Output for step 2 was saved as 'DCIS_celltrek.rds').
For more information about the toolkit, visit the CellTrek GitHub.

# Step 1
DCIS_traint <- CellTrek::traint(st_data=DCIS_ST, sc_data=DCIS_SC, 
                                sc_assay='RNA', cell_names='cell_type')
saveRDS(DCIS_traint, file = DCIS_traint_Path)

# Step 2
DCIS_celltrek <- CellTrek::celltrek(st_sc_int=DCIS_traint, int_assay='traint', sc_data=DCIS_SC,  
                                   sc_assay = 'RNA', reduction='pca', intp=T, intp_pnt=10000,   
                                   intp_lin=F, nPCs=30, ntree=1000, dist_thresh=0.4, top_spot=5,  
                                   spot_n=5, repel_r=8, repel_iter=10, keep_model=T)$celltrek
saveRDS(DCIS_celltrek, file = DCIS_celltrek_Path)
# Option 2: load outpot for step 1
DCIS_traint <- readRDS(file = DCIS_traint_Path)
# Option 2: load outpot for step 2
DCIS_celltrek <- readRDS(DCIS_celltrek_Path)
# Check the result
p1 <- DimPlot(DCIS_traint, group.by = "type")
p2 <- DimPlot(DCIS_traint, group.by = "cell_type", label = T)
p1|p2

Visualize celltrek results

This function call will open a Shiny interface where you can interact with the data In the Shiny interface, set the following parameters: Color: cell_type Type: Categorical Shape: None Click 'Plot' to visualize the results.

DCIS_celltrek$cell_type <- factor(DCIS_celltrek$cell_type, levels=sort(unique(DCIS_celltrek$cell_type)))
CellTrek::celltrek_vis(DCIS_celltrek@meta.data %>% dplyr::select(coord_x, coord_y, cell_type:id_new),
                       DCIS_celltrek@images$slice1@image, 
                       DCIS_celltrek@images$slice1@scale.factors$lowres)

Shiny_plot1

Spatial proximity analysis

Based on the CellTrek result, we can summarize the colocalization patterns between different cell types using scoloc module. Futhermore, we can use spatial graph to summarize how different cell types/states are colocalized spatially using the scoloc function. The scoloc function offers different methods for calculating colocalization.

In this example, we use the 'DT' (Delaunay Triangulation) method. You can experiment with other methods as described in the CellTrek documentation.

# Feel free to try different methods in `scoloc`
# The `scoloc` function identifies colocalization patterns between different cell types based on their spatial distribution.
DCIS_celltrek_coloc <- CellTrek::scoloc(DCIS_celltrek, col_cell='cell_type', use_method='DT') 

# Extract the colocalization results (including cell types and their frequencies)
DCIS_celltrek_coloc_mst_cons <- DCIS_celltrek_coloc$mst_cons

## We then extract the metadata (including cell types and their frequencies)
DCIS_cell_class <- DCIS_celltrek@meta.data %>% dplyr::select(id=cell_type) %>% unique

Visualize scoloc_vis results

Visualize the colocalization patterns using scoloc_vis. The scoloc_vis function generates a spatial graph that shows how different cell types are colocalized.
Each node represents a cell type, and edges between nodes indicate colocalization relationships.
The thickness of the edges reflects the strength of colocalization.
The color and size of the nodes can represent additional metadata, such as cell type or frequency.

CellTrek::scoloc_vis(DCIS_celltrek_coloc_mst_cons, meta_data=DCIS_cell_class)

Shiny_plot2

Session info

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/libopenblasp-r0.3.3.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] paletteer_1.5.0         spacexr_2.2.1           ggplot2_3.4.2          
##  [4] CellTrek_0.0.94         RColorBrewer_1.1-3      dplyr_1.1.2            
##  [7] SeuratObject_4.1.3.9001 Seurat_4.3.0            hdf5r_1.3.7            
## [10] klippy_0.0.0.9500      
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.4.1        plyr_1.8.8             igraph_1.5.0.1        
##   [4] lazyeval_0.2.2         sp_2.0-0               splines_4.1.2         
##   [7] listenv_0.9.0          scattermore_0.8        digest_0.6.34         
##  [10] foreach_1.5.1          htmltools_0.5.4        fansi_1.0.4           
##  [13] magrittr_2.0.3         doParallel_1.0.14      tensor_1.5            
##  [16] cluster_2.1.2          ROCR_1.0-11            limma_3.50.3          
##  [19] globals_0.16.2         fastcluster_1.2.3      matrixStats_0.63.0    
##  [22] spatstat.sparse_3.0-0  rmdformats_1.0.4       colorspace_2.1-0      
##  [25] ggrepel_0.9.2          xfun_0.37              jsonlite_1.8.4        
##  [28] progressr_0.13.0       spatstat.data_3.0-0    iterators_1.0.14      
##  [31] survival_3.6-4         zoo_1.8-11             glue_1.6.2.9000       
##  [34] polyclip_1.10-4        gtable_0.3.1           leiden_0.4.3          
##  [37] car_3.1-0              future.apply_1.10.0    abind_1.4-5           
##  [40] scales_1.2.1           DBI_1.2.2              data.tree_1.0.0       
##  [43] rstatix_0.7.1          spatstat.random_3.1-3  miniUI_0.1.1.1        
##  [46] Rcpp_1.0.10            viridisLite_0.4.1      xtable_1.8-4          
##  [49] magic_1.6-0            reticulate_1.28        bit_4.0.5             
##  [52] akima_0.6-3.4          randomForestSRC_3.2.0  htmlwidgets_1.5.4     
##  [55] httr_1.4.5             DiagrammeR_1.0.9       ellipsis_0.3.2        
##  [58] ica_1.0-3              farver_2.1.1           pkgconfig_2.0.3       
##  [61] sass_0.4.2             uwot_0.1.14            deldir_1.0-6          
##  [64] utf8_1.2.3             dynamicTreeCut_1.63-1  labeling_0.4.2        
##  [67] tidyselect_1.2.0       rlang_1.1.0            reshape2_1.4.4        
##  [70] later_1.3.0            munsell_0.5.0          tools_4.1.2           
##  [73] visNetwork_2.1.0       cachem_1.0.7           cli_3.6.2             
##  [76] dbscan_1.1-11          generics_0.1.3         broom_1.0.4           
##  [79] ggridges_0.5.4         geometry_0.4.5         evaluate_0.23         
##  [82] stringr_1.5.0          fastmap_1.1.1          yaml_2.3.7            
##  [85] goftest_1.2-3          rematch2_2.1.2         knitr_1.41            
##  [88] bit64_4.0.5            fitdistrplus_1.1-11    purrr_1.0.1           
##  [91] RANN_2.6.1             pbapply_1.7-0          future_1.32.0         
##  [94] nlme_3.1-155           mime_0.12              ggrastr_1.0.1         
##  [97] compiler_4.1.2         rstudioapi_0.14        beeswarm_0.4.0        
## [100] plotly_4.10.0          png_0.1-8              ggsignif_0.6.3        
## [103] spatstat.utils_3.0-1   tibble_3.2.1           bslib_0.4.0           
## [106] stringi_1.7.12         highr_0.10             lattice_0.20-45       
## [109] Matrix_1.5-4           vctrs_0.6.2            pillar_1.9.0          
## [112] lifecycle_1.0.3        spatstat.geom_3.0-6    lmtest_0.9-40         
## [115] jquerylib_0.1.4        RcppAnnoy_0.0.20       data.table_1.15.0     
## [118] cowplot_1.1.1          irlba_2.3.5.1          httpuv_1.6.8          
## [121] patchwork_1.1.2        R6_2.5.1               bookdown_0.30         
## [124] promises_1.2.0.1       KernSmooth_2.23-20     gridExtra_2.3         
## [127] vipor_0.4.5            parallelly_1.35.0      codetools_0.2-18      
## [130] MASS_7.3-60            assertthat_0.2.1       philentropy_0.7.0     
## [133] withr_2.5.0            sctransform_0.3.5      parallel_4.1.2        
## [136] grid_4.1.2             tidyr_1.2.1            rmarkdown_2.14        
## [139] carData_3.0-5          packcircles_0.3.5      Rtsne_0.16            
## [142] ggpubr_0.5.0           spatstat.explore_3.0-6 shiny_1.7.2           
## [145] ggbeeswarm_0.6.0
Clone this wiki locally