forked from BodenmillerGroup/IMCDataAnalysis
-
Notifications
You must be signed in to change notification settings - Fork 0
/
09-singlecell_visualization.Rmd
729 lines (574 loc) · 27 KB
/
09-singlecell_visualization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
# Single cell visualization {#single-cell-visualization}
The following section describes typical approaches on how to visualize
single-cell data.
This chapter is divided into three parts. Section \@ref(cell-type-level)
will highlight visualization approaches downstream of our cell type
classification from Section \@ref(classification). We will then focus on
visualization methods that relate single-cell data to the sample level
in Section \@ref(sample-level). Lastly, Section \@ref(rich-example) will
provide a more customized example on how to integrate various
single-cell and sample metadata into one heatmap using the
[ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html)
package [@Gu2016].
Visualization functions from popular R packages in single-cell research
such as
[scater](https://bioconductor.org/packages/release/bioc/html/scater.html),
[DittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html)
and
[CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html)
will be utilized. We will recycle methods and functions that we have
seen in previous sections, while also introducing new ones.
Please note that this chapter aims to provide an overview on **common**
visualization options and should be seen as a stepping-stone. However,
many more options exist and the user should customize the visualization
according to the biological question at hand.
## Load data
First, we will read in the previously generated `SpatialExperiment`
object.
```{r read-data-scviz, message=FALSE}
spe <- readRDS("data/spe.rds")
```
For visualization purposes, we will define markers that were used for
cell type classification and markers that can indicate a specific cell
state (e.g. Ki67).
```{r define-markers, message=FALSE}
# Define cell_type_markers
type_markers <- c("Ecad", "CD45RO", "CD20", "CD3", "FOXP3", "CD206", "MPO",
"SMA", "CD8a", "CD4", "HLADR", "CD15", "CD38", "PDGFRb")
# Define cell_state_markers
state_markers <- c("CarbonicAnhydrase", "Ki67", "PD1", "GrzB", "PDL1",
"ICOS", "TCF7", "VISTA")
# Add to spe
rowData(spe)$marker_class <- ifelse(rownames(spe) %in% type_markers, "type",
ifelse(rownames(spe) %in% state_markers, "state",
"other"))
```
## Cell-type level {#cell-type-level}
In the first section of this chapter, the grouping-level for the
visualization approaches will be the cell type classification from
Section \@ref(classification). Other grouping levels (e.g. cluster
assignments from Section \@ref(clustering)) are possible and the user
should adjust depending on the chosen analysis workflow.
### Dimensionality reduction visualization
As seen before, we can visualize single-cells in low-dimensional space.
Often, non-linear methods for dimensionality reduction such as tSNE and
UMAP are sued. They aim to preserve the distances between each cell and its
neighbors in the high-dimensional space.
Interpreting these plots is not trivial, but local neighborhoods in the
plot can suggest similarity in expression for given cells. See
[Orchestrating Single-Cell Analysis with
Bioconductor](https://bioconductor.org/books/release/OSCA/) for more
details.
Here, we will use `dittoDimPlot` from the
[DittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html)
package and `plotReducedDim` from the
[scater](https://bioconductor.org/packages/release/bioc/html/scater.html) package
to visualize the fastMNN-corrected UMAP colored by cell type and
expression, respectively.
Both functions are highly flexible and return `ggplot` objects which can
be further modified.
```{r cell type umap, fig.width=10, fig.height=5, message=FALSE}
library(dittoSeq)
library(scater)
library(patchwork)
library(cowplot)
library(viridis)
## UMAP colored by cell type and expression - dittoDimPlot
p1 <- dittoDimPlot(spe, var = "celltype",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
do.label = TRUE) +
scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
theme(legend.title = element_blank()) +
ggtitle("Cell types on UMAP, integrated cells")
p2 <- dittoDimPlot(spe, var = "Ecad", assay = "exprs",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
colors = viridis(100), do.label = TRUE) +
scale_color_viridis()
p1 + p2
```
```{r cell type umap 2, fig.width=8, fig.height=8, message=FALSE}
# UMAP colored by expression for all markers - plotReducedDim
plot_list <- lapply(rownames(spe)[rowData(spe)$marker_class == "type"], function(x){
p <- plotReducedDim(spe, dimred = "UMAP_mnnCorrected",
colour_by = x,
by_exprs_values = "exprs",
point_size = 0.2)
return(p)
})
plot_grid(plotlist = plot_list)
```
### Heatmap visualization
Next, it is often useful to visualize single-cell expression per cell
type in form of a heatmap. For this, we will use the `dittoHeatmap`
function from the
[DittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html)
package.
We sub-sample the dataset to 4000 cells for ease of visualization and
overlay the cancer type and patient ID from which the cells were
extracted.
```{r celltype heatmap, fig.height = 7,fig.width = 7, message=FALSE}
set.seed(220818)
cur_cells <- sample(seq_len(ncol(spe)), 4000)
#Heatmap visualization - DittoHeatmap
dittoHeatmap(spe[,cur_cells], genes = rownames(spe)[rowData(spe)$marker_class == "type"],
assay = "exprs", order.by = c("celltype"),
cluster_cols = FALSE, scale = "none",
heatmap.colors = viridis(100), annot.by = c("celltype","indication","patient_id"),
annotation_colors = list(indication = metadata(spe)$color_vectors$indication,
patient_id = metadata(spe)$color_vectors$patient_id,
celltype = metadata(spe)$color_vectors$celltype)
)
```
Similarly, we can visualize the mean marker expression per cell type for
all cells using `aggregateAcrossCells` from
[scuttle](https://bioconductor.org/packages/release/bioc/html/scuttle.html)
and then use `dittoHeatmap`. We will annotate the heatmap with the number of
cells per cell type.
```{r celltype mean-expression-per-cluster, fig.height=5}
library(scuttle)
## by cell type
celltype_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$celltype,
statistics = "mean",
use.assay.type = "exprs",
subset.row = rownames(spe)[rowData(spe)$marker_class == "type"]
)
# No scaling
dittoHeatmap(celltype_mean,
assay = "exprs", cluster_cols = TRUE,
scale = "none",
heatmap.colors = viridis(100),
annot.by = c("celltype","ncells"),
annotation_colors = list(celltype = metadata(spe)$color_vectors$celltype,
ncells = plasma(100)))
# Min-max expression scaling
dittoHeatmap(celltype_mean,
assay = "exprs", cluster_cols = TRUE,
scaled.to.max = TRUE,
heatmap.colors.max.scaled = inferno(100),
annot.by = c("celltype","ncells"),
annotation_colors = list(celltype = metadata(spe)$color_vectors$celltype,
ncells = plasma(100)))
```
As illustrated above for not- and min-max-scaled expression values,
different ways of scaling can have strong effects on visualization
output and we encourage the user to test multiple options.
Overall, we can observe cell-type specific marker expression (e.g. Tumor
= Ecad high and B cells = CD20 high) in agreement with the gating scheme
of Section \@ref(classification).
### Violin plot visualization
The `plotExpression` function from the
[scater](https://bioconductor.org/packages/release/bioc/html/scater.html) package
allows to plot the distribution of expression values across cell types
for a chosen set of proteins. The output is a flexible `ggplot` object.
```{r celltype violin, message=FALSE, fig.height=12}
#Violin Plot - plotExpression
plotExpression(spe[,cur_cells],
features = rownames(spe)[rowData(spe)$marker_class == "type"],
x = "celltype", exprs_values = "exprs",
colour_by = "celltype") +
theme(axis.text.x = element_text(angle = 90))+
scale_color_manual(values = metadata(spe)$color_vectors$celltype)
```
### Scatter plot visualization
Moreover, a protein expression based scatter plot can be generated with
`dittoScatterPlot` (returns a `ggplot` object). We overlay the plot with
the cell type information.
```{r celltype scatter, message=FALSE}
#Scatter plot
dittoScatterPlot(spe,
x.var = "CD3", y.var="CD20",
assay.x = "exprs", assay.y = "exprs",
color.var = "celltype") +
scale_color_manual(values = metadata(spe)$color_vectors$celltype) +
ggtitle("Scatterplot for CD3/CD20 labelled by celltype")
```
We can nicely observe how the "B next to T cell" phenotype (`BnTcell`)
has high expression values for both CD20 and CD3.
**Of note**, in a setting where the user aims to assign labels to
clusters based on marker genes/proteins, all of the above plots can be
particularly helpful.
### Barplot visualization
In order to display frequencies of cell types per sample/patient, the
`dittoBarPlot` function will be used. Data can be represented as
percentages or counts and again `ggplot` objects are outputted.
```{r barplot celltype, message=FALSE}
# by sample_id
dittoBarPlot(spe, var = "celltype", group.by = "sample_id") +
scale_fill_manual(values = metadata(spe)$color_vectors$celltype)
# by patient_id - percentage
dittoBarPlot(spe, var = "celltype", group.by = "patient_id") +
scale_fill_manual(values = metadata(spe)$color_vectors$celltype)
# by patient_id - count
dittoBarPlot(spe, scale = "count", var = "celltype", group.by = "patient_id") +
scale_fill_manual(values = metadata(spe)$color_vectors$celltype)
```
We can see that cell type frequencies change between samples/patients
and that the highest proportion/counts of plasma cells and stromal
cells can be observed for Patient 2 and Patient 4, respectively.
### CATALYST-based visualization
In the following, we highlight some useful visualization
functions from the
[CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html)
package.
To this end, we will first convert the `SpatialExperiment` object into a
CATALYST-compatible format.
```{r celltype CATALYST}
library(CATALYST)
# save spe in CATALYST-compatible object with renamed colData entries and
# new metadata information
spe_cat <- spe
spe_cat$sample_id <- factor(spe$sample_id)
spe_cat$condition <- factor(spe$indication)
spe_cat$cluster_id <- factor(spe$celltype)
#add celltype information to metadata
metadata(spe_cat)$cluster_codes <- data.frame(celltype = factor(spe_cat$celltype))
```
All of the `CATALYST` functions presented below return `ggplot` objects,
which allow flexible downstream adjustment.
#### Pseudobulk-level MDS plot
Pseudobulk-level multi-dimensional scaling (MDS) plots can be rendered
with the exported `pbMDS` function.
Here, we will use `pbMDS` to highlight expression similarities between
cell types and subsequently for each celltype-sample-combination.
```{r celltype pbmds, message=FALSE}
# MDS pseudobulk by cell type
pbMDS(spe_cat, by = "cluster_id",
features = rownames(spe_cat)[rowData(spe_cat)$marker_class == "type"],
label_by = "cluster_id", k = "celltype") +
scale_color_manual(values = metadata(spe_cat)$color_vectors$celltype)
# MDS pseudobulk by cell type and sample_id
pbMDS(spe_cat, by = "both",
features = rownames(spe_cat)[rowData(spe_cat)$marker_class == "type"],
k = "celltype", shape_by = "condition",
size_by = TRUE) +
scale_color_manual(values = metadata(spe_cat)$color_vectors$celltype)
```
We can see that the pseudobulk-expression profile of neutrophils seems
markedly distinct from the other cell types, while comparable cell types
such as the T cell subtypes group together. Furthermore, pseudobulk
cell-type profiles from SCCHN appear different from the other
indications.
#### Reduced dimension plot on CLR of proportions
The `clrDR` function produces dimensionality reduction plots on centered
log-ratios (CLR) of sample/cell type proportions across cell
type/samples.
As with `pbMDS`, the output plots aim to illustrate the degree of
similarity between cell types based on sample proportions.
```{r celltype - clrDR, message=FALSE}
# CLR on cluster proportions across samples
clrDR(spe_cat, dr = "PCA",
by = "cluster_id", k = "celltype",
label_by = "cluster_id", arrow_col = "patient_id",
point_pal = metadata(spe_cat)$color_vectors$celltype) +
scale_color_manual(values = metadata(spe_cat)$color_vectors$patient_id)
```
We can again observe that neutrophils have a divergent profile also in
terms of their sample proportions.
#### Pseudobulk expression boxplot
The `plotPbExprs` generates combined box- and jitter-plots of aggregated
marker expression per cell type. Here, we further split the data by
cancer type.
```{r celltype pbExprs, fig.width=15, fig.height=6, message=FALSE}
plotPbExprs(spe_cat, k = "celltype",
facet_by = "cluster_id", ncol = 4,
features = rownames(spe_cat)[rowData(spe_cat)$marker_class == "type"]) +
scale_color_manual(values = metadata(spe_cat)$color_vectors$indication)
```
Notably, CD15 levels are elevated in SCCHN in comparison to all other
indications for most cell types.
## Sample-level {#sample-level}
In the next section, we will shift the grouping-level focus from the
cell type to the sample-level. Sample-levels will be further divided
into the sample-(image) and patient-level.
Although we will mostly repeat the functions from the previous section
\@ref(cell-type-level), sample- and patient-level centered visualization
can provide additional quality control and biological interpretation.
### Dimensionality reduction visualization
Visualization of low-dimensional embeddings, here comparing non-corrected and
fastMNN-corrected UMAPs, and coloring it by sample-levels is often used
for "batch effect" assessment as mentioned in Section
\@ref(cell-quality).
We will again use `dittoDimPlot`.
```{r sample umap, fig.width=8, fig.height=8, message = FALSE}
## UMAP colored by cell type and expression - dittoDimPlot
p1 <- dittoDimPlot(spe, var = "sample_id",
reduction.use = "UMAP", size = 0.2,
colors = viridis(100), do.label = FALSE) +
scale_color_manual(values = metadata(spe)$color_vectors$sample_id) +
theme(legend.title = element_blank()) +
ggtitle("Sample ID")
p2 <- dittoDimPlot(spe, var = "sample_id",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
colors = viridis(100), do.label = FALSE) +
scale_color_manual(values = metadata(spe)$color_vectors$sample_id) +
theme(legend.title = element_blank()) +
ggtitle("Sample ID")
p3 <- dittoDimPlot(spe, var = "patient_id",
reduction.use = "UMAP", size = 0.2,
do.label = FALSE) +
scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
theme(legend.title = element_blank()) +
ggtitle("Patient ID")
p4 <- dittoDimPlot(spe, var = "patient_id",
reduction.use = "UMAP_mnnCorrected", size = 0.2,
do.label = FALSE) +
scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
theme(legend.title = element_blank()) +
ggtitle("Patient ID")
(p1 + p2) / (p3 + p4)
```
As illustrated in Section \@ref(batch-effects), we see that the fastMNN
approach (right side of the plot) leads to mixing of cells across
samples/patients and thus batch effect correction.
### Heatmap visualization
It can be beneficial to use a heatmap to visualize single-cell
expression per sample and patient. Such a plot, which we will create
using `dittoHeatmap`, can highlight biological differences across
samples/patients.
```{r sample heatmap, fig.height = 8,fig.width = 8}
#Heatmap visualization - DittoHeatmap
dittoHeatmap(spe[,cur_cells], genes = rownames(spe)[rowData(spe)$marker_class == "type"],
assay = "exprs", order.by = c("patient_id","sample_id"),
cluster_cols = FALSE, scale = "none",
heatmap.colors = viridis(100),
annot.by = c("celltype","indication","patient_id","sample_id"),
annotation_colors = list(celltype = metadata(spe)$color_vectors$celltype,
indication = metadata(spe)$color_vectors$indication,
patient_id = metadata(spe)$color_vectors$patient_id,
sample_id = metadata(spe)$color_vectors$sample_id))
```
As in Section \@ref(image-quality), aggregated mean marker expression
per sample/patient allow identification of samples/patients with
outlying expression patterns.
Here, we will focus on the patient level and use `aggregateAcrossCells`
and `dittoHeatmap`. The heatmap will be annotated with the number of
cells per patient and cancer type and displayed using two scaling
options.
```{r sample mean-expression-per-cluster, fig.height=5}
#by patient_id
patient_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$patient_id,
statistics = "mean",
use.assay.type = "exprs",
subset.row = rownames(spe)[rowData(spe)$marker_class == "type"]
)
# No scaling
dittoHeatmap(patient_mean,
assay = "exprs", cluster_cols = TRUE,
scale = "none",
heatmap.colors = viridis(100),
annot.by = c("patient_id","indication","ncells"),
annotation_colors = list(patient_id = metadata(spe)$color_vectors$patient_id,
indication = metadata(spe)$color_vectors$indication,
ncells = plasma(100)))
# Min-max expression scaling
dittoHeatmap(patient_mean,
assay = "exprs", cluster_cols = TRUE,
scaled.to.max = TRUE,
heatmap.colors.max.scaled = inferno(100),
annot.by = c("patient_id","indication","ncells"),
annotation_colors = list(patient_id = metadata(spe)$color_vectors$patient_id,
indication = metadata(spe)$color_vectors$indication,
ncells = plasma(100)))
```
As seen before CD15 levels are elevated in Patient 4 (SCCHN), while SMA
levels are highest for Patient 4 (CRC).
### Barplot visualization
Complementary to displaying cell type frequencies per sample/patient, we
can use `dittoBarPlot` to display sample/patient frequencies per cell
type.
```{r barplot sample, message=FALSE}
dittoBarPlot(spe, var = "patient_id", group.by = "celltype") +
scale_fill_manual(values = metadata(spe)$color_vectors$patient_id)
dittoBarPlot(spe, var = "sample_id", group.by = "celltype") +
scale_fill_manual(values = metadata(spe)$color_vectors$sample_id)
```
`Patient2` has the highest and lowest proportion of plasma cells and
neutrophils, respectively.
### CATALYST-based visualization
#### Pseudobulk-level MDS plot
Expression-based pseudobulks for each sample can be compared with the
`pbMDS` function.
```{r sample-pbmds}
# MDS pseudobulk by sample_id
pbMDS(spe_cat, by = "sample_id",
color_by = "sample_id",
features = rownames(spe_cat)[rowData(spe_cat)$marker_class == "type"]) +
scale_color_manual(values = metadata(spe_cat)$color_vectors$sample_id)
```
There are marked differences in pseudobulk-expression patterns between
samples and across patients, which can be driven by biological
differences and also technical aspects such as divergent region
selection.
#### Reduced dimension plot on CLR of proportions
The `clrDR` function can also be used to analyze similarity of samples
based on cell type proportions.
```{r sample-clrDR}
# CLR on sample proportions across clusters
clrDR(spe_cat, dr = "PCA",
by = "sample_id", point_col = "sample_id",
k = "celltype", point_pal = metadata(spe_cat)$color_vectors$sample_id) +
scale_color_manual(values = metadata(spe_cat)$color_vectors$celltype)
```
There are notable differences between samples based on their cell type
proportions.
Interestingly, `Patient3_001`, `Patient1_003`, `Patient4_007` and
`Patient4_006` group together and the PC loadings indicate a strong
contribution of BnT and B cells, which could propose formation of
tertiary lymphoid structures (TLS). In section \@ref(spatial-viz), we
will be able to confirm this hypothesis visually on the images.
## Further examples {#rich-example}
In the last section of this chapter, we will use the popular
[ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html)
package to create a visualization example that combines various
cell-type- and sample-level information.
[ComplexHeatmap](https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap.html)
is highly versatile and is originally inspired from the
[pheatmap](https://cran.r-project.org/web/packages/pheatmap/index.html)
package. Therefore, many arguments have the same/similar names.
For more details, we would recommend to read the [reference
book](https://jokergoo.github.io/ComplexHeatmap-reference/book/).
### Publication-ready ComplexHeatmap
For this example, we will concatenate heatmaps and annotations
horizontally into one rich heatmap list. The grouping-level for the
visualization will again be the cell type information from Section
\@ref(classification)
Initially, we will create two separate `Heatmap` objects for cell type
and state markers.
Then, metadata information, including the cancer type proportion and
number of cells/patients per cell type, will be extracted into
`HeatmapAnnotation` objects.
Notably, we will add spatial features per cell type, here the number of
neighbors extracted from `colPair(spe)` and cell area, in another
`HeatmapAnnotation` object.
Ultimately, all objects are combined in a `HeatmapList` and visualized.
```{r complex-heatmap, warning = FALSE, message = FALSE, fig.width=9, fig.height=5}
library(ComplexHeatmap)
library(circlize)
library(tidyverse)
set.seed(22)
### 1. Heatmap bodies ###
# Heatmap body color
col_exprs <- colorRamp2(c(0,1,2,3,4),
c("#440154FF","#3B518BFF","#20938CFF",
"#6ACD5AFF","#FDE725FF"))
# Create Heatmap objects
# By celltype markers
celltype_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$celltype,
statistics = "mean",
use.assay.type = "exprs",
subset.row = rownames(spe)[rowData(spe)$marker_class == "type"])
h_type <- Heatmap(t(assay(celltype_mean, "exprs")),
column_title = "type_markers",
col = col_exprs,
name= "mean exprs",
show_row_names = TRUE,
show_column_names = TRUE)
# By cellstate markers
cellstate_mean <- aggregateAcrossCells(as(spe, "SingleCellExperiment"),
ids = spe$celltype,
statistics = "mean",
use.assay.type = "exprs",
subset.row = rownames(spe)[rowData(spe)$marker_class == "state"])
h_state <- Heatmap(t(assay(cellstate_mean, "exprs")),
column_title = "state_markers",
col = col_exprs,
name= "mean exprs",
show_row_names = TRUE,
show_column_names = TRUE)
### 2. Heatmap annotation ###
### 2.1 Metadata features
anno <- colData(celltype_mean) %>% as.data.frame %>% select(celltype, ncells)
# Proportion of indication per celltype
indication <- colData(spe) %>%
as.data.frame() %>%
select(celltype, indication) %>%
group_by(celltype) %>%
table() %>%
as.data.frame()
indication <- indication %>%
group_by(celltype) %>%
mutate(fra = Freq/sum(Freq))
indication <- indication %>%
select(-Freq) %>%
pivot_wider(id_cols = celltype,
names_from = indication,
values_from = fra) %>%
column_to_rownames("celltype")
# Number of contributing patients per celltype
cluster_PID <- colData(spe) %>%
as.data.frame() %>%
select(celltype, patient_id) %>%
group_by(celltype) %>% table() %>%
as.data.frame()
n_PID <- cluster_PID %>%
filter(Freq>0) %>%
group_by(celltype) %>%
count(name = "n_PID") %>%
column_to_rownames("celltype")
# Create HeatmapAnnotation objects
ha_anno <- HeatmapAnnotation(celltype = anno$celltype,
border = TRUE,
gap = unit(1,"mm"),
col = list(celltype = metadata(spe)$color_vectors$celltype),
which = "row")
ha_meta <- HeatmapAnnotation(n_cells = anno_barplot(anno$ncells, width = unit(10, "mm")),
n_PID = anno_barplot(n_PID, width = unit(10, "mm")),
indication = anno_barplot(indication,width = unit(10, "mm"),
gp = gpar(fill = metadata(spe)$color_vectors$indication)),
border = TRUE,
annotation_name_rot = 90,
gap = unit(1,"mm"),
which = "row")
### 2.2 Spatial features
# Add number of neighbors to spe object (saved in colPair)
n_neighbors <- colPair(spe) %>%
as.data.frame() %>%
group_by(from) %>%
count() %>%
arrange(desc(n))
spe$n_neighbors <- n_neighbors$n[match(seq_along(colnames(spe)), n_neighbors$from)]
spe$n_neighbors <- spe$n_neighbors %>% replace_na(0)
# Select spatial features and average over celltypes
spatial <- colData(spe) %>%
as.data.frame() %>%
select(area, celltype, n_neighbors)
spatial <- spatial %>%
select(-celltype) %>%
aggregate(by = list(celltype = spatial$celltype), FUN = mean) %>%
column_to_rownames("celltype")
# Create HeatmapAnnotation object
ha_spatial <- HeatmapAnnotation(
area = spatial$area,
n_neighbors = spatial$n_neighbors,
border = TRUE,
gap = unit(1,"mm"),
which = "row")
### 3. Plot rich heatmap ###
# Create HeatmapList object
h_list <- h_type +
h_state +
ha_anno +
ha_spatial +
ha_meta
# Add customized legend for anno_barplot()
lgd <- Legend(title = "indication", at = colnames(indication),
legend_gp = gpar(fill = metadata(spe)$color_vectors$indication))
# Plot
draw(h_list,annotation_legend_list = list(lgd))
```
This plot summarizes most of the information we have seen in this
chapter previously. In addition, we can observe that tumor cells have
the largest mean cell area, high number of neighbors and elevated Ki67
expression. BnT cells have the highest number of neighbors on average,
which is biological sound given their predominant location in highly
immune infiltrated regions (such as TLS).
## Session Info
<details>
<summary>SessionInfo</summary>
```{r, echo = FALSE}
sessionInfo()
```
</details>