# Prerequisites {#prerequisites}
The analysis presented in this book requires a basic understanding of the
`R` programming language. An introduction to `R` can be found [here](https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf) and
in the book [R for Data Science](https://r4ds.had.co.nz/index.html).
Furthermore, it is beneficial to be familiar with single-cell data analysis
using the [Bioconductor](https://www.bioconductor.org/) framework. The book
[Orchestrating Single-Cell Analysis with Bioconductor](https://bioconductor.org/books/release/OSCA/)
gives an excellent overview of the data containers and basic analyses that are
used here.
An overview of IMC as a technology and of the necessary image processing steps can be
found on the [IMC workflow website](https://bodenmillergroup.github.io/IMCWorkflow/).
Before we get started with IMC data analysis, we need to make sure that the
software dependencies are installed and the required example data are downloaded.
## Software requirements
To install all R packages needed for the analysis, please run:
```{r install-packages, eval=FALSE}
# Install BiocManager first if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# CRAN and Bioconductor dependencies
BiocManager::install(c("pheatmap", "viridis",
                       "zoo", "devtools", "tiff",
                       "distill", "openxlsx", "ggrepel", "patchwork", "mclust",
                       "RColorBrewer", "uwot", "Rtsne", "harmony", "Seurat",
                       "SeuratObject", "cowplot", "kohonen", "caret",
                       "randomForest", "ggridges", "gridGraphics",
                       "scales", "CATALYST", "scuttle", "scater",
                       "dittoSeq", "tidyverse", "batchelor",
                       "bluster", "scran", "lisaClust", "spicyR"))

# GitHub dependencies
devtools::install_github(c("BodenmillerGroup/imcRtools",
                           "BodenmillerGroup/cytomapper",
                           "i-cyto/Rphenograph"))
```
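After installation, it can help to confirm that nothing failed silently. The following is a minimal sketch and not part of the original workflow: `missing_packages()` is a hypothetical helper name, and the packages passed to it are only an illustrative subset of the list above.

```{r check-installation, eval=FALSE}
# Hypothetical helper (not part of any package): return the names in `pkgs`
# whose namespace cannot be loaded, i.e. packages that are not installed.
missing_packages <- function(pkgs) {
    installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    pkgs[!installed]
}

# Illustrative subset of the packages installed above
core_packages <- c("CATALYST", "imcRtools", "cytomapper", "scater")
missing_packages(core_packages)
```

An empty character vector indicates that all listed packages can be loaded; any name that is returned should be installed again.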
```{r load-libraries, echo = FALSE, message = FALSE}
options(timeout=10000)
library(CATALYST)
library(SpatialExperiment)
library(SingleCellExperiment)
library(scuttle)
library(scater)
library(imcRtools)
library(cytomapper)
library(dittoSeq)
library(tidyverse)
library(bluster)
library(scran)
library(lisaClust)
library(caret)
```
Throughout the analysis, we rely on different R software packages.
This section lists the most commonly used packages in this workflow.

Data containers:

* [SpatialExperiment](https://bioconductor.org/packages/release/bioc/html/SpatialExperiment.html) version `r packageVersion("SpatialExperiment")`
* [SingleCellExperiment](https://bioconductor.org/packages/release/bioc/html/SingleCellExperiment.html) version `r packageVersion("SingleCellExperiment")`

Data analysis:

* [CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html) version `r packageVersion("CATALYST")`
* [imcRtools](https://github.com/BodenmillerGroup/imcRtools) version `r packageVersion("imcRtools")` from [GitHub](https://github.com/BodenmillerGroup/imcRtools)
* [scuttle](https://bioconductor.org/packages/release/bioc/html/scuttle.html) version `r packageVersion("scuttle")`
* [scater](https://bioconductor.org/packages/release/bioc/html/scater.html) version `r packageVersion("scater")`
* [batchelor](https://www.bioconductor.org/packages/release/bioc/html/batchelor.html) version `r packageVersion("batchelor")`
* [bluster](https://www.bioconductor.org/packages/release/bioc/html/bluster.html) version `r packageVersion("bluster")`
* [scran](https://www.bioconductor.org/packages/release/bioc/html/scran.html) version `r packageVersion("scran")`
* [harmony](https://github.com/immunogenomics/harmony) version `r packageVersion("harmony")`
* [Seurat](https://satijalab.org/seurat/index.html) version `r packageVersion("Seurat")`
* [lisaClust](https://www.bioconductor.org/packages/release/bioc/html/lisaClust.html) version `r packageVersion("lisaClust")`
* [caret](https://topepo.github.io/caret/) version `r packageVersion("caret")`

Data visualization:

* [cytomapper](https://github.com/BodenmillerGroup/cytomapper) version `r packageVersion("cytomapper")` from [GitHub](https://github.com/BodenmillerGroup/cytomapper)
* [dittoSeq](https://bioconductor.org/packages/release/bioc/html/dittoSeq.html) version `r packageVersion("dittoSeq")`

Tidy R:

* [tidyverse](https://www.tidyverse.org/) version `r packageVersion("tidyverse")`
## Image processing {#image-processing}
The analysis presented here relies entirely on packages written in the programming
language `R` and primarily focuses on analysis approaches downstream of image
processing. The example data available at
[https://zenodo.org/record/5949116](https://zenodo.org/record/5949116) were
processed (file type conversion, image segmentation, feature extraction as
explained in Section \@ref(processing)) using the
[steinbock](https://bodenmillergroup.github.io/steinbock/latest/) framework. The
exact command-line interface calls used to process the raw data are shown below:
```{r, echo = FALSE, message = FALSE}
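# Create the output directories, including the parent "data" folder
dir.create("data/steinbock", recursive = TRUE, showWarnings = FALSE)
dir.create("data/ImcSegmentationPipeline", recursive = TRUE, showWarnings = FALSE)
# Pre-download steinbock file
download.file("https://zenodo.org/record/6642699/files/steinbock.sh",
              "data/steinbock/steinbock.sh")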
```
```{bash, file="data/steinbock/steinbock.sh", eval=FALSE}
```
## Download example data {#download-data}
Throughout this tutorial, we will access a number of different data types.
To declutter the analysis scripts, we download all required data here.
To highlight the basic steps of IMC data analysis, we provide example data that
were acquired as part of the **I**ntegrated i**MMU**noprofiling of large adaptive
**CAN**cer patient cohorts project ([immucan.eu](https://immucan.eu/)). The
raw data of 4 patients can be accessed online at
[zenodo.org/record/5949116](https://zenodo.org/record/5949116). Here, we only
download the sample/patient metadata:
```{r download-sample-data}
download.file("https://zenodo.org/record/5949116/files/sample_metadata.xlsx",
destfile = "data/sample_metadata.xlsx")
```
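Some of the archives downloaded in the remainder of this section are large, and R aborts downloads that take longer than the `timeout` option (60 seconds by default). The hidden setup chunk of this document sets `options(timeout=10000)`; when running only the visible code, an equivalent setting may be needed. A minimal sketch, in which the value of 10000 seconds simply mirrors that setup chunk:

```{r set-timeout, eval=FALSE}
# Raise the download timeout (in seconds) unless it is already higher
options(timeout = max(10000, getOption("timeout")))
getOption("timeout")
```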
### Processed multiplexed imaging data
The IMC raw data were processed using either the
[steinbock](https://github.com/BodenmillerGroup/steinbock) framework or the
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline).
Image processing included file type conversion, cell segmentation and feature
extraction.
**steinbock output**
The output of the `steinbock` framework required for the analysis presented here
includes the single-cell mean intensity files, the single-cell morphological
features and spatial locations, spatial object graphs in the form of edge lists
indicating cells in close proximity, hot pixel filtered multi-channel images,
segmentation masks, image metadata and channel metadata. All of these files are
downloaded here for later use. The commands that were used to generate these data
can be found in `data/steinbock/steinbock.sh`.
```{r steinbock-results}
# download intensities
url <- "https://zenodo.org/record/6642699/files/intensities.zip"
destfile <- "data/steinbock/intensities.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download regionprops
url <- "https://zenodo.org/record/6642699/files/regionprops.zip"
destfile <- "data/steinbock/regionprops.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download neighbors
url <- "https://zenodo.org/record/6642699/files/neighbors.zip"
destfile <- "data/steinbock/neighbors.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download images
url <- "https://zenodo.org/record/6642699/files/img.zip"
destfile <- "data/steinbock/img.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download masks
url <- "https://zenodo.org/record/6642699/files/masks_deepcell.zip"
destfile <- "data/steinbock/masks_deepcell.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/steinbock", overwrite=TRUE)
unlink(destfile)
# download individual files
download.file("https://zenodo.org/record/6642699/files/panel.csv",
              "data/steinbock/panel.csv")
download.file("https://zenodo.org/record/6642699/files/images.csv",
              "data/steinbock/images.csv")
download.file("https://zenodo.org/record/6642699/files/steinbock.sh",
              "data/steinbock/steinbock.sh")
```
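The five archive downloads above all follow the same pattern: download, unzip, delete the archive. As a sketch of how this could be written more compactly, the hypothetical helper `download_and_unzip()` below wraps the three steps. The helper name and the usage loop are illustrative and not part of the original workflow.

```{r download-helper, eval=FALSE}
# Hypothetical helper: download a zip archive from `url`, extract it into
# `exdir` and delete the archive afterwards.
download_and_unzip <- function(url, exdir) {
    dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
    destfile <- file.path(exdir, basename(url))
    # mode = "wb" makes sure the archive is written as a binary file
    download.file(url, destfile, mode = "wb")
    unzip(destfile, exdir = exdir, overwrite = TRUE)
    unlink(destfile)
    invisible(exdir)
}

# Usage sketch (requires an internet connection), using the URLs from above:
# base_url <- "https://zenodo.org/record/6642699/files"
# for (archive in c("intensities", "regionprops", "neighbors",
#                   "img", "masks_deepcell")) {
#     download_and_unzip(file.path(base_url, paste0(archive, ".zip")),
#                        "data/steinbock")
# }
```

Writing the pattern once keeps the URL list easy to audit, at the cost of hiding the individual steps that the explicit version shows.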
**IMC Segmentation Pipeline output**
The example data were also processed using the
[IMC Segmentation Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline) (version 3).
To highlight the use of the reader function for this type of output, we will need
to download the `cpout` folder which is part of the `analysis` folder. The `cpout`
folder stores all relevant output files of the pipeline. For a full description
of the pipeline, please refer to the [docs](https://bodenmillergroup.github.io/ImcSegmentationPipeline/).
```{r imcsegpipe-results}
# download analysis folder
url <- "https://zenodo.org/record/6449127/files/analysis.zip"
destfile <- "data/ImcSegmentationPipeline/analysis.zip"
download.file(url, destfile)
unzip(destfile, exdir="data/ImcSegmentationPipeline", overwrite=TRUE)
unlink(destfile)
# remove the folders of the analysis directory that are not used here
unlink("data/ImcSegmentationPipeline/analysis/cpinp/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/crops/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/histocat/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ilastik/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/ometiff/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/images/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/probabilities/", recursive=TRUE)
unlink("data/ImcSegmentationPipeline/analysis/cpout/masks/", recursive=TRUE)
```
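Because `unlink()` accepts a character vector of paths, the clean-up step above can also be expressed as a single call. The following is a sketch under that assumption; `remove_unneeded_folders()` is a hypothetical helper name, and the folder names are copied from the chunk above.

```{r cleanup-sketch, eval=FALSE}
# Hypothetical helper: delete the sub-folders of `analysis_dir` that are
# removed one by one in the chunk above. Paths that do not exist are
# skipped without an error.
remove_unneeded_folders <- function(analysis_dir) {
    unneeded <- c("cpinp", "crops", "histocat", "ilastik", "ometiff",
                  "cpout/images", "cpout/probabilities", "cpout/masks")
    unlink(file.path(analysis_dir, unneeded), recursive = TRUE)
    invisible(analysis_dir)
}

remove_unneeded_folders("data/ImcSegmentationPipeline/analysis")
```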
### Files for spillover matrix estimation
To highlight the estimation and correction of channel spillover as described by
[@Chevrier2017], we download an example spillover acquisition:
```{r download-spillover-data}
download.file("https://zenodo.org/record/5949116/files/compensation.zip",
"data/compensation.zip")
unzip("data/compensation.zip", exdir="data", overwrite=TRUE)
unlink("data/compensation.zip")
```
### Gated cells
In Section \@ref(classification), we present a cell type classification approach
that relies on previously gated cells. This ground truth data is available
online at [zenodo.org/record/6554611](https://zenodo.org/record/6554611) and
will be downloaded here for later use:
```{r download-gated-cells}
download.file("https://zenodo.org/record/7079294/files/gated_cells.zip",
"data/gated_cells.zip")
unzip("data/gated_cells.zip", exdir="data", overwrite=TRUE)
unlink("data/gated_cells.zip")
```
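Once all downloads have finished, a quick check can confirm that the expected files are in place. The sketch below is not part of the original workflow: `missing_paths()` is a hypothetical helper, and the paths listed are limited to those that appear explicitly in the download code of this section.

```{r check-downloads, eval=FALSE}
# Hypothetical helper: return the entries of `paths` that do not exist
missing_paths <- function(paths) {
    paths[!file.exists(paths)]
}

# Files and folders named explicitly in the download chunks above
expected <- c("data/sample_metadata.xlsx",
              "data/steinbock/panel.csv",
              "data/steinbock/images.csv",
              "data/steinbock/steinbock.sh",
              "data/ImcSegmentationPipeline/analysis/cpout")
missing_paths(expected)
```

Any path that is returned points to a download that should be repeated.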
## Software versions {#sessionInfo}
<details>
<summary>SessionInfo</summary>
```{r, echo = FALSE, message = FALSE}
sessionInfo()
```
</details>