05-spillover_matrix.Rmd

# Spillover correction

**Original scripts:** *Vito Zanotelli*, **adapted/maintained by:** *Nils Eling*

This section highlights how to generate a spillover matrix from individually
acquired single metal spots on an agarose slide. Each spot needs to be imaged as
its own acquisition/ROI and individual .txt files containing the pixel
intensities per spot need to be available. For complete details on the spillover
correction approach, please refer to [the original
publication](https://www.cell.com/cell-systems/fulltext/S2405-4712(18)30063-2) [@Chevrier2017].

**Spillover slide preparation:**  

* Prepare 2% agarose in double distilled H$_2$O in a beaker and melt it in a microwave until well dissolved. 
* Dip a blank superfrost plus glass microscope slide into the agarose and submerge it until the label. 
* Remove the slide and prop it up against a support to allow the excess agarose to run off onto paper towels.  
* Allow the slide to dry completely (at least 30 minutes).  
* Retrieve all the antibody conjugates used in the panel for which the spillover matrix is to be generated and place them on ice. 
* Arrange them in a known order (e.g. mass of the conjugated metal).  
* Pipette 0.3 µl spots of 0.4% trypan blue dye into an array on the slide. Prepare one spot per antibody, and make sure the spots are well separated. 
* Pipette 0.3 µl of each antibody conjugate (usually at 0.5 mg/ml) onto a unique blue spot, taking care to avoid different antibodies bleeding into each other. Note the exact location of each conjugate on the slide. 
* Let the spots dry completely, at least 1 hour. 

**Spillover slide acquisition:**  

* Create a .jpeg or .png image of the slide using a mobile phone camera or flat-bed scanner.  
* In the CyTOF software, create a new file and import the slide image into it. 
* Create a panorama across all the spots to visualize their locations.  
* Within each spot, create a region of interest (ROI) with a width of 200 pixels and a height of 10 pixels.  
* Name each ROI with the mass and name of the metal conjugate contained in the spot, e.g "Ir193" or "Ho165". This will be how each .txt file is named. 
* Set the profiling type of each ROI to "Local".  
* Apply the antibody panel to all the ROIs. This panel should contain all (or more) of the isotopes in the panel, with the correct metal specified. For example: if the metal used is Barium 138, make sure this, rather than Lanthanum 138, is selected. 
* Save the file, make sure "Generate Text File" is selected, and start the acquisition. 

This procedure will generate an MCD file similar to the one available on zenodo:
[https://zenodo.org/record/5949116/files/compensation.zip](https://zenodo.org/record/5949116/files/compensation.zip)

The original code of the spillover correction manuscript is available on Github
[here](https://github.com/BodenmillerGroup/cyTOFcompensation); however, due to
changes in the
[CATALYST](https://bioconductor.org/packages/release/bioc/html/CATALYST.html)
package, users were not able to reproduce the analysis using the newest software
versions. The following workflow uses the newest package versions to generate a
spillover matrix.

In brief, the highlighted workflow comprises 5 steps:

1. Reading in the data
2. Quality control 
3. (Optional) pixel binning
4. "Debarcoding" for pixel assignment
5. Pixel selection for spillover matrix estimation
6. Spillover matrix generation
7. Saving the results
8. Single-cell compensation
9. Image compensation

## Generate the spillover matrix

In the first step, we will generate a spillover matrix based on the single-metal
spots and save it for later use.

### Read in the data

Here, we will read in the individual .txt files into a `SingleCellExperiment`
object. This object can be used directly by the `CATALYST` package to estimate
the spillover.

For this to work, the .txt file names need to contain the spotted metal isotope
name. By default, the first occurrence of the isotope in the format `(mt)(mass)`
(e.g. `Sm152` for Samarium isotope with the atomic mass 152) will be used as
spot identifier. Alternatively, a named list of already read-in pixel intensities
can be provided. For more inforation, please refer to the man page `?readSCEfromTXT`.

For further downstream analysis, we will asinh-transform the data using a
cofactor of 5; a common transformation for CyTOF data [@Bendall2011].

```{r read-txts, message=FALSE}
library(imcRtools)

# Create SingleCellExperiment from .txt files
sce <- readSCEfromTXT("data/compensation/") 
assay(sce, "exprs") <- asinh(counts(sce)/5)
```

### Quality control

In the next step, we will observe the median pixel intensities per spot and
threshold on medians < 200 counts.
These types of visualization serve two purposes:

1. Small median pixel intensities (< 200 counts) might hinder the robust
estimation of the channel spillover. In that case, consecutive pixels can be
summed (see [Optional pixel binning](#pixel_binning)).

2. Each spotted metal (row) should show the highest median pixel intensity in its
corresponding channel (column). If this is not the case, either the naming of the
.txt files was incorrect or the incorrect metal was spotted.

```{r QC-heatmap, message = FALSE, fig.width=7, fig.height=7}
# Log10 median pixel counts per spot and channel
plotSpotHeatmap(sce)

# Thresholded on 200 pixel counts
plotSpotHeatmap(sce, log = FALSE, threshold = 200)
```

As we can see, all median pixel intensities are > 200 counts for each spot.
We also observe acquired channels for which no spot was placed (Xe134, Ir191, Ir193). 

### Optional pixel binning {#pixel_binning}

In cases where median pixel intensities are low (< 200 counts), consecutive
pixels can be summed to increase the robustness of the spillover estimation.
The `imcRtools` package provides the `binAcrossPixels` function,
which performs aggregation for each channel across `bin_size` consecutive pixels 
per spotted metal.

```{r binning, message=FALSE, fig.width=7, fig.height=7}
# Define grouping
bin_size = 10

sce2 <- binAcrossPixels(sce, bin_size = bin_size)

# Log10 median pixel counts per spot and channel
plotSpotHeatmap(sce2)

# Thresholded on 200 pixel counts
plotSpotHeatmap(sce2, log = FALSE, threshold = 200)
```

Here, we can see an increase in the median pixel intensities and accumulation of
off-diagonal signal. Due to already high original pixel intensities, we will
refrain from aggregating across consecutive pixels for this demonstration.

### Filtering incorrectly assigned pixels

The following step uses functions provided by the `CATALYST` package to
"debarcode" the pixels. Based on the intensity distribution of all channels,
pixels are assigned to their corresponding barcode; here this is the already
known metal spot. This procedure serves the purpose to identify pixels that
cannot be robustly assigned to the spotted metal. Pixels of such kind can be
regarded as "noisy", "background" or "artefacts" that should be removed prior to
spillover estimation.

We will also need to specify which channels were spotted (argument `bc_key`).
This information is directly contained in the `colData(sce)` slot.
To facilitate visualization, we will order the `bc_key` by mass.

The general workflow for pixel debarcoding is as follows:

1. assign a preliminary metal mass to each pixel
2. for each pixel, estimate a cutoff parameter for the distance between 
positive and negative pixel sets
3. apply the estimated cutoffs to identify truly positive pixels

```{r debarcoding, message=FALSE}
library(CATALYST)

bc_key <- as.numeric(unique(sce$sample_mass))
bc_key <- bc_key[order(bc_key)]

sce <- assignPrelim(sce, bc_key = bc_key)
sce <- estCutoffs(sce)
sce <- applyCutoffs(sce)
```

The obtained `SingleCellExperiment` now contains the additional `bc_id` entry.
For each pixel, this vector indicates the assigned mass (e.g. `161`) or
`0`, meaning unassigned. 

This information can be visualized in form of a heatmap:

```{r assignment-heatmap, fig.width=7, fig.height=7}
library(pheatmap)
cur_table <- table(sce$bc_id, sce$sample_mass)

pheatmap(log10(cur_table + 1),
         cluster_rows = FALSE,
         cluster_cols = FALSE)
```

We can see here, that all pixels were assigned to the right mass and that all
pixel sets are made up of > 1000 pixels.

However, in cases where incorrect assignment occurred or where few pixels were
measured for some spots, the `imcRtools` package exports a simple helper
function to exclude pixels based on these criteria:

```{r pixel-filtering}
sce <- filterPixels(sce, minevents = 40, correct_pixels = TRUE)
```

### Compute spillover matrix

Based on the single-positive pixels, we use the `CATALYST::computeSpillmat()`
function to compute the spillover matrix and `CATALYST::plotSpillmat()` to
visualize it. The `plotSpillmat` function checks the spotted and acquired
metal isotopes against a pre-defined `CATALYST::isotope_list()`. In this data,
the `Ar80` channel was additionally acquired to check for deviations in signal
intensity. `Ar80` needs to be added to a custom `isotope_list` object for
visualization.

```{r compute-spillover, fig.width=7, fig.height=7}
sce <- computeSpillmat(sce)

isotope_list <- CATALYST::isotope_list
isotope_list$Ar <- 80

plotSpillmat(sce, isotope_list = isotope_list)

# Save spillover matrix in new object
sm <- metadata(sce)$spillover_matrix
```

As we can see, the largest spillover appears in `In113 --> In115` and we also
observe the `+16` oxide impurities for e.g. `Nd148 --> Dy164`.

## Single-cell data compensation

The `CATALYST` package can be used to perform spillover compensation on the
**single-cell mean intensities**. Here, the `SpatialExperiment` object generated
in Section \@ref(read-data) is read in. The `CATALYST` package requires an entry
to `rowData(spe)$channel_name` for the `compCytof` function to run. This entry
should contain the metal isotopes in the form (mt)(mass)Di (e.g., `Sm152Di` for
Samarium isotope with the atomic mass 152). 

The `compCytof` function performs channel spillover compensation on the mean
pixel intensities per channel and cell. Here, we will not overwrite the assays
in the `SpatialExperiment` object to later highlight the effect of compensation.
As shown in Section \@ref(read-data), also the compensated counts are
asinh-transformed using a cofactor of 1.

```{r single-cell-compensation}
spe <- readRDS("data/spe.rds")
rowData(spe)$channel_name <- paste0(rowData(spe)$channel, "Di")

spe <- compCytof(spe, sm, 
                 transform = TRUE, cofactor = 1,
                 isotope_list = isotope_list, 
                 overwrite = FALSE)
```

To check the effect of channel spillover compensation, the expression of markers
that are affected by spillover (e.g., E cadherin in channel Yb173 and CD303 in
channel Yb174) can be visualized in form of scatter plots before and after
compensation.

```{r visualize-single-cell-spillover, message=FALSE}
library(dittoSeq)
library(patchwork)
before <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
                           assay.x = "exprs", assay.y = "exprs") +
    ggtitle("Before compensation")

after <- dittoScatterPlot(spe, x.var = "Ecad", y.var = "CD303",
                          assay.x = "compexprs", assay.y = "compexprs") +
    ggtitle("After compensation")
before + after
```

We observe that the spillover Yb173 --> Yb174 was successfully corrected.
To facilitate further downstream analysis, the non-compensated assays can now be
replaced by their compensated counterparts:

```{r overwrite-assays}
assay(spe, "counts") <- assay(spe, "compcounts") 
assay(spe, "exprs") <- assay(spe, "compexprs") 
assay(spe, "compcounts") <- assay(spe, "compexprs") <- NULL
```

## Image compensation

The [cytomapper](https://github.com/BodenmillerGroup/cytomapper) package allows channel
spillover compensation directly on **multi-channel images**. 
The `compImage` function takes a `CytoImageList` object and the estimated
spillover matrix as input. More info on how to work with `CytoImageList`
objects can be seen in Section \@ref(image-visualization).

At this point, we can read in the `CytoImageList` object containing multi-channel
images as generated in Section \@ref(read-data).
The `channelNames` need to be set according to their metal isotope in the form
(mt)(mass)Di and therefore match `colnames(sm)`.

```{r read-in-image, message=FALSE}
library(cytomapper)

images <- readRDS("data/images.rds")
channelNames(images) <- rowData(spe)$channel_name
```

The CATALYST package provides the `adaptSpillmat` function that corrects the
spillover matrix in a way that rows and columns match a predefined set of
metals. Please refer to `?compCytof` for more information how metals in the
spillover matrix are matched to acquired channels in the `SingleCellExperiment`
object.

The spillover matrix can now be adapted to exclude channels that are not part of
the measurement (`keep == 0`).

```{r}
library(tiff)
panel <- read.csv("data/steinbock/panel.csv")
adapted_sm <- adaptSpillmat(sm, paste0(panel$channel[panel$keep == 1], "Di"), 
                            isotope_list = isotope_list)
```

The adpated spillover matrix now matches the `channelNames` of the
`CytoImageList` object and can be used to perform pixel-level spillover
compensation.

```{r image-compensation, message = FALSE}
library(BiocParallel)

images_comp <- compImage(images, adapted_sm, BPPARAM = MulticoreParam())
```

As a sanity check, we will visualize the image before and after compensation:

```{r image-visualization}
# Before compensation
plotPixels(images[5], colour_by = "Yb173Di", 
           image_title = list(text = "Yb173 (Ecad) - before", position = "topleft"), 
           legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images[5], colour_by = "Yb174Di", 
           image_title = list(text = "Yb174 (CD303) - before", position = "topleft"), 
           legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))

# After compensation
plotPixels(images_comp[5], colour_by = "Yb173Di",
           image_title = list(text = "Yb173 (Ecad) - after", position = "topleft"), 
           legend = NULL, bcg = list(Yb173Di = c(0, 4, 1)))
plotPixels(images_comp[5], colour_by = "Yb174Di", 
           image_title = list(text = "Yb174 (CD303) - after", position = "topleft"),
           legend = NULL, bcg = list(Yb174Di = c(0, 4, 1)))
```

For convenience, we will re-set the `channelNames` to their biological targtes:

```{r re-set-channels}
channelNames(images_comp) <- rownames(spe)
```

## Write out compensated images

In the final step, the compensated images are written out as 16-bit TIFF
files:

```{r write-images, message=FALSE, results='hide', error=TRUE}
library(tiff)
dir.create("data/comp_img")
lapply(names(images_comp), function(x){
  writeImage(as.array(images_comp[[x]])/(2^16 - 1), 
             paste0("data/comp_img/", x, ".tiff"),
             bits.per.sample = 16)
})
```

## Save objects

For further downstream analysis, the compensated `SpatialExperiment` and
`CytoImageList` objects are saved replacing the former objects:

```{r save-objects-spill, error=TRUE}
saveRDS(spe, "data/spe.rds")
saveRDS(images_comp, "data/images.rds")
```

```{r integration-test-original_sm, include = FALSE}
#library(testthat)
#correct_sm <- as.matrix(read.csv("data/Figure_S5/paper_version_Spillover_Matrix_2_sm.csv", row.names = 1))
#current_sm <- metadata(sce)$spillover_matrix
#correct_sm <- correct_sm[rownames(current_sm), colnames(current_sm)]

#expect_equal(correct_sm, current_sm, tolerance = 0.001)
```

```{r integration-test-tiff_sm, include = FALSE}
#correct_sm <- readTIFF("data/Figure_4/IMC_image/analysis/imc_full_sm.tiff")
#current_sm <- adapted_sm
#dimnames(current_sm) <- NULL

#expect_equal(correct_sm, current_sm, tolerance = 0.0005)
```

```{r integration-test-img-comp, include = FALSE}
#correct_img <- loadImages("data/Figure_4/IMC_image/cpoutput/imc_example_image_a0_full_spillcorr_nnls.tif")

#expect_equal(as.array(correct_img$imc_example_image_a0_full_spillcorr_nnls),
#             as.array(img_comp$imc_example_image_a0_full), 
#             check.attributes = FALSE, tolerance = 0.0005)
```

## Session Info

<details>
   <summary>SessionInfo</summary>
   
```{r, echo = FALSE}
sessionInfo()
```
</details>