-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
niklasmueboe
committed
Jun 12, 2024
1 parent
b95b25a
commit 35b1341
Showing
6 changed files
with
565 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
#= | ||
# Example analysis | ||
This is an example workflow demonstrating all the different functionalities the | ||
`StereoSSAM.jl` package offers. | ||
## General workflow | ||
We start by loading the package and defining the path of some example data. | ||
=# | ||
|
||
using StereoSSAM | ||
|
||
using ImageTransformations | ||
using ImageShow | ||
using Printf | ||
|
||
data_path = "./data"; | ||
|
||
stereoseq_file = joinpath(data_path, "Mouse_brain_Adult_GEM_bin1.tsv.gz"); | ||
signature_file = joinpath(data_path, "signatures.csv"); | ||
color_file = joinpath(data_path, "celltype_colors.json"); | ||
|
||
#= | ||
### Loading data | ||
We read the data from file using [`readstereoseq`](@ref). The input will | ||
usually be a GEM file of a StereoSeq experiment. The output is a | ||
`DimensionalData.DimArray` of gene counts stored as | ||
[`SparseArrays.SparseMatrixCSC`](@extref). The input data stems from the | ||
original [Stereo-seq publication](https://doi.org/10.1016/j.cell.2022.04.003) and can | ||
be downloaded [here](https://db.cngb.org/stomics/mosta/download/). | ||
=# | ||
|
||
counts = readstereoseq(stereoseq_file); | ||
|
||
#= | ||
We also load pre-defined cell-type signatures | ||
=# | ||
|
||
using CSV | ||
using DataFrames | ||
using DimensionalData | ||
|
||
## read pre-determined cell-type signatures | ||
signatures = CSV.read(signature_file, DataFrame); | ||
celltypes = signatures.Celltype; | ||
select!(signatures, Not(:Celltype)); | ||
|
||
## remove genes that are not detected in Stereo-seq experiment | ||
signatures = signatures[:, names(signatures) .∈ [dims(counts, 1)]]; | ||
|
||
print(signatures[1:5, 1:8]) | ||
|
||
#= | ||
We can have a look at the overall structure by calculating the [`totalrna`](@ref) | ||
which is basically the sum across all genes per pixel. | ||
=# | ||
|
||
total_counts = totalrna(counts); | ||
simshow(imresize(total_counts; ratio=1 / 45)) | ||
|
||
#= | ||
### Kernel | ||
Next, we define a [`gaussiankernel`](@ref) to use for smoothing and thus integrating | ||
the gene expression locally across pixels. The relevant parameters are the abndwidth of | ||
the kernel and the radius i.e. how large the kernel will be measured in bandwidths. | ||
For example, setting the bandwidth ``\sigma=8`` and the radius ``r=2`` will result in a | ||
kernel of size ``2r*\sigma+1=33``. | ||
We can also convert the kernel to `Float32` to reduce memory usage later on. | ||
=# | ||
|
||
kernel = gaussiankernel(8, 2); | ||
kernel = convert.(Float32, kernel); | ||
|
||
#= | ||
We can evaluate the effects by e.g. applying the kernel to the totalRNA. | ||
=# | ||
|
||
total_rna = kde(Matrix(totalrna(counts)), kernel); | ||
simshow(imresize(total_rna; ratio=1 / 45)) | ||
|
||
#= | ||
### Local Maxima | ||
The smoothed totalRNA is used to find local maxima of that can be used as cell proxies | ||
to identify gene signatures or for further analysis. | ||
[`findlocalmaxima`](@ref) allows you to specify a minimum distance between to | ||
neighboring local maxima. | ||
=# | ||
|
||
lm = findlocalmaxima(total_rna, 6); | ||
println("$(length(lm)) local maxima detected") | ||
|
||
#= | ||
Once we identified the local maxima we can load them for a defined set of genes. | ||
[`getlocalmaxima`](@ref) allows you to load the maxima in a variety of output types. | ||
Have a look at the existing `StereoSSAM.jl` extensions. Here we are going to load them | ||
as `Muon.AnnData` object. | ||
=# | ||
|
||
using Muon | ||
|
||
ad = getlocalmaxima(AnnData, counts, lm, kernel; genes=names(signatures)) | ||
|
||
#= | ||
### Cell-type map | ||
Once we have identified gene signatures from literature, previous experiments, | ||
or by analysing local maxima we can assign a celltype to each pixel. This is usually the | ||
most time-intensive processing step. [`assigncelltype`](@ref) returns 2 matrices; | ||
a map of assigned celltypes of each pixel as | ||
`CategoricalArrays.CategoricalMatrix` and the cosine similarity of the | ||
assigned cell type for each pixel as matrix. | ||
=# | ||
|
||
celltype_map, cosine_sim = assigncelltype(counts, signatures, kernel; celltypes=celltypes); | ||
|
||
#= | ||
In the final step we can visualize the cell-type map. | ||
=# | ||
|
||
using Colors | ||
using JSON | ||
|
||
## generate the lookup-table for cell-type colors | ||
lut = Dict(Iterators.map(((k, v),) -> k => RGB(v...), pairs(JSON.parsefile(color_file)))); | ||
|
||
background_color = RGB(0, 0, 0); | ||
|
||
#- | ||
|
||
celltype_img = map(x -> get(lut, x, background_color), celltype_map); | ||
simshow(imresize(celltype_img; ratio=1 / 45)) | ||
|
||
#= | ||
When visualizing the cell-type map, it might be beneficial to remove background noise. | ||
A histogram of the totalRNA can be helpful to define a threshold. | ||
=# | ||
|
||
using SparseArrays | ||
using Plots | ||
|
||
histogram(nonzeros(sparse(total_rna)); title="KDE of totalRNA", bins=200) | ||
|
||
#- | ||
|
||
celltype_img[total_rna .< 1.5] .= background_color; | ||
|
||
simshow(imresize(celltype_img; ratio=1 / 45)) | ||
|
||
#= | ||
## Utils | ||
Some additional useful functions are included in `StereoSSAM.jl`. | ||
### Cropping & Masking | ||
[`crop!`](@ref) allows you to slice the counts across all genes in x-y dimension. | ||
=# | ||
crop!(counts, (4_001:7_000, 6_001:9_000)); | ||
|
||
simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10)) | ||
|
||
#= | ||
[`mask!`](@ref) allows you to filter the field of view by an arbitrary binary mask. | ||
=# | ||
|
||
## create a mask | ||
roi_mask = zeros(Bool, 3_000, 3_000) | ||
roi_mask[1:1_500, 1_501:3_000] .= true | ||
roi_mask[1_501:3_000, 1:1_500] .= true | ||
|
||
mask!(counts, roi_mask); | ||
|
||
simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10)) | ||
|
||
#= | ||
### Binning | ||
The same extensions that are available to load local maxima are also available for | ||
reading the StereoSeq data into bins using [`readstereoseqbinned`](@ref). This can be | ||
useful for e.g. identifying gene signatures from binned data, or detecting | ||
highly variable genes that can be used for analysis. We again here demonstrate it using | ||
`Muon.AnnData`. | ||
=# | ||
|
||
ad = StereoSSAM.IO.readstereoseqbinned(AnnData, stereoseq_file, 50) |
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
[deps] | ||
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b" | ||
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581" | ||
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0" | ||
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0" | ||
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19" | ||
ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1" | ||
ImageShow = "4e3cecfd-b093-5904-9786-8bbb286a6a31" | ||
ImageTransformations = "02fcd773-0e25-5acc-982a-7f6622650795" | ||
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6" | ||
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306" | ||
Muon = "446846d7-b4ce-489d-bf74-72da18fe3629" | ||
OpenCV = "f878e3a2-a245-4720-8660-60795d644f2a" | ||
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80" | ||
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7" | ||
StereoSSAM = "4c4340ea-7f9e-46b4-8512-61f1f6c41a11" | ||
|
||
[compat] | ||
Literate = "2.15.1" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
{ | ||
"Arterial": [0.44, 0.0, 0.5], | ||
"Astro": [0.52, 0.0, 0.58], | ||
"Axo-axonic": [0.26, 0.0, 0.63], | ||
"CA1": [0.0, 0.0, 0.71], | ||
"CA2-3": [0.0, 0.0, 0.87], | ||
"Choroid": [0.0, 0.37, 0.87], | ||
"DG": [0.0, 0.54, 0.87], | ||
"EXC": [0.0, 0.62, 0.81], | ||
"Endo": [0.0, 0.67, 0.66], | ||
"Ependymal": [0.0, 0.67, 0.55], | ||
"Habenula": [0.0, 0.62, 0.2], | ||
"INH": [0.0, 0.64, 0.0], | ||
"L2": [0.0, 0.75, 0.0], | ||
"L2/3": [0.0, 0.85, 0.0], | ||
"L3": [0.0, 0.95, 0.0], | ||
"L4/5": [0.29, 1.0, 0.0], | ||
"L5": [0.77, 0.99, 0.0], | ||
"L6": [0.93, 0.94, 0.0], | ||
"MSN": [0.98, 0.84, 0.0], | ||
"Microglia": [1.0, 0.71, 0.0], | ||
"Non-border": [1.0, 0.46, 0.0], | ||
"OPC": [1.0, 0.0, 0.0], | ||
"Oligo": [0.9, 0.0, 0.0], | ||
"Pericytes": [0.83, 0.0, 0.0], | ||
"Thal-Hind": [0.8, 0.24, 0.24], | ||
"VLMC": [0.8, 0.8, 0.8] | ||
} |
Oops, something went wrong.