Skip to content

Commit

Permalink
add example workflow
Browse files Browse the repository at this point in the history
  • Loading branch information
niklasmueboe committed Jun 12, 2024
1 parent b95b25a commit 35b1341
Show file tree
Hide file tree
Showing 6 changed files with 565 additions and 0 deletions.
191 changes: 191 additions & 0 deletions docs/src/examples/ExampleAnalysis.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,191 @@
#=
# Example analysis
This is an example workflow demonstrating all the different functionalities the
`StereoSSAM.jl` package offers.
## General workflow
We start by loading the package and defining the path of some example data.
=#

using StereoSSAM

using ImageTransformations
using ImageShow
using Printf

data_path = "./data";

stereoseq_file = joinpath(data_path, "Mouse_brain_Adult_GEM_bin1.tsv.gz");
signature_file = joinpath(data_path, "signatures.csv");
color_file = joinpath(data_path, "celltype_colors.json");

#=
### Loading data
We read the data from file using [`readstereoseq`](@ref). The input will
usually be a GEM file of a StereoSeq experiment. The output is a
`DimensionalData.DimArray` of gene counts stored as
[`SparseArrays.SparseMatrixCSC`](@extref). The input data stems from the
original [Stereo-seq publication](https://doi.org/10.1016/j.cell.2022.04.003) and can
be downloaded [here](https://db.cngb.org/stomics/mosta/download/).
=#

counts = readstereoseq(stereoseq_file);

#=
We also load pre-defined cell-type signatures
=#

using CSV
using DataFrames
using DimensionalData

## read pre-determined cell-type signatures
signatures = CSV.read(signature_file, DataFrame);
celltypes = signatures.Celltype;
select!(signatures, Not(:Celltype));

## remove genes that are not detected in Stereo-seq experiment
signatures = signatures[:, names(signatures) .∈ [dims(counts, 1)]];

print(signatures[1:5, 1:8])

#=
We can have a look at the overall structure by calculating the [`totalrna`](@ref)
which is basically the sum across all genes per pixel.
=#

total_counts = totalrna(counts);
simshow(imresize(total_counts; ratio=1 / 45))

#=
### Kernel
Next, we define a [`gaussiankernel`](@ref) to use for smoothing and thus integrating
the gene expression locally across pixels. The relevant parameters are the abndwidth of
the kernel and the radius i.e. how large the kernel will be measured in bandwidths.
For example, setting the bandwidth ``\sigma=8`` and the radius ``r=2`` will result in a
kernel of size ``2r*\sigma+1=33``.
We can also convert the kernel to `Float32` to reduce memory usage later on.
=#

kernel = gaussiankernel(8, 2);
kernel = convert.(Float32, kernel);

#=
We can evaluate the effects by e.g. applying the kernel to the totalRNA.
=#

total_rna = kde(Matrix(totalrna(counts)), kernel);
simshow(imresize(total_rna; ratio=1 / 45))

#=
### Local Maxima
The smoothed totalRNA is used to find local maxima of that can be used as cell proxies
to identify gene signatures or for further analysis.
[`findlocalmaxima`](@ref) allows you to specify a minimum distance between to
neighboring local maxima.
=#

lm = findlocalmaxima(total_rna, 6);
println("$(length(lm)) local maxima detected")

#=
Once we identified the local maxima we can load them for a defined set of genes.
[`getlocalmaxima`](@ref) allows you to load the maxima in a variety of output types.
Have a look at the existing `StereoSSAM.jl` extensions. Here we are going to load them
as `Muon.AnnData` object.
=#

using Muon

ad = getlocalmaxima(AnnData, counts, lm, kernel; genes=names(signatures))

#=
### Cell-type map
Once we have identified gene signatures from literature, previous experiments,
or by analysing local maxima we can assign a celltype to each pixel. This is usually the
most time-intensive processing step. [`assigncelltype`](@ref) returns 2 matrices;
a map of assigned celltypes of each pixel as
`CategoricalArrays.CategoricalMatrix` and the cosine similarity of the
assigned cell type for each pixel as matrix.
=#

celltype_map, cosine_sim = assigncelltype(counts, signatures, kernel; celltypes=celltypes);

#=
In the final step we can visualize the cell-type map.
=#

using Colors
using JSON

## generate the lookup-table for cell-type colors
lut = Dict(Iterators.map(((k, v),) -> k => RGB(v...), pairs(JSON.parsefile(color_file))));

background_color = RGB(0, 0, 0);

#-

celltype_img = map(x -> get(lut, x, background_color), celltype_map);
simshow(imresize(celltype_img; ratio=1 / 45))

#=
When visualizing the cell-type map, it might be beneficial to remove background noise.
A histogram of the totalRNA can be helpful to define a threshold.
=#

using SparseArrays
using Plots

histogram(nonzeros(sparse(total_rna)); title="KDE of totalRNA", bins=200)

#-

celltype_img[total_rna .< 1.5] .= background_color;

simshow(imresize(celltype_img; ratio=1 / 45))

#=
## Utils
Some additional useful functions are included in `StereoSSAM.jl`.
### Cropping & Masking
[`crop!`](@ref) allows you to slice the counts across all genes in x-y dimension.
=#
crop!(counts, (4_001:7_000, 6_001:9_000));

simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10))

#=
[`mask!`](@ref) allows you to filter the field of view by an arbitrary binary mask.
=#

## create a mask
roi_mask = zeros(Bool, 3_000, 3_000)
roi_mask[1:1_500, 1_501:3_000] .= true
roi_mask[1_501:3_000, 1:1_500] .= true

mask!(counts, roi_mask);

simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10))

#=
### Binning
The same extensions that are available to load local maxima are also available for
reading the StereoSeq data into bins using [`readstereoseqbinned`](@ref). This can be
useful for e.g. identifying gene signatures from binned data, or detecting
highly variable genes that can be used for analysis. We again here demonstrate it using
`Muon.AnnData`.
=#

ad = StereoSSAM.IO.readstereoseqbinned(AnnData, stereoseq_file, 50)
295 changes: 295 additions & 0 deletions docs/src/examples/ExampleAnalysis.md

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions docs/src/examples/Project.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
[deps]
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0"
ImageIO = "82e4d734-157c-48bb-816b-45c225c6df19"
ImageMagick = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
ImageShow = "4e3cecfd-b093-5904-9786-8bbb286a6a31"
ImageTransformations = "02fcd773-0e25-5acc-982a-7f6622650795"
JSON = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Muon = "446846d7-b4ce-489d-bf74-72da18fe3629"
OpenCV = "f878e3a2-a245-4720-8660-60795d644f2a"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
StereoSSAM = "4c4340ea-7f9e-46b4-8512-61f1f6c41a11"

[compat]
Literate = "2.15.1"
28 changes: 28 additions & 0 deletions docs/src/examples/data/celltype_colors.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
{
"Arterial": [0.44, 0.0, 0.5],
"Astro": [0.52, 0.0, 0.58],
"Axo-axonic": [0.26, 0.0, 0.63],
"CA1": [0.0, 0.0, 0.71],
"CA2-3": [0.0, 0.0, 0.87],
"Choroid": [0.0, 0.37, 0.87],
"DG": [0.0, 0.54, 0.87],
"EXC": [0.0, 0.62, 0.81],
"Endo": [0.0, 0.67, 0.66],
"Ependymal": [0.0, 0.67, 0.55],
"Habenula": [0.0, 0.62, 0.2],
"INH": [0.0, 0.64, 0.0],
"L2": [0.0, 0.75, 0.0],
"L2/3": [0.0, 0.85, 0.0],
"L3": [0.0, 0.95, 0.0],
"L4/5": [0.29, 1.0, 0.0],
"L5": [0.77, 0.99, 0.0],
"L6": [0.93, 0.94, 0.0],
"MSN": [0.98, 0.84, 0.0],
"Microglia": [1.0, 0.71, 0.0],
"Non-border": [1.0, 0.46, 0.0],
"OPC": [1.0, 0.0, 0.0],
"Oligo": [0.9, 0.0, 0.0],
"Pericytes": [0.83, 0.0, 0.0],
"Thal-Hind": [0.8, 0.24, 0.24],
"VLMC": [0.8, 0.8, 0.8]
}
Loading

0 comments on commit 35b1341

Please sign in to comment.