add example workflow

niklasmueboe · Jun 12, 2024 · 35b1341 · 35b1341
1 parent b95b25a
commit 35b1341
Show file tree

Hide file tree

Showing 6 changed files with 565 additions and 0 deletions.
diff --git a/docs/src/examples/ExampleAnalysis.jl b/docs/src/examples/ExampleAnalysis.jl
@@ -0,0 +1,191 @@
+#=
+# Example analysis
+
+This is an example workflow demonstrating all the different functionalities the 
+`StereoSSAM.jl` package offers.
+
+## General workflow
+
+We start by loading the package and defining the path of some example data.
+=#
+
+using StereoSSAM
+
+using ImageTransformations
+using ImageShow
+using Printf
+
+data_path = "./data";
+
+stereoseq_file = joinpath(data_path, "Mouse_brain_Adult_GEM_bin1.tsv.gz");
+signature_file = joinpath(data_path, "signatures.csv");
+color_file = joinpath(data_path, "celltype_colors.json");
+
+#=
+### Loading data
+
+We read the data from file using [`readstereoseq`](@ref). The input will 
+usually be a GEM file of a StereoSeq experiment. The output is a 
+`DimensionalData.DimArray` of gene counts stored as 
+[`SparseArrays.SparseMatrixCSC`](@extref). The input data stems from the
+original [Stereo-seq publication](https://doi.org/10.1016/j.cell.2022.04.003) and can
+be downloaded [here](https://db.cngb.org/stomics/mosta/download/).
+=#
+
+counts = readstereoseq(stereoseq_file);
+
+#=
+We also load pre-defined cell-type signatures
+=#
+
+using CSV
+using DataFrames
+using DimensionalData
+
+## read pre-determined cell-type signatures
+signatures = CSV.read(signature_file, DataFrame);
+celltypes = signatures.Celltype;
+select!(signatures, Not(:Celltype));
+
+## remove genes that are not detected in Stereo-seq experiment
+signatures = signatures[:, names(signatures) .∈ [dims(counts, 1)]];
+
+print(signatures[1:5, 1:8])
+
+#=
+We can have a look at the overall structure by calculating the [`totalrna`](@ref) 
+which is basically the sum across all genes per pixel.
+=#
+
+total_counts = totalrna(counts);
+simshow(imresize(total_counts; ratio=1 / 45))
+
+#=
+### Kernel
+
+Next, we define a [`gaussiankernel`](@ref) to use for smoothing and thus integrating 
+the gene expression locally across pixels. The relevant parameters are the abndwidth of 
+the kernel and the radius i.e. how large the kernel will be measured in bandwidths.
+For example, setting the bandwidth ``\sigma=8`` and the radius ``r=2`` will result in a 
+kernel of size ``2r*\sigma+1=33``.
+
+We can also convert the kernel to `Float32` to reduce memory usage later on.
+=#
+
+kernel = gaussiankernel(8, 2);
+kernel = convert.(Float32, kernel);
+
+#=
+We can evaluate the effects by e.g. applying the kernel to the totalRNA.
+=#
+
+total_rna = kde(Matrix(totalrna(counts)), kernel);
+simshow(imresize(total_rna; ratio=1 / 45))
+
+#=
+### Local Maxima
+
+The smoothed totalRNA is used to find local maxima of that can be used as cell proxies 
+to identify gene signatures or for further analysis.
+
+[`findlocalmaxima`](@ref) allows you to specify a minimum distance between to 
+neighboring local maxima.
+=#
+
+lm = findlocalmaxima(total_rna, 6);
+println("$(length(lm)) local maxima detected")
+
+#=
+Once we identified the local maxima we can load them for a defined set of genes. 
+[`getlocalmaxima`](@ref) allows you to load the maxima in a variety of output types.
+Have a look at the existing `StereoSSAM.jl` extensions. Here we are going to load them
+as `Muon.AnnData` object.
+=#
+
+using Muon
+
+ad = getlocalmaxima(AnnData, counts, lm, kernel; genes=names(signatures))
+
+#=
+### Cell-type map
+
+Once we have identified gene signatures from literature, previous experiments, 
+or by analysing local maxima we can assign a celltype to each pixel. This is usually the 
+most time-intensive processing step. [`assigncelltype`](@ref) returns 2 matrices;
+a map of assigned celltypes of each pixel as 
+`CategoricalArrays.CategoricalMatrix` and the cosine similarity of the 
+assigned cell type for each pixel as matrix.
+=#
+
+celltype_map, cosine_sim = assigncelltype(counts, signatures, kernel; celltypes=celltypes);
+
+#=
+In the final step we can visualize the cell-type map.
+=#
+
+using Colors
+using JSON
+
+## generate the lookup-table for cell-type colors
+lut = Dict(Iterators.map(((k, v),) -> k => RGB(v...), pairs(JSON.parsefile(color_file))));
+
+background_color = RGB(0, 0, 0);
+
+#-
+
+celltype_img = map(x -> get(lut, x, background_color), celltype_map);
+simshow(imresize(celltype_img; ratio=1 / 45))
+
+#=
+When visualizing the cell-type map, it might be beneficial to remove background noise.
+A histogram of the totalRNA can be helpful to define a threshold.
+=#
+
+using SparseArrays
+using Plots
+
+histogram(nonzeros(sparse(total_rna)); title="KDE of totalRNA", bins=200)
+
+#-
+
+celltype_img[total_rna .< 1.5] .= background_color;
+
+simshow(imresize(celltype_img; ratio=1 / 45))
+
+#=
+## Utils
+
+Some additional useful functions are included in `StereoSSAM.jl`.
+
+### Cropping & Masking
+
+[`crop!`](@ref) allows you to slice the counts across all genes in x-y dimension.
+=#
+crop!(counts, (4_001:7_000, 6_001:9_000));
+
+simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10))
+
+#=
+[`mask!`](@ref) allows you to filter the field of view by an arbitrary binary mask.
+=#
+
+## create a mask
+roi_mask = zeros(Bool, 3_000, 3_000)
+roi_mask[1:1_500, 1_501:3_000] .= true
+roi_mask[1_501:3_000, 1:1_500] .= true
+
+mask!(counts, roi_mask);
+
+simshow(imresize(kde(totalrna(counts), kernel); ratio=1 / 10))
+
+#=
+### Binning
+
+The same extensions that are available to load local maxima are also available for 
+reading the StereoSeq data into bins using [`readstereoseqbinned`](@ref). This can be 
+useful for e.g. identifying gene signatures from binned data, or detecting 
+highly variable genes that can be used for analysis. We again here demonstrate it using
+`Muon.AnnData`.
+=#
+
+ad = StereoSSAM.IO.readstereoseqbinned(AnnData, stereoseq_file, 50)
diff --git a/docs/src/examples/ExampleAnalysis.md b/docs/src/examples/ExampleAnalysis.md
diff --git a/docs/src/examples/Project.toml b/docs/src/examples/Project.toml
@@ -0,0 +1,19 @@
+[deps]
+CSV                  = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
+Colors               = "5ae59095-9a9b-59fe-a467-6f913c188581"
+DataFrames           = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+DimensionalData      = "0703355e-b756-11e9-17c0-8b28908087d0"
+ImageIO              = "82e4d734-157c-48bb-816b-45c225c6df19"
+ImageMagick          = "6218d12a-5da1-5696-b52f-db25d2ecc6d1"
+ImageShow            = "4e3cecfd-b093-5904-9786-8bbb286a6a31"
+ImageTransformations = "02fcd773-0e25-5acc-982a-7f6622650795"
+JSON                 = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
+Literate             = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
+Muon                 = "446846d7-b4ce-489d-bf74-72da18fe3629"
+OpenCV               = "f878e3a2-a245-4720-8660-60795d644f2a"
+Plots                = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
+Printf               = "de0858da-6303-5e67-8744-51eddeeeb8d7"
+StereoSSAM           = "4c4340ea-7f9e-46b4-8512-61f1f6c41a11"
+
+[compat]
+Literate = "2.15.1"
diff --git a/docs/src/examples/data/celltype_colors.json b/docs/src/examples/data/celltype_colors.json
@@ -0,0 +1,28 @@
+{
+  "Arterial": [0.44, 0.0, 0.5],
+  "Astro": [0.52, 0.0, 0.58],
+  "Axo-axonic": [0.26, 0.0, 0.63],
+  "CA1": [0.0, 0.0, 0.71],
+  "CA2-3": [0.0, 0.0, 0.87],
+  "Choroid": [0.0, 0.37, 0.87],
+  "DG": [0.0, 0.54, 0.87],
+  "EXC": [0.0, 0.62, 0.81],
+  "Endo": [0.0, 0.67, 0.66],
+  "Ependymal": [0.0, 0.67, 0.55],
+  "Habenula": [0.0, 0.62, 0.2],
+  "INH": [0.0, 0.64, 0.0],
+  "L2": [0.0, 0.75, 0.0],
+  "L2/3": [0.0, 0.85, 0.0],
+  "L3": [0.0, 0.95, 0.0],
+  "L4/5": [0.29, 1.0, 0.0],
+  "L5": [0.77, 0.99, 0.0],
+  "L6": [0.93, 0.94, 0.0],
+  "MSN": [0.98, 0.84, 0.0],
+  "Microglia": [1.0, 0.71, 0.0],
+  "Non-border": [1.0, 0.46, 0.0],
+  "OPC": [1.0, 0.0, 0.0],
+  "Oligo": [0.9, 0.0, 0.0],
+  "Pericytes": [0.83, 0.0, 0.0],
+  "Thal-Hind": [0.8, 0.24, 0.24],
+  "VLMC": [0.8, 0.8, 0.8]
+}