Add support for Seurat pipeline #339

kverstae · 2021-05-03T13:45:36Z

This is a complete workflow to process single-cell 10x data from the raw matrix to a SCope-ready loom file in the Seurat (R) ecosystem.
It uses the same (or at least a very similar) structure as the scanpy single_sample workflow, since the functionality is almost identical.

The following steps are implemented:

Conversion of 10x cellranger output to Seurat Rds
Filtering of the data (cells + features)
Normalization (both the default Seurat implementation + SCTransform)
HVG detection + scaling
Dimensionality reduction
Clustering
DEG calculation
Reporting in Rmd (+ rendering of this Rmd to html)
Conversion of Seurat Rds to SCope-ready loom file

This is a very naive conversion that will only reliably work if there is 1 modality in the cellranger output. More checks need to be added to handle multi-modal data properly.
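A possible shape for the missing multi-modal check, as a hedged sketch. As far as we know, Seurat's Read10X() returns a named list of matrices (e.g. "Gene Expression", "Antibody Capture") when the Cell Ranger output contains more than one feature type, so the converter could detect that case and keep only gene expression. The helper select_gene_expression() is hypothetical, and plain base-R matrices stand in for the Read10X() output so the example runs without Seurat.

```r
# Hypothetical helper: keep only one modality (default "Gene Expression").
# Assumption: `counts` is either a single matrix or a named list of matrices,
# mirroring what Read10X() returns for single- vs multi-modal outputs.
select_gene_expression <- function(counts, modality = "Gene Expression") {
  if (!is.list(counts)) {
    # Single modality: nothing to choose.
    return(counts)
  }
  if (!modality %in% names(counts)) {
    stop(sprintf("Modality '%s' not found. Available modalities: %s",
                 modality, paste(names(counts), collapse = ", ")))
  }
  if (length(counts) > 1) {
    message(sprintf("Multi-modal input detected (%s); only '%s' is kept.",
                    paste(names(counts), collapse = ", "), modality))
  }
  counts[[modality]]
}

# Toy input with two modalities (random counts, not real data).
set.seed(1)
gex <- matrix(rpois(6, 2), nrow = 3,
              dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2")))
adt <- matrix(rpois(4, 5), nrow = 2,
              dimnames = list(c("CD3", "CD19"), c("cell1", "cell2")))
counts <- list("Gene Expression" = gex, "Antibody Capture" = adt)
selected <- select_gene_expression(counts)
```

Dropping the other modalities with a message keeps the current single-modality behaviour while making the limitation visible; proper multi-modal support would instead store each matrix as its own assay.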

This pipeline goes from a fresh Seurat object to a fully processed one. Reporting and proper publishing of the files are not implemented yet, so this is NOT production-ready just yet...

We now also support normalization, scaling and HVG using the non-SCT workflow of Seurat.

This allows running pcacv in the workflow to determine the optimal number of PCs to use for PCA, the neighborhood graph, tSNE and UMAP.

This is a very minimal and probably not very robust way of converting Seurat Rds files to SCope-compatible loom files. The conversion should be made more robust so that it can cope with non-standard Seurat objects.

Argparse has a dependency on Python. When running in Docker, this caused issues because it couldn't find the Python installation. Oddly, this worked in Singularity without any issues, starting from the same Docker image...

The filtering of cells might fail if no cells match the filtering criteria. One way this can happen is when an invalid mitochondrial-gene prefix is provided while also filtering on the percentage of mitochondrial genes. By adding a check for the prefix, we can give the user a more meaningful error instead of the default Seurat error.
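A minimal base-R sketch of such a prefix check, assuming it runs on the feature names before the percentage of mitochondrial genes is computed (in Seurat that percentage is typically obtained with PercentageFeatureSet()). The helper name check_mito_prefix() and the error wording are hypothetical.

```r
# Hypothetical helper: fail early with a clear message when no feature name
# starts with the mitochondrial prefix (e.g. "MT-" for human, "mt-" for mouse).
check_mito_prefix <- function(features, prefix) {
  # startsWith() does a literal, case-sensitive prefix match (no regex).
  hits <- features[startsWith(features, prefix)]
  if (length(hits) == 0) {
    stop(sprintf(paste(
      "No features start with the mitochondrial prefix '%s'.",
      "Check the prefix (it is case-sensitive) or do not filter on percent mitochondrial genes."
    ), prefix), call. = FALSE)
  }
  invisible(hits)
}

features <- c("MT-CO1", "MT-ND1", "ACTB", "GAPDH")
mito <- check_mito_prefix(features, "MT-")
```

Using a literal prefix match avoids surprises with regex metacharacters; a wrong case such as "mt-" on human data now stops with an explicit message instead of an empty-subset error further down.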

This workflow can merge multiple samples together WITHOUT performing any batch correction.

We now define parameters in the YAML header instead of setting them in the middle of the script. Reporting for dimensionality reduction mimics the scanpy output.

We might need these reports later if we want to concatenate them together into 1 final report.

This container is now able to use Python from R. This allows the following methods:
- dimensionality reduction using umap-learn
- clustering with the Leiden algorithm
There were also some missing R packages that can be used to calculate DEGs:
- DESeq2
- MAST

Conversion of Seurat Rds to loom only worked with RNA assays. By changing the default values, we can accommodate other assay types such as SCT.

SCT has been split out to its own profile/config to clean up the seurat normalization config. Reports are now also generated from the SCT pipeline, making the output identical to the default normalization/scaling/HVG pipeline.

KrisDavie · 2021-05-03T14:05:43Z

Thanks Kevin! I've approved the tests to run, and I'll get to the review asap.

The tiny dataset doesn't contain any mitochondrial genes, so we need to set the filter threshold to something < 0 to skip the check for mito genes.
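A minimal sketch of the convention described here, assuming a negative threshold disables the percent-mito filter (and with it the mito-gene check). The function and argument names are hypothetical and do not claim to match the pipeline's config keys.

```r
# Hypothetical filter step: returns a logical "keep" vector per cell.
filter_on_percent_mito <- function(percent_mito, max_percent_mito) {
  if (max_percent_mito < 0) {
    # Filter disabled: keep every cell and skip the mito-gene check entirely,
    # so datasets without mitochondrial genes pass through.
    return(rep(TRUE, length(percent_mito)))
  }
  percent_mito < max_percent_mito
}

keep <- filter_on_percent_mito(c(1.5, 12, 4), max_percent_mito = 5)
keep_all <- filter_on_percent_mito(c(0, 0, 0), max_percent_mito = -1)
```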

The filter parameters had been renamed at some point in the normal seurat pipeline, but not in the test config. This caused the test config to fall back to the default value, which was not compatible with the tiny dataset for testing.

kverstae · 2021-05-05T08:13:02Z

I had been testing with the wrong dataset, which is why no issues showed up on my end 😓
The issues should now be resolved and the tests should pass...

KrisDavie

Just a couple of small changes and questions.

I'd also still like to go through and test it properly which I haven't had a chance to do yet.

KrisDavie · 2021-05-05T13:51:53Z

src/pcacv/bin/run_pca_cv.R

+		# FIXME: what do we do when there are more than 1 variable feature column in the data 
+		# e.g. after running SCT and FindVariableFeatures on the same dataset. 
+		# This is highly unlikely, but we should still catch it here.


I think it would make most sense to take the union of the two if this was the case. Either way it should be documented if a decision is made.

I don't know if taking the union is a good idea. The HVGs from SCT and FindVariableFeatures can sometimes be quite different. When you then take the union, you risk getting only 1000 genes, which can be on the low side (depending on the complexity of the data).
I will just use the HVGs from the current default assay instead of always defaulting to the RNA assay. If SCT is used, the default assay in Seurat should automatically be set to SCT.
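A hedged base-R sketch of "use the HVGs of the default assay": HVGs are kept per assay and the default assay decides which set is used. With a Seurat v4 object the lookup would presumably be VariableFeatures(obj, assay = DefaultAssay(obj)); here a named list stands in for the object, and select_hvgs() is a hypothetical helper.

```r
# Hypothetical stand-in: HVGs per assay, as they might exist after running
# both FindVariableFeatures (RNA) and SCTransform (SCT) on the same data.
hvg_by_assay <- list(
  RNA = c("GeneA", "GeneB", "GeneC"),
  SCT = c("GeneB", "GeneD", "GeneE")
)

# Pick the HVGs of the default assay; fail loudly if there are none.
select_hvgs <- function(hvg_by_assay, default_assay) {
  hvgs <- hvg_by_assay[[default_assay]]
  if (is.null(hvgs) || length(hvgs) == 0) {
    stop(sprintf("No variable features found for default assay '%s'.", default_assay))
  }
  hvgs
}

# Assuming SCTransform has set the default assay to "SCT".
hvgs <- select_hvgs(hvg_by_assay, default_assay = "SCT")
```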

I tried implementing this, but ran into some issues with the Seurat version in the pca_cv container.
It needs version >= 4.0.0. I don't want to change this container too much and risk breaking compatibility with other parts of the pipeline, so I'm not sure how to proceed.

@kverstae I've built a new pcacv container which contains R version 4.0.0: vibsinglecellnf/pcacv:0.3.0. Sorry for the late response.

@dweemx As far as I can see, it still contains an older version of Seurat (3.1.5), but I need something >= 4.0.0.
Seurat 4.0.x is not fully backwards compatible with Seurat 3.x.x.
Sorry for the inconvenience :(

@kverstae I've updated vibsinglecellnf/pcacv:0.3.0 which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1)

src/seurat/bin/filter/sc_cell_gene_filtering.R

src/seurat/bin/merge/sc_merge.R

src/seurat/bin/utils/sc_marker_genes_to_xlsx.R

src/seurat/workflows/hvg_selection.nf

src/seurat/workflows/multi_sample.nf

Co-authored-by: Kris Davie <[email protected]>

Using SCT after merging data is not advised, since this can have a negative effect with regard to batch effects.
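A hypothetical sketch of such an incompatibility guard. The function name and arguments are assumptions; this version raises an error, whereas the actual pipeline may only print a message.

```r
# Hypothetical guard: refuse SCTransform on merged multi-sample data.
check_normalization_for_merge <- function(normalization_method, n_samples) {
  if (tolower(normalization_method) == "sct" && n_samples > 1) {
    stop(paste(
      "SCTransform is not supported after merging multiple samples",
      "without batch correction; use the default normalization instead."
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Default (non-SCT) normalization on merged data passes the check.
ok <- check_normalization_for_merge("LogNormalize", n_samples = 3)
```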

dweemx

In general, very nice work! I also tested single_sample_seurat successfully ✔️

A couple of things that would be nice to have:

Get pcacv to work with Seurat (the container should now contain Seurat v4)
CI is missing for multi_sample_seurat
Remove .gitkeep files in folders that are not empty

dweemx · 2021-06-01T14:32:31Z

src/seurat/bin/dim_reduction/sc_dim_reduction.R

+        npcs = args$n_comps,
+        verbose = FALSE
+    )
+} else if (tolower(args$method == "umap")) {


Suggested change:

-} else if (tolower(args$method == "umap")) {
+} else if (tolower(args$method) == "umap") {
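A small base-R illustration of why this suggestion matters; `method` is a stand-in for `args$method`. With the original code the comparison happens before lowercasing, so only input that is already lowercase matches. The same reasoning applies to the "tsne" branch.

```r
# Toy value: the user passed the method in upper case.
method <- "UMAP"

# Original version: the comparison runs first, so tolower() only lowercases
# the logical result (FALSE becomes "false"); "UMAP" never matches "umap".
buggy_match <- tolower(method == "umap")

# if() accepts the strings "true"/"false", so the bug fails silently
# instead of raising an error: the umap branch is simply skipped.
buggy_branch <- if (buggy_match) "umap branch" else "falls through"

# Suggested version: lowercase first, then compare.
fixed_match <- tolower(method) == "umap"
fixed_branch <- if (fixed_match) "umap branch" else "falls through"
```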

dweemx · 2021-06-01T14:33:27Z

src/seurat/bin/dim_reduction/sc_dim_reduction.R

+        seed.use = args$seed,
+        verbose = FALSE
+    )
+} else if (tolower(args$method == "tsne")) {


Suggested change:

-} else if (tolower(args$method == "tsne")) {
+} else if (tolower(args$method) == "tsne") {

src/seurat/bin/filter/sc_cell_gene_filtering.R

dweemx · 2021-06-01T14:59:11Z

src/seurat/bin/merge/sc_merge.R

+
+merged <- merge(objects[[1]], y = objects[-1])
+


Should we make sure here that the objects to merge have the same feature space? It is not clear from the examples/docs how the merge is handled when the feature spaces are not exactly the same (i.e.: inner merge, outer merge, ...).
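A hedged sketch of the check suggested here: compare feature names across the objects before calling merge() and warn when they differ. compare_feature_spaces() is a hypothetical helper; character vectors stand in for rownames() of the Seurat objects. It does not settle how merge() itself treats mismatched features.

```r
# Hypothetical pre-merge check on the feature names of each sample.
compare_feature_spaces <- function(feature_sets) {
  all_features <- Reduce(union, feature_sets)
  shared_features <- Reduce(intersect, feature_sets)
  list(
    identical = length(all_features) == length(shared_features),
    n_union = length(all_features),
    n_shared = length(shared_features),
    # Features missing from at least one sample
    not_shared = setdiff(all_features, shared_features)
  )
}

sample_features <- list(
  sample1 = c("GeneA", "GeneB", "GeneC"),
  sample2 = c("GeneA", "GeneB", "GeneD")
)
report <- compare_feature_spaces(sample_features)
if (!report$identical) {
  warning(sprintf("%d features are not shared by all samples: %s",
                  length(report$not_shared),
                  paste(report$not_shared, collapse = ", ")))
}
```

Whether a mismatch should only warn or stop the merge is a design decision for the workflow; warning keeps the current behaviour while making the difference visible.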

dweemx · 2021-07-07T23:25:17Z

src/pcacv/bin/run_pca_cv.R

+		# FIXME: what do we do when there are more than 1 variable feature column in the data 
+		# e.g. after running SCT and FindVariableFeatures on the same dataset. 
+		# This is highly unlikely, but we should still catch it here.


@kverstae I've updated vibsinglecellnf/pcacv:0.3.0 which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1)

kverstae added 30 commits April 2, 2021 12:36

Seurat: initialize module

6dc0d30

Utils: very basic converter for CR -> Seurat

8f5face

Utils: WIP converter for seurat Rds -> SCope loom

c30a7c3

Seurat: first basic implementation of a pipeline

0445ce7

Add 'single_sample_seurat' profile + main workflow

a5eb905

Seurat: implement 'normal' normalize/scale/hvg

02f8d33

pcacv: fix support for Seurat objects

29519c4

Seurat: add support for dynamic # components/PCs

e0ba7ff

Utils: working seurat rds to SCope loom converter

69844f2

Seurat: add conversion step to SCope loom file

a843a83

Seurat: bugfix: can't replaceAll on null

11c58e8

Seurat: make filter config and process compatible

69d9229

Seurat: replace 'argparse' with 'optparse'

d774782

Add GH actions workflow for single_sample_seurat

4577d81

Seurat: start of documentation

5b2753c

Add single_sample_seurat to README

3f52d07

Seurat: fix inconsistency in config names

a864a7e

Utils: add sample_id as project in Seurat object

6b43ddd

Seurat: implement multi_sample workflow

55c724d

Register multi_sample_seurat profile and workflow

1c96769

Seurat: add converter for marker genes Rds to xlsx

50e4189

Seurat: correct publishing of results

a5f671b

Seurat: WIP reporting in Rmd format

a3d5273

Seurat: simplify report generation

72ccb3d

Seurat: cleanup Rmarkdowns + add reporting dimred

0d1a65e

Seurat: add 'rmarkdown' to container

b0ba521

Seurat: remove unused nextflow config

91bbe1a

Seurat: add report for HVG

7c4f4c7

Seurat: emit reports where possible

cf88de0

kverstae added 4 commits April 14, 2021 15:49

Util: fix conversion Seurat Rds -> loom

23c8d7c

Seurat: make SCT behave identical to normal flow

0a9b819

Seurat: add very basic filtering report

6282e3f

kverstae and others added 3 commits May 3, 2021 16:54

Seurat: fix test missing MT-genes

7cce1e9

Seurat: fix filtering report without mito genes

31f04c9

KrisDavie requested review from cflerin and dweemx and removed request for cflerin May 5, 2021 13:19

KrisDavie suggested changes May 5, 2021

kverstae and others added 2 commits May 6, 2021 09:25

Seurat: typo in error message merge

2191657

Seurat: show incompatibility message SCT merge

9d3b014

dweemx suggested changes Jul 9, 2021
