Skip to content
This repository was archived by the owner on Apr 19, 2023. It is now read-only.

Add support for Seurat pipeline #339

Open
wants to merge 39 commits into
base: develop
Choose a base branch
from

Conversation

kverstae
Copy link

@kverstae kverstae commented May 3, 2021

This is a complete workflow to process single-cell 10x data from raw matrix to SCope-loom file in the Seurat (R) ecosystem.
It uses the same (or atleast very similar) structure as the scanpy single_sample workflow since the functionality is almost
identical.

Following steps are implemented:

  • Conversion of 10x cellranger output to Seurat Rds
  • Filtering of the data (cells + features)
  • Normalization (both the default Seurat implementation + SCTransform)
  • HVG detection + scaling
  • Dimensionality reduction
  • Clustering
  • DEG calculation
  • Reporting in Rmd (+ rendering of this Rmd to html)
  • Conversion of Seurat Rds to SCope-ready loom file

kverstae added 30 commits April 2, 2021 12:36
This is a very naive conversion that will only reliably work if there is
1 modality in the cellranger output. More checks need to be added to
  handle multi-modal data properly.
This pipeline goes from a fresh Seurat-object to a fully processed one.

Reporting and proper publishing of the files is not implemented yet, so
this is NOT production ready just yet...
We now also support normalization, scaling and HVG using the non-SCT
workflow of Seurat.
This allows to run pcacv in the workflow to determine the optimal number
of PCs to use in PCA, neighborhood graph, tSNE and UMAP
This is a very minimal and propably not very robust way of converting
Seurat Rds files to SCope compatible loom files. This conversion should
be made more robust so it can coop with non-standard Seurat objects.
Argparse has a dependency on python. When running in docker, this gave
issues since it couldn't find the python installation. This somehow
worked in singularity without any issues, starting from the same
docker image...
The filtering of cells might fail if no cells are found that match the
filtering criteria. One way this can happen, is when an invalid
mitochondrial-gene prefix is provided while also filtering on the
percent mitochondrial genes.
By adding a check for the prefix, we can provide the user with a more
meaninful error instead of the default Seurat error.
This workflow can merge multiple samples together WITHOUT performing any
batch correction
We now define parameters in the YAML header instead of setting them in
the middle of the script.

Reporting for dimensionality reduction mimics the scanpy output.
We might need these reports later if we want to concatenate them
together into 1 final report.
This container is now able to use python from R. This allows following
methods:
- dimensionality reducution using umap-learn
- clustering with leiden algorithm

There also were some missing R packages that can be used to calculate
DEG:
- DESeq2
- MAST
Conversion of Seurat Rds to loom would only work with RNA assays.
By changing the default values, we can accomodate for other assay types
such as SCT
SCT has been split out to its own profile/config to clean up the seurat
normalization config. Reports are now also generated from the SCT
pipeline, making the output identical to the default
normalization/scaling/HVG pipeline.
@KrisDavie
Copy link
Member

Thanks Kevin! I've approved the tests to run, and I'll get to the review asap.

kverstae and others added 3 commits May 3, 2021 16:54
The tiny dataset doesn't contain any mitochondrial genes, so we need to set the filter threshold to something < 0 so we can avoid the check for mito genes
The filter parameters had been renamed at some point in the normal
seurat pipeline, but not in the test config. This caused the test config
to fall back to the default value, which was not compatible with the
tiny dataset for testing.
@kverstae
Copy link
Author

kverstae commented May 5, 2021

I have been testing with the wrong dataset, which caused no issues on my end 😓
Issues should now be resolved and the test should pass now...

@KrisDavie KrisDavie requested review from cflerin and dweemx and removed request for cflerin May 5, 2021 13:19
Copy link
Member

@KrisDavie KrisDavie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small changes and questions.

I'd also still like to go through and test it properly which I haven't had a chance to do yet.

Comment on lines +278 to +280
# FIXME: what do we do when there are more than 1 variable feature column in the data
# e.g. after running SCT and FindVariableFeatures on the same dataset.
# This is highly unlikely, but we should still catch it here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make most sense to take the union of the two if this was the case. Either way it should be documented if a decision is made.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if taking the union as a good idea. Sometimes the HVG between SCT and FindVariableFeatures can be quite different. When you then take the union, you risk getting only 1000 genes. This can be on the lower side (depending on the complexity of the data).
I will just use the HVG from the current default assay instead of always defaulting to the RNA assay. In case SCT is used, the default assay in Seurat should automatically be set to SCT.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried adding what I mentioned above, but ran into some issues with the Seurat version in the pca_cv container.
It needs a version >= 4.0.0. I don't want to mess too much with this container and risk breaking compatibility with other parts of the pipeline, so not sure how to proceed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kverstae I've built a new pcacv container which contains R version 4.0.0: vibsinglecellnf/pcacv:0.3.0. Sorry for the late response.

Copy link
Author

@kverstae kverstae Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dweemx As far as I can see it still contains an older version of Seurat (3.1.5), but I need something >= 4.0.0
Not everything of Seurat 4.0.x is backwards compatible with the older Seurat 3.x.x versions.
Sorry for the inconvenience :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kverstae I've updated vibsinglecellnf/pcacv:0.3.0 which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1)

kverstae and others added 2 commits May 6, 2021 09:25
Using SCT after mergin data is not adviced, since this can have a
negative effect on the batch effect.
Copy link
Contributor

@dweemx dweemx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, very nice work! and tested single_sample_seurat with success ✔️

A couple of things that would be nice to have:

  • Get pcacv to work with Seurat (container should now container Seurat v4)
  • CI is missing for multi_sample_seurat
  • Remove .gitkeep files in folder which are not empty

npcs = args$n_comps,
verbose = FALSE
)
} else if (tolower(args$method == "umap")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (tolower(args$method == "umap")) {
} else if (tolower(args$method) == "umap") {

seed.use = args$seed,
verbose = FALSE
)
} else if (tolower(args$method == "tsne")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if (tolower(args$method == "tsne")) {
} else if (tolower(args$method) == "tsne") {

Comment on lines +34 to +36

merged <- merge(objects[[1]], y = objects[-1])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make sure here that the different objects to merge have the same feature space ? Not sure how the merge is handled (not clear from the examples/docs) when the feature space is not exactly the same (i.e.: inner merge, outer merge, ...)

Comment on lines +278 to +280
# FIXME: what do we do when there are more than 1 variable feature column in the data
# e.g. after running SCT and FindVariableFeatures on the same dataset.
# This is highly unlikely, but we should still catch it here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kverstae I've updated vibsinglecellnf/pcacv:0.3.0 which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants