-
Notifications
You must be signed in to change notification settings - Fork 34
Add support for Seurat pipeline #339
base: develop
Are you sure you want to change the base?
Add support for Seurat pipeline #339
Conversation
This is a very naive conversion that will only reliably work if there is 1 modality in the cellranger output. More checks need to be added to handle multi-modal data properly.
This pipeline goes from a fresh Seurat-object to a fully processed one. Reporting and proper publishing of the files is not implemented yet, so this is NOT production ready just yet...
We now also support normalization, scaling and HVG using the non-SCT workflow of Seurat.
This allows to run pcacv in the workflow to determine the optimal number of PCs to use in PCA, neighborhood graph, tSNE and UMAP
This is a very minimal and propably not very robust way of converting Seurat Rds files to SCope compatible loom files. This conversion should be made more robust so it can coop with non-standard Seurat objects.
Argparse has a dependency on python. When running in docker, this gave issues since it couldn't find the python installation. This somehow worked in singularity without any issues, starting from the same docker image...
The filtering of cells might fail if no cells are found that match the filtering criteria. One way this can happen, is when an invalid mitochondrial-gene prefix is provided while also filtering on the percent mitochondrial genes. By adding a check for the prefix, we can provide the user with a more meaninful error instead of the default Seurat error.
This workflow can merge multiple samples together WITHOUT performing any batch correction
We now define parameters in the YAML header instead of setting them in the middle of the script. Reporting for dimensionality reduction mimics the scanpy output.
We might need these reports later if we want to concatenate them together into 1 final report.
This container is now able to use python from R. This allows following methods: - dimensionality reducution using umap-learn - clustering with leiden algorithm There also were some missing R packages that can be used to calculate DEG: - DESeq2 - MAST
Conversion of Seurat Rds to loom would only work with RNA assays. By changing the default values, we can accomodate for other assay types such as SCT
SCT has been split out to its own profile/config to clean up the seurat normalization config. Reports are now also generated from the SCT pipeline, making the output identical to the default normalization/scaling/HVG pipeline.
Thanks Kevin! I've approved the tests to run, and I'll get to the review asap. |
The tiny dataset doesn't contain any mitochondrial genes, so we need to set the filter threshold to something < 0 so we can avoid the check for mito genes
The filter parameters had been renamed at some point in the normal seurat pipeline, but not in the test config. This caused the test config to fall back to the default value, which was not compatible with the tiny dataset for testing.
I have been testing with the wrong dataset, which caused no issues on my end 😓 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of small changes and questions.
I'd also still like to go through and test it properly which I haven't had a chance to do yet.
# FIXME: what do we do when there are more than 1 variable feature column in the data | ||
# e.g. after running SCT and FindVariableFeatures on the same dataset. | ||
# This is highly unlikely, but we should still catch it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make most sense to take the union of the two if this was the case. Either way it should be documented if a decision is made.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if taking the union as a good idea. Sometimes the HVG between SCT and FindVariableFeatures can be quite different. When you then take the union, you risk getting only 1000 genes. This can be on the lower side (depending on the complexity of the data).
I will just use the HVG from the current default assay instead of always defaulting to the RNA assay. In case SCT is used, the default assay in Seurat should automatically be set to SCT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding what I mentioned above, but ran into some issues with the Seurat version in the pca_cv container.
It needs a version >= 4.0.0. I don't want to mess too much with this container and risk breaking compatibility with other parts of the pipeline, so not sure how to proceed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kverstae I've built a new pcacv
container which contains R version 4.0.0: vibsinglecellnf/pcacv:0.3.0
. Sorry for the late response.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dweemx As far as I can see it still contains an older version of Seurat (3.1.5), but I need something >= 4.0.0
Not everything of Seurat 4.0.x is backwards compatible with the older Seurat 3.x.x versions.
Sorry for the inconvenience :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kverstae I've updated vibsinglecellnf/pcacv:0.3.0
which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1
)
Co-authored-by: Kris Davie <[email protected]>
Using SCT after mergin data is not adviced, since this can have a negative effect on the batch effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, very nice work! and tested single_sample_seurat
with success ✔️
A couple of things that would be nice to have:
- Get
pcacv
to work with Seurat (container should now container Seurat v4) - CI is missing for
multi_sample_seurat
- Remove .gitkeep files in folder which are not empty
npcs = args$n_comps, | ||
verbose = FALSE | ||
) | ||
} else if (tolower(args$method == "umap")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (tolower(args$method == "umap")) { | |
} else if (tolower(args$method) == "umap") { |
seed.use = args$seed, | ||
verbose = FALSE | ||
) | ||
} else if (tolower(args$method == "tsne")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (tolower(args$method == "tsne")) { | |
} else if (tolower(args$method) == "tsne") { |
|
||
merged <- merge(objects[[1]], y = objects[-1]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make sure here that the different objects to merge have the same feature space ? Not sure how the merge
is handled (not clear from the examples/docs) when the feature space is not exactly the same (i.e.: inner merge, outer merge, ...)
# FIXME: what do we do when there are more than 1 variable feature column in the data | ||
# e.g. after running SCT and FindVariableFeatures on the same dataset. | ||
# This is highly unlikely, but we should still catch it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kverstae I've updated vibsinglecellnf/pcacv:0.3.0
which is based on your Docker image of Seurat (i.e.: kverstae/r-seurat:4.0.1
)
This is a complete workflow to process single-cell 10x data from raw matrix to SCope-loom file in the Seurat (R) ecosystem.
It uses the same (or atleast very similar) structure as the scanpy
single_sample
workflow since the functionality is almostidentical.
Following steps are implemented: