This repository holds a Nextflow workflow to process samples as part of the Single-cell Pediatric Cancer Atlas (ScPCA) project.
Fastq files for single-cell and single-nuclei RNA-seq samples are processed using alevin-fry.
All samples are aligned, using selective alignment, to an index with transcripts corresponding to spliced cDNA and to intronic regions, denoted by alevin-fry as splici
.
scpca-nf
can also process CITE-seq, bulk RNA-seq, and spatial transcriptomics.
For more information on the processing of all modalities, please see the ScPCA Portal docs.
The scpca-nf
workflow is currently set up to process samples as part of the ScPCA portal and requires access to AWS through the Data Lab.
For all other users, scpca-nf
can be set up to process your samples in your computing environment by following the instructions for using scpca-nf
with external data.
To run scpca-nf
on your own samples, you will need to complete the following steps:
- Organize your files so that each folder contains fastq files relevant to a single sequencing run.
- Prepare a metadata file with one row per library containing all information needed to process your samples.
- Set up a configuration file, including the definition of a profile, dictating where nextflow should execute the workflow.
You may also test your configuration file using example data.
The instructions below assume that you are a member of the CCDL with access to AWS.
Most of the workflow settings described are configured for the ALSF Childhood Cancer Data Lab computational infrastructure.
To process samples that are not part of the ScPCA project, please see the instructions on using scpca-nf
with external data.
To process single-cell and single-nuclei samples using scpca-nf
you will need access to 24 GB of RAM and 12 CPUs, so we recommend using AWS batch.
The first step in running the workflow is ensuring that your AWS credentials are configured.
You can then run the workflow with the batch
profile, which has been named in the nextflow.config
file with full settings set up in the profile_awsbatch.config
file.
Note that you will also need to use the ccdl
profile to set file paths, and you can specify both profiles by separating them with a comma.
You can then run the workflow use the following command:
nextflow run AlexsLemonade/scpca-nf -profile ccdl,batch
When running the workflow for a project or group of samples that is ready to be released on ScPCA portal, please use the tag for the latest release:
nextflow run AlexsLemonade/scpca-nf -r v0.3.3 -profile ccdl,batch --project SCPCP000000
We provide an example of the expected outputs after running scpca-nf
available for external users.
If there have been major updates to the directory structure or the contents of the output, the example data should be re-processed such that the example output we provide mimics the current expected output from scpca-nf
.
The following command should be used to run the workflow and process the example data:
nextflow run AlexsLemonade/scpca-nf \
-profile ccdl,batch \
--run_ids run01 \
--run_metafile s3://scpca-references/example-data/example_metadata.tsv \
--outdir s3://scpca-references/example-data/scpca_out
After successful completion of the run, the scpca_out
folder containing the outputs from scpca-nf
should be zipped up and stored at the following location: s3://scpca-references/example-data/scpca_out.zip
.
Make sure to adjust the settings to make the zip file publicly accessible.