The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
[3.1] - 2021-05-13
- Samplesheet format has changed from `group,replicate,fastq_1,fastq_2,strandedness` to `sample,fastq_1,fastq_2,strandedness`
- This gives users the flexibility to name their samples however they wish (see #550)
- PCA generated by DESeq2 will now be monochrome and will not be grouped by using the replicate id
- Updated Nextflow version to `v21.04.0` (see nextflow#572)
- Restructure pipeline scripts into `modules/`, `subworkflows/` and `workflows/` directories
- Updated pipeline template to nf-core/tools `1.14`
- Initial implementation of a standardised samplesheet JSON schema to use with user interfaces and for validation
- Only FastQ files that need to be concatenated will be passed to the `CAT_FASTQ` process
- [#449] - `--genomeSAindexNbases` will now be auto-calculated before building STAR indices
- [#460] - Auto-detect and bypass featureCounts execution if biotype doesn't exist in GTF
- [#544] - Update test-dataset for pipeline
- [#553] - Make tximport output files using all the samples; identified by @j-andrews7
- [#561] - Add gene symbols to merged output; identified by @grst
- [#563] - samplesheet.csv merge error
- [#567] - Update docs to mention trimgalore core usage nuances
- [#568] - `--star_index` argument is ignored with `--aligner star_rsem` option
- [#569] - nextflow edge release documentation for running 3.0
- [#575] - Remove duplicated salmon output files
- [#576] - umi_tools dedup : Run before salmon to dedup counts
- [#582] - Generate separate bigwig tracks for each strand
- [#583] - Samtools error during run requires use of BAM CSI index
- [#585] - Clarify salmon uncertainty for some transcripts
- [#604] - Additional fasta with GENCODE annotation results in biotype error
- [#610] - save R objects as RDS
- [#619] - implicit declaration of the workflow in main
- [#629] - Add and fix EditorConfig linting in entire pipeline
- [nf-core/modules#423] - Replace `publish_by_id` module option with `publish_by_meta`
- [nextflow#2060] - Pipeline execution hang when native task fail to be submitted
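For context on the `--genomeSAindexNbases` auto-calculation (#449): the STAR manual recommends scaling this value for small genomes as min(14, log2(GenomeLength)/2 - 1). The sketch below illustrates that formula in Python; it is not the pipeline's own code. The function names, the floor rounding and the lower clamp of 1 are assumptions made for illustration, and the genome length is taken as the sum of the second column of a samtools-style `.fai` index.

```python
import math


def genome_length_from_fai(fai_text: str) -> int:
    """Sum the sequence lengths (second tab-separated column) of a .fai index."""
    return sum(
        int(line.split("\t")[1])
        for line in fai_text.splitlines()
        if line.strip()
    )


def genome_sa_index_nbases(genome_length: int) -> int:
    """Suggested value following min(14, log2(GenomeLength)/2 - 1).

    Flooring the result and clamping it to at least 1 are assumptions
    of this sketch, not documented pipeline behaviour.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    scaled = math.floor(math.log2(genome_length) / 2 - 1)
    return max(1, min(14, scaled))


if __name__ == "__main__":
    # Hypothetical two-contig index, used only to exercise the functions.
    fai = "chr1\t60000\t6\t60\t61\nchr2\t40000\t61012\t60\t61\n"
    length = genome_length_from_fai(fai)
    print(length, genome_sa_index_nbases(length))
```

Large genomes hit the cap of 14, so the calculation only changes the value for small references such as test datasets or microbial genomes.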
| Old parameter | New parameter |
| --- | --- |
| `--hisat_build_memory` | `--hisat2_build_memory` |
| `--gtf_count_type` | `--featurecounts_feature_type` |
| `--gtf_group_features_type` | `--featurecounts_group_type` |
| | `--bam_csi_index` |
| | `--schema_ignore_params` |
| | `--show_hidden_params` |
| | `--validate_params` |
| `--clusterOptions` | |
NB: Parameter has been updated if both old and new parameter information is present. NB: Parameter has been added if just the new parameter information is present. NB: Parameter has been removed if parameter information isn't present.
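When moving an existing parameter set to 3.1, the renames listed above can be applied mechanically. The following is a minimal sketch, assuming the parameters are held as a plain dictionary (for example, loaded from a file passed to Nextflow with `-params-file`). The function name `migrate_params` is hypothetical, and treating `--clusterOptions` as removed is this sketch's reading of the table.

```python
# Renames taken from the 3.1 parameter table (names without the leading "--").
RENAMED = {
    "hisat_build_memory": "hisat2_build_memory",
    "gtf_count_type": "featurecounts_feature_type",
    "gtf_group_features_type": "featurecounts_group_type",
}

# Assumption: the row listing only --clusterOptions marks a removed parameter.
REMOVED = {"clusterOptions"}


def migrate_params(params: dict) -> tuple[dict, list[str]]:
    """Return a copy of params using 3.1 names, plus notes on what changed."""
    migrated = {}
    notes = []
    for name, value in params.items():
        if name in REMOVED:
            notes.append(f"dropped removed parameter '{name}'")
        elif name in RENAMED:
            new_name = RENAMED[name]
            migrated[new_name] = value
            notes.append(f"renamed '{name}' to '{new_name}'")
        else:
            migrated[name] = value
    return migrated, notes


if __name__ == "__main__":
    old = {"gtf_count_type": "exon", "clusterOptions": "-A proj", "aligner": "star_salmon"}
    new, notes = migrate_params(old)
    print(new)
    for note in notes:
        print(note)
```

Parameters not mentioned in the table pass through unchanged, so the helper can be run over a complete parameter set without losing settings.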
Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
| Dependency | Old version | New version |
| --- | --- | --- |
| `bedtools` | 2.29.2 | 2.30.0 |
| `multiqc` | 1.9 | 1.10.1 |
| `preseq` | 2.0.3 | 3.1.2 |
NB: Dependency has been updated if both old and new version information is present. NB: Dependency has been added if just the new version information is present. NB: Dependency has been removed if version information isn't present.
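The samplesheet change in this release means that files written for earlier versions need their `group` and `replicate` columns folded into a single `sample` column. The snippet below is a minimal sketch of such a conversion, not a tool shipped with the pipeline. The function name `convert_samplesheet` and the `<group>_R<replicate>` naming scheme are assumptions; since sample names are now free-form, any naming that keeps re-sequenced runs of the same sample under the same name would also work.

```python
import csv
import io

OLD_COLUMNS = ["group", "replicate", "fastq_1", "fastq_2", "strandedness"]
NEW_COLUMNS = ["sample", "fastq_1", "fastq_2", "strandedness"]


def convert_samplesheet(old_csv_text: str) -> str:
    """Convert a pre-3.1 samplesheet (as text) to the 3.1 column layout."""
    reader = csv.DictReader(io.StringIO(old_csv_text))
    if reader.fieldnames != OLD_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=NEW_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(
            {
                # Hypothetical naming scheme: rows sharing group and replicate
                # end up with the same sample name.
                "sample": f"{row['group']}_R{row['replicate']}",
                "fastq_1": row["fastq_1"],
                "fastq_2": row["fastq_2"],
                "strandedness": row["strandedness"],
            }
        )
    return out.getvalue()


if __name__ == "__main__":
    old = (
        "group,replicate,fastq_1,fastq_2,strandedness\n"
        "control,1,c1_1.fq.gz,c1_2.fq.gz,reverse\n"
        "treated,1,t1.fq.gz,,unstranded\n"
    )
    print(convert_samplesheet(old), end="")
```

The header check is deliberately strict so that a samplesheet already in the new format is rejected rather than silently mangled.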
[3.0] - 2020-12-15
- You will need to install Nextflow `>=20.11.0-edge` to run the pipeline. If you are using Singularity, then features introduced in that release now enable the pipeline to directly download Singularity images hosted by Biocontainers as opposed to performing a conversion from Docker images (see #496).
- The previous default of aligning reads with STAR and quantifying with featureCounts (`--aligner star`) has been removed. The new default is to align with STAR and quantify using Salmon (`--aligner star_salmon`).
  - This decision was made primarily because of the limitations of featureCounts in appropriately quantifying gene expression data. Please see Zhao et al., 2015 and Soneson et al., 2015.
- For similar reasons, quantification will not be performed if using `--aligner hisat2` due to the lack of an appropriate option to calculate accurate expression estimates from HISAT2 derived genomic alignments.
  - This pipeline option is still available for those who have a preference for the alignment, QC and other types of downstream analysis compatible with the output of HISAT2. No gene-level quantification results will be generated.
  - In a future release we hope to add back quantitation for HISAT2 using different tools.
- Updated pipeline template to nf-core/tools `1.12.1`
- Bumped Nextflow version `20.07.1` -> `20.11.0-edge`
- Added UCSC `bedClip` module to restrict bedGraph file coordinates to chromosome boundaries
- Check if Bioconda and conda-forge channels are set-up correctly when running with `-profile conda`
- Use `rsem-prepare-reference` and not `gffread` to create transcriptome fasta file
- [#494] - Issue running rnaseq v2.0 (DSL2) with test profile
- [#496] - Direct download of Singularity images via HTTPS
- [#498] - Significantly different versions of STAR in star_rsem (2.7.6a) and star (2.6.1d)
- [#499] - Use of salmon counts for DESeq2
- [#500, #509] - Error with AWS batch params
- [#511] - rsem/star index fails with large genome
- [#515] - Add decoy-aware indexing for salmon
- [#516] - Unexpected error [InvocationTargetException]
- [#525] - sra_ids_to_runinfo.py UnicodeEncodeError
- [#550] - handle samplesheets with replicate=0
| Old parameter | New parameter |
| --- | --- |
| `--fc_extra_attributes` | `--gtf_extra_attributes` |
| `--fc_group_features` | `--gtf_group_features` |
| `--fc_count_type` | `--gtf_count_type` |
| `--fc_group_features_type` | `--gtf_group_features_type` |
| | `--singularity_pull_docker_container` |
| `--skip_featurecounts` | |
NB: Parameter has been updated if both old and new parameter information is present. NB: Parameter has been added if just the new parameter information is present. NB: Parameter has been removed if parameter information isn't present.
Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
| Dependency | Old version | New version |
| --- | --- | --- |
| `bioconductor-summarizedexperiment` | 1.18.1 | 1.20.0 |
| `bioconductor-tximeta` | 1.6.3 | 1.8.0 |
| `picard` | 2.23.8 | 2.23.9 |
| `requests` | | 2.24.0 |
| `salmon` | 1.3.0 | 1.4.0 |
| `ucsc-bedclip` | | 377 |
| `umi_tools` | 1.0.1 | 1.1.1 |
NB: Dependency has been updated if both old and new version information is present. NB: Dependency has been added if just the new version information is present. NB: Dependency has been removed if version information isn't present.
[2.0] - 2020-11-12
- Pipeline has been re-implemented in Nextflow DSL2
- All software containers are now exclusively obtained from Biocontainers
- Added a separate workflow to download FastQ files via SRA, ENA or GEO ids and to auto-create the input samplesheet (`ENA FTP`; see `--public_data_ids` parameter)
- Added and refined a Groovy `lib/` of functions that include the automatic rendering of parameters defined in the JSON schema for the help and summary log information
- Replace edgeR with DESeq2 for the generation of PCA and heatmaps (also included in the MultiQC report)
- Creation of bigWig coverage files using BEDTools and bedGraphToBigWig
- [#70] - Added new genome mapping and quantification route with RSEM via the `--aligner star_rsem` parameter
- [#72] - Samples skipped due to low alignment reported in the MultiQC report
- [#73, #435] - UMI barcode support
- [#91] - Ability to concatenate multiple runs of the same samples via the input samplesheet
- [#123] - The primary input for the pipeline has changed from `--reads` glob to samplesheet `--input`. See usage docs.
- [#197] - Samples failing strand-specificity checks reported in the MultiQC report
- [#227] - Removal of ribosomal RNA via SortMeRNA
- [#419] - Add `--additional_fasta` parameter to provide ERCC spike-ins, transgenes such as GFP or CAR-T as additional sequences to align to
- Updated pipeline template to nf-core/tools `1.11`
- Optimise MultiQC configuration for faster run-time on huge sample numbers
- Add information about SILVA licensing when removing rRNA to `usage.md`
- Fixed ansi colours for pipeline summary, added summary logs of alignment results
- [#281] - Add nag to cite the pipeline in summary
- [#302] - Fixed MDS plot axis labels
- [#338] - Add option for turning on/off STAR command line option (--sjdbGTFfile)
- [#344] - Added multi-core TrimGalore support
- [#351] - Fixes missing Qualimap parameter `-p`
- [#353] - Fixes an issue where MultiQC fails to run with `--skip_biotype_qc` option
- [#357] - Fixes broken links
- [#362] - Fix error with gzipped annotation file
- [#384] - Changed SortMeRNA reference dbs path to use stable URLs (v4.2.0)
- [#396] - Deterministic mapping for STAR aligner
- [#412] - Fix Qualimap not being passed on correct strand-specificity parameter
- [#413] - Fix STAR unmapped reads not output
- [#434] - Fix typo reported for work-dir
- [#437] - FastQC uses correct number of threads now
- [#440] - Fixed issue where featureCounts process fails when setting `--fc_count_type` to gene
- [#452] - Fix `--gff` input bug
- [#345] - Fixes label name in FastQC process
- [#391] - Make publishDir mode configurable
- [#431] - Update AWS GitHub actions workflow with organization level secrets
- [#435] - Fix a bug where gzipped references were not extracted when `--additional_fasta` was not specified
- [#435] - Fix a bug where merging of RSEM output would fail if only one fastq provided as input
- [#435] - Correct RSEM output name (was saving counts but calling them TPMs; now saving both properly labelled)
- [#436] - Fix a bug where the RSEM reference could not be built
- [#458] - Fix `TMP_DIR` for process MarkDuplicates and Qualimap
| Old parameter | New parameter |
| --- | --- |
| `--reads` | `--input` |
| `--igenomesIgnore` | `--igenomes_ignore` |
| `--removeRiboRNA` | `--remove_ribo_rna` |
| `--rRNA_database_manifest` | `--ribo_database_manifest` |
| `--save_nonrRNA_reads` | `--save_non_ribo_reads` |
| `--saveAlignedIntermediates` | `--save_align_intermeds` |
| `--saveReference` | `--save_reference` |
| `--saveTrimmed` | `--save_trimmed` |
| `--saveUnaligned` | `--save_unaligned` |
| `--skipAlignment` | `--skip_alignment` |
| `--skipBiotypeQC` | `--skip_biotype_qc` |
| `--skipDupRadar` | `--skip_dupradar` |
| `--skipFastQC` | `--skip_fastqc` |
| `--skipMultiQC` | `--skip_multiqc` |
| `--skipPreseq` | `--skip_preseq` |
| `--skipQC` | `--skip_qc` |
| `--skipQualimap` | `--skip_qualimap` |
| `--skipRseQC` | `--skip_rseqc` |
| `--skipTrimming` | `--skip_trimming` |
| `--stringTieIgnoreGTF` | `--stringtie_ignore_gtf` |
- `--additional_fasta` - FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences
- `--deseq2_vst` - Use vst transformation instead of rlog with DESeq2
- `--enable_conda` - Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter
- `--min_mapped_reads` - Minimum percentage of uniquely mapped reads below which samples are removed from further processing
- `--multiqc_title` - MultiQC report title. Printed as page header, used for filename if not otherwise specified
- `--public_data_ids` - File containing SRA/ENA/GEO identifiers one per line in order to download their associated FastQ files
- `--publish_dir_mode` - Method used to save pipeline results to output directory
- `--rsem_index` - Path to directory or tar.gz archive for pre-built RSEM index
- `--rseqc_modules` - Specify the RSeQC modules to run
- `--save_merged_fastq` - Save FastQ files after merging re-sequenced libraries in the results directory
- `--save_umi_intermeds` - If this option is specified, intermediate FastQ and BAM files produced by UMI-tools are also saved in the results directory
- `--skip_bigwig` - Skip bigWig file creation
- `--skip_deseq2_qc` - Skip DESeq2 PCA and heatmap plotting
- `--skip_featurecounts` - Skip featureCounts
- `--skip_markduplicates` - Skip picard MarkDuplicates step
- `--skip_sra_fastq_download` - Only download metadata for public data database ids and don't download the FastQ files
- `--skip_stringtie` - Skip StringTie
- `--star_ignore_sjdbgtf` - See #338
- `--umitools_bc_pattern` - The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI
- `--umitools_extract_method` - UMI pattern to use. Can be either 'string' (default) or 'regex'
- `--with_umi` - Enable UMI-based read deduplication
- `--awsqueue` can now be provided via nf-core/configs if using AWS
- `--awsregion` can now be provided via nf-core/configs if using AWS
- `--compressedReference` now auto-detected
- `--markdup_java_options` in favour of updating centrally on nf-core/modules
- `--project` parameter from old NGI template
- `--readPaths` is not required since these are provided from the input samplesheet
- `--sampleLevel` not required
- `--singleEnd` is now auto-detected from the input samplesheet
- `--skipEdgeR` QC is now performed by DESeq2 instead
- `--star_memory` in favour of updating centrally on nf-core/modules if required
- Strandedness is now specified at the sample-level via the input samplesheet
  - `--forwardStranded`
  - `--reverseStranded`
  - `--unStranded`
  - `--pico`
Note, since the pipeline is now using Nextflow DSL2, each process will be run with its own Biocontainer. This means that on occasion it is entirely possible for the pipeline to be using different versions of the same tool. However, the overall software dependency changes compared to the last release have been listed below for reference.
| Dependency | Old version | New version |
| --- | --- | --- |
| `bioconductor-dupradar` | 1.14.0 | 1.18.0 |
| `bioconductor-summarizedexperiment` | 1.14.0 | 1.18.1 |
| `bioconductor-tximeta` | 1.2.2 | 1.6.3 |
| `fastqc` | 0.11.8 | 0.11.9 |
| `gffread` | 0.11.4 | 0.12.1 |
| `hisat2` | 2.1.0 | 2.2.0 |
| `multiqc` | 1.7 | 1.9 |
| `picard` | 2.21.1 | 2.23.8 |
| `qualimap` | 2.2.2c | 2.2.2d |
| `r-base` | 3.6.1 | 4.0.3 |
| `salmon` | 0.14.2 | 1.3.0 |
| `samtools` | 1.9 | 1.10 |
| `sortmerna` | 2.1b | 4.2.0 |
| `stringtie` | 2.0 | 2.1.4 |
| `subread` | 1.6.4 | 2.0.1 |
| `trim-galore` | 0.6.4 | 0.6.6 |
| `bedtools` | - | 2.29.2 |
| `bioconductor-biocparallel` | - | 1.22.0 |
| `bioconductor-complexheatmap` | - | 2.4.2 |
| `bioconductor-deseq2` | - | 1.28.0 |
| `bioconductor-tximport` | - | 1.16.0 |
| `perl` | - | 5.26.2 |
| `python` | - | 3.8.3 |
| `r-ggplot2` | - | 3.3.2 |
| `r-optparse` | - | 1.6.6 |
| `r-pheatmap` | - | 1.0.12 |
| `r-rcolorbrewer` | - | 1.1_2 |
| `rsem` | - | 1.3.3 |
| `ucsc-bedgraphtobigwig` | - | 377 |
| `umi_tools` | - | 1.0.1 |
| `bioconductor-edger` | - | - |
| `deeptools` | - | - |
| `matplotlib` | - | - |
| `r-data.table` | - | - |
| `r-gplots` | - | - |
| `r-markdown` | - | - |
NB: Dependency has been updated if both old and new version information is present. NB: Dependency has been added if just the new version information is present. NB: Dependency has been removed if version information isn't present.
[1.4.2] - 2019-10-18
- Minor version release for keeping Git History in sync
- No changes with respect to 1.4.1 on pipeline level
[1.4.1] - 2019-10-17
Major novel changes include:
- Update `igenomes.config` with NCBI `GRCh38` and most recent UCSC genomes
- Set `autoMounts = true` by default for `singularity` profile
[1.4] - 2019-10-15
Major novel changes include:
- Support for Salmon as an alternative method to STAR and HISAT2
- Several improvements in `featureCounts` handling of types other than `exon`. It is possible now to handle nuclearRNAseq data. Nuclear RNA has un-spliced RNA, and the whole transcript, including the introns, needs to be counted, e.g. by specifying `--fc_count_type transcript`.
- Support for outputting unaligned data to results folders.
- Added options to skip several steps
  - Skip trimming using `--skipTrimming`
  - Skip BiotypeQC using `--skipBiotypeQC`
  - Skip Alignment using `--skipAlignment` to only use pseudo-alignment using Salmon
- Adjust wording of skipped samples in pipeline output
- Fixed link to guidelines #203
- Add `Citation` and `Quick Start` section to `README.md`
- Add in documentation of the `--gff` parameter
- Generate MultiQC plots in the results directory #200
- Get MultiQC to save plots as standalone files
- Get MultiQC to write out the software versions in a `.csv` file #185
- Use `file` instead of `new File` to create `pipeline_report.{html,txt}` files, and properly create subfolders
- Restore `SummarizedExperiment` object creation in the salmon_merge process avoiding increasing memory with sample size.
- Fix sample names in feature counts and dupRadar to remove suffixes added in other processes
- Removed `genebody_coverage` process #195
- Implemented Pearsons correlation instead of Euclidean distance #146
- Add `--stringTieIgnoreGTF` parameter #206
- Removed unused `stringtie` channels for `MultiQC`
- Integrate changes in `nf-core/tools v1.6` template which resolved #90
- Moved process `convertGFFtoGTF` before `makeSTARindex` #215
- Change all boolean parameters from `snake_case` to `camelCase` and vice versa for value parameters
- Add SM ReadGroup info for QualiMap compatibility #238
- Obtain edgeR + dupRadar version information #198 and #112
- Add `--gencode` option for compatibility of Salmon and featureCounts biotypes with GENCODE gene annotations
- Added functionality to accept compressed reference data in the pipeline
- Check that gtf features are on chromosomes that exist in the genome fasta file #274
- Maintain all gff features upon gtf conversion (keeps `gene_biotype` or `gene_type` to make `featureCounts` happy)
- Add SortMeRNA as an optional step to allow rRNA removal #280
- Minimal adjustment of memory and CPU constraints for clusters with locked memory / CPU relation
- Cleaned up usage, `parameters.settings.json` and the `nextflow.config`
- Dependency list is now sorted appropriately
- Force matplotlib=3.0.3
- Picard 2.20.0 -> 2.21.1
- bioconductor-dupradar 1.12.1 -> 1.14.0
- bioconductor-edger 3.24.3 -> 3.26.5
- gffread 0.9.12 -> 0.11.4
- trim-galore 0.6.1 -> 0.6.4
- rseqc 3.0.0 -> 3.0.1
- R-Base 3.5 -> 3.6.1
- Dropped CSVtk in favor of Unix's simple `cut` and `paste` utilities
- Added Salmon 0.14.2
- Added TXIMeta 1.2.2
- Added SummarizedExperiment 1.14.0
- Added SortMeRNA 2.1b
- Add tximport and summarizedexperiment dependency #171
- Add Qualimap dependency #202
[1.3] - 2019-03-26
- Added configurable options to specify group attributes for featureCounts #144
- Added support for RSeqC 3.0 #148
- Added a `parameters.settings.json` file for use with the new `nf-core launch` helper tool.
- Centralized all configuration profiles using nf-core/configs
- Fixed all centralized configs for offline usage
- Hide %dup in multiqc report
- Add option for Trimming NextSeq data properly (@jburos work)
- Fixing HISAT2 Index Building for large reference genomes #153
- Fixing HISAT2 BAM sorting using more memory than available on the system
- Fixing MarkDuplicates memory consumption issues following #179
- Use `file` instead of `new File` to create the `pipeline_report.{html,txt}` files to avoid creating local directories when outputting to AWS S3 folders
- Fix SortMeRNA default rRNA db paths specified in assets/rrna-db-defaults.txt
- RSeQC 2.6.4 -> 3.0.0
- Picard 2.18.15 -> 2.20.0
- r-data.table 1.11.4 -> 1.12.2
- bioconductor-edger 3.24.1 -> 3.24.3
- r-markdown 0.8 -> 0.9
- csvtk 0.15.0 -> 0.17.0
- stringtie 1.3.4 -> 1.3.6
- subread 1.6.2 -> 1.6.4
- gffread 0.9.9 -> 0.9.12
- multiqc 1.6 -> 1.7
- deeptools 3.2.0 -> 3.2.1
- trim-galore 0.5.0 -> 0.6.1
- qualimap 2.2.2b
- matplotlib 3.0.3
- r-base 3.5.1
[1.2] - 2018-12-12
- Removed some outdated documentation about non-existent features
- Config refactoring and code cleaning
- Added a `--fcExtraAttributes` option to specify more than ENSEMBL gene names in `featureCounts`
- Remove legacy rseqc `strandRule` config code. #119
- Added STRINGTIE ballgown output to results folder #125
- HiSAT index build now requests `200GB` memory, enough to use the exons / splice junction option for building.
  - Added documentation about the `--hisatBuildMemory` option.
- BAM indices are stored and re-used between processes #71
- Fixed conda bug which caused problems with environment resolution due to changes in bioconda #113
- Fixed wrong gffread command line #117
- Added `cpus = 1` to `workflow summary process` #130
[1.1] - 2018-10-05
- Wrote docs and made minor tweaks to the `--skip_qc` and associated options
- Removed the deprecated `uppmax-modules` config profile
- Updated the `hebbe` config profile to use the new `withName` syntax too
- Use new `workflow.manifest` variables in the pipeline script
- Updated minimum nextflow version to `0.32.0`
- #77: Added back `executor = 'local'` for the `workflow_summary_mqc`
- #95: Check if task.memory is false instead of null
- #97: Resolved edge-case where numeric sample IDs are parsed as numbers causing some samples to be incorrectly overwritten.
[1.0] - 2018-08-20
This release marks the point where the pipeline was moved from SciLifeLab/NGI-RNAseq over to the new nf-core community, at nf-core/rnaseq.
View the previous changelog at SciLifeLab/NGI-RNAseq/CHANGELOG.md
In addition to porting to the new nf-core community, the pipeline has had a number of major changes in this version. There have been 157 commits by 16 different contributors covering 70 different files in the pipeline: 7,357 additions and 8,236 deletions!
In summary, the main changes are:
- Rebranding and renaming throughout the pipeline to nf-core
- Updating many parts of the pipeline config and style to meet nf-core standards
- Support for GFF files in addition to GTF files
  - Just use `--gff` instead of `--gtf` when specifying a file path
- New command line options to skip various quality control steps
- More safety checks when launching a pipeline
- Several new sanity checks - for example, that the specified reference genome exists
- Improved performance with memory usage (especially STAR and Picard)
- New BigWig file outputs for plotting coverage across the genome
- Refactored gene body coverage calculation, now much faster and using much less memory
- Bugfixes in the MultiQC process to avoid edge cases where it wouldn't run
- MultiQC report now automatically attached to the email sent when the pipeline completes
- New testing method, with data on GitHub
  - Now run pipeline with `-profile test` instead of using bash scripts
- Rewritten continuous integration tests with Travis CI
- New explicit support for Singularity containers
- Improved MultiQC support for DupRadar and featureCounts
- Now works for all users instead of just NGI Stockholm
- New configuration for use on AWS batch
- Updated config syntax to support latest versions of Nextflow
- Built-in support for a number of new local HPC systems
- CCGA, GIS, UCT HEX, updates to UPPMAX, CFC, BINAC, Hebbe, c3se
- Slightly improved documentation (more updates to come)
- Updated software packages
...and many more minor tweaks.
Thanks to everyone who has worked on this release!