diff --git a/README.Rmd b/README.Rmd index 4a001977..f2d460ce 100755 --- a/README.Rmd +++ b/README.Rmd @@ -63,8 +63,7 @@ below. For more details, see [workflow.md](/workflow.md). -1. Collect patient **WTS data** from the [DRAGEN RNA][dragen-rna] - pipeline including per-gene read counts and gene fusions. +1. Collect sample **WTS data** including per-gene read counts and gene fusions. 2. Add expression data from **[reference cohorts](#reference-data)** to get an idea about the expression levels of genes of interest in other cancer patient cohorts. The read counts are normalised, transformed and converted into a scale @@ -83,14 +82,12 @@ below. For more details, see [workflow.md](/workflow.md). plots presenting expression levels of the genes of interest. The report consists of several sections described [here](./articles/report_structure.md). -[dragen-rna]: - ## Reference data The reference expression data are available for **33 cancer types** and were derived from [external](#external-reference-cohorts) ([TCGA](https://tcga-data.nci.nih.gov/)) and [internal](#internal-reference-cohort) -([UMCCR](https://research.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group)) +([UMCCR](https://mdhs.unimelb.edu.au/centre-for-cancer-research/our-research/precision-oncology-research-group)) resources. ### External reference cohorts @@ -148,46 +145,45 @@ There are two rationales for using the internal reference cohort: ## Input data -`RNAsum` accepts [WTS](#wts) data processed by the `DRAGEN RNA` -pipeline. Additionally, the WTS data -can be integrated with [WGS](#wgs)-based data processed using the -`umccrise` pipeline. 
In the latter case, the genome-based findings from the -corresponding patient sample are incorporated into the report and are used as a +`RNAsum` accepts [WTS](#wts) data processed by state-of-the-art bioinformatic +tools, such as kallisto and salmon for quantification and Arriba for fusion calling.
+`RNAsum` can also process and combine fusion output from Illumina's DRAGEN pipeline. +Additionally, the WTS data can be integrated with [WGS](#wgs)-based data processed +using the tools listed in that section. + +In the latter case, the genome-based findings from the +corresponding sample are incorporated into the report and are used as a primary source for expression profile prioritisation. ### WTS The only required WTS input data are **read counts** provided in a -quantification file from the `DRAGEN RNA` pipeline. +quantification file. -#### DRAGEN RNA +#### RNA The table below lists all input data accepted in `RNAsum`: | Input file | Tool | Example | Required | | -------------------------------------- | ---------------------------------------------- | ----------------------------------------------- | ---------- | -| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [TEST.quant.sf][salmon-ex] | **Yes** | -| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [TEST.fusion_candidates.final][dragen-rna-ex] | No | +| Quantified transcript **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.sf][salmon-ex] | **Yes** | +| Quantified gene **abundances** | [salmon][salmon] ([description][salmon-res]) | [*.quant.gene.sf][salmon-ex2] | **Yes** | +| **Fusion gene** list | [Arriba][arriba] ([description][arriba-res]) | [fusions.tsv][dragen-rna-ex] | No | +| **Fusion gene** list | [DRAGEN RNA][dragen-rna] | [*.fusion_candidates.final][dragen-rna-ex] | No | [salmon]: [salmon-ex]: +[salmon-ex2]: [salmon-res]: +[arriba]: +[arriba-res]: [dragen-rna]: [dragen-rna-ex]: -These files are expected to be organised in the following structure: - -```text -| -|____ - |____quant.sf - |____.fusion_candidates.final -``` ### WGS -`RNAsum` is designed to be compatible with WGS outputs generated from -`umccrise`. +`RNAsum` is designed to be compatible with WGS outputs from `PCGR`, `PURPLE` and `Manta`.
The table below lists all input data accepted in `RNAsum`: @@ -204,20 +200,6 @@ The table below lists all input data accepted in `RNAsum`: [manta]: [manta-ex]: -These files are expected to be organised in the following structure: - -```text -| -|____umccrised - |____ - |____pcgr - | |____-somatic.pcgr.snvs_indels.tiers.tsv - |____purple - | |____.purple.gene.cnv - |____structural - |____-manta.tsv -``` - ## Usage ```{bash echo=TRUE, eval=FALSE} @@ -242,7 +224,7 @@ echo "" Human reference genome ***[GRCh38](https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.39)*** -(*Ensembl* based annotation version ***86***) is used for gene annotation by +(*Ensembl* based annotation version ***105***) is used for gene annotation by default. GRCh37 is no longer supported. ### Examples @@ -250,15 +232,16 @@ default. GRCh37 is no longer supported. Below are `RNAsum` CLI commands for generating HTML reports under different data availability scenarios: -1. [WTS data only](#1-wts-data-only) -2. [WTS and WGS data](#2-wts-and-wgs-data) + +1. [WTS and WGS data](#1-wts-and-wgs-data) +2. [WTS data only](#2-wts-data-only) 3. [WTS WGS and clinical data](#3-wts-wgs-and-clinical-data) **Note** * Example data is provided in the `/inst/rawdata/test_data` folder of the GitHub [repo][rnasum-gh]. -* The `RNAsum` runtime should be less than **20 minutes** using **16GB RAM** +* The `RNAsum` runtime should be less than **15 minutes** using **16GB RAM** memory and **1 CPU**. #### 1. WTS and WGS data @@ -267,9 +250,8 @@ This is the **most frequent and preferred case**, in which the [WGS](#wgs)-based findings will be used as a primary source for expression profile prioritisation. The genome-based results can be incorporated into the report by specifying the location of the corresponding -`umccrise` output files (including results from `PCGR`, `PURPLE`, and `Manta`) -using the `--umccrise` argument. 
The **`Mutated genes`**, -**`Structural variants`** and **`CN altered genes`** report sections will +output files (including results from `PCGR`, `PURPLE`, and `Manta`) with the `--pcgr_tiers_tsv`, `--purple_gene_tsv`, and `--manta_tsv` arguments. +The **`Mutated genes`**, **`Structural variants`** and **`CN altered genes`** report sections will contain information about expression levels of the mutated genes, genes located within detected SVs and CN altered regions, respectively. The results in the **`Fusion genes`** section will be ordered based on the @@ -280,9 +262,12 @@ adenocarcinoma dataset is used as reference cohort (`--dataset TEST `). rnasum.R \ --sample_name test_sample_WTS \ --dataset TEST \ - --dragen_rnaseq inst/rawdata/test_data/dragen \ + --manta_tsv \ + --pcgr_tiers_tsv \ + --purple_gene_tsv \ + --salmon \ + --arriba_tsv \ --report_dir inst/rawdata/test_data/dragen/RNAsum \ - --umccrise inst/rawdata/test_data/umccrised/test_sample_WGS \ --save_tables FALSE ``` @@ -292,7 +277,7 @@ The HTML report `test_sample_WTS.RNAsum.html` will be created in the #### 2. WTS data only In this scenario, only [WTS](#wts) data will be used and only expression levels -of key **[`UMCCR Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**, +of key **[`Cancer genes`](https://github.com/umccr/umccrise/blob/master/workflow.md#key-cancer-genes)**, **`Fusion genes`**, **`Immune markers`** and homologous recombination deficiency genes (**`HRD genes`**) will be reported. Moreover, gene fusions reported in the `Fusion genes` report section will not contain information about evidence @@ -303,7 +288,8 @@ is used as the reference cohort (`--dataset TEST`). rnasum.R \ --sample_name test_sample_WTS \ --dataset TEST \ - --dragen_rnaseq inst/rawdata/test_data/dragen \ + --salmon inst/rawdata/test_data/dragen \ + --arriba_tsv \ --report_dir inst/rawdata/test_data/dragen/RNAsum \ --save_tables FALSE ``` @@ -386,4 +372,4 @@ that are presented in the HTML report.
#### Code of Conduct -The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md). +The code of conduct can be accessed [here](./CODE_OF_CONDUCT.md)